Closed marcverhagen closed 11 months ago
After some experimenting it looks like the following works reasonably well:
FROM ghcr.io/clamsproject/clams-python-opencv4:1.0.9
ARG CLAMS_APP_VERSION
ENV CLAMS_APP_VERSION ${CLAMS_APP_VERSION}
RUN apt-get update && apt-get install -y wget
RUN pip install --no-cache-dir torch==2.1.0
RUN pip install --no-cache-dir torchvision==0.16.0
WORKDIR /app
RUN wget https://download.pytorch.org/models/vgg16-397923af.pth
RUN mkdir -p /root/.cache/torch/hub/checkpoints
RUN mv vgg16-397923af.pth /root/.cache/torch/hub/checkpoints
COPY . /app
CMD ["python3", "app.py", "--production"]
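The layers above could also be consolidated somewhat; a sketch of the same build, assuming the same base image and model URL (untested):

```dockerfile
FROM ghcr.io/clamsproject/clams-python-opencv4:1.0.9
ARG CLAMS_APP_VERSION
ENV CLAMS_APP_VERSION=${CLAMS_APP_VERSION}
WORKDIR /app

# one layer for system deps, one for python deps (no pip cache kept)
RUN apt-get update && apt-get install -y --no-install-recommends wget \
    && rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir torch==2.1.0 torchvision==0.16.0

# pre-seed the torch hub cache so the model file is baked into the image
RUN mkdir -p /root/.cache/torch/hub/checkpoints \
    && wget -P /root/.cache/torch/hub/checkpoints \
       https://download.pytorch.org/models/vgg16-397923af.pth

COPY . /app
CMD ["python3", "app.py", "--production"]
```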
What is the advantage of pip-installing torch/torchvision directly instead of via requirements.txt? I'm working on #30 and it now additionally needs a yaml handler as a dependency. I wonder whether we should keep two (identical) sets of dependency specs (in the Dockerfile and in requirements.txt) unless there's a clear benefit to doing so.
Also, for backbone models, I think there must be a better way to download a .pth file based on our model choice, instead of manually hard-coding the vgg16 URL (or that of other, better-performing models).
For model download, we can do something like this: given a model config YAML file such as
https://github.com/clamsproject/app-swt-detection/blob/9d4f229c61888e43eb58aeffa8cf1885e28082bb/modeling/models/20231026-164841.config.yml#L1
in backbones.py:
...
if __name__ == "__main__":
    import sys
    # pass the model choice via the CLI
    model_map[sys.argv[1]]()
    # Instantiating an `ExtractorModel` also downloads the .pth file and
    # initializes the torchvision model; when this script terminates, the
    # downloaded model file stays in the local cache directory.
then in the Dockerfile:
...
RUN python -m modeling.backbones $(grep "model_type" modeling/classifier-config.yaml | cut -d: -f2)
CMD ["python3", "app.py", "--production"]
Maybe a little bit ugly, with the grep inside the Dockerfile and because it requires the backbones file to cater to the Dockerfile, but we can play with that a bit. Definitely better than
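Alternatively, the grep/cut could move into Python so that the Dockerfile stays dumb. A minimal sketch, where `model_map` and the config path follow the snippets above but the loaders here are placeholders (the real ones would instantiate an `ExtractorModel`, which pulls the .pth into ~/.cache/torch/hub/checkpoints):

```python
import sys

def model_type_from_config(path):
    # stdlib stand-in for `grep "model_type" ... | cut -d: -f2`;
    # with PyYAML available, yaml.safe_load(open(path))["model_type"] works too
    with open(path) as f:
        for line in f:
            if line.strip().startswith("model_type"):
                return line.split(":", 1)[1].strip()
    raise KeyError("model_type not found in " + path)

# placeholder loaders; real ones would call into torchvision
model_map = {
    "vgg16": lambda: "vgg16",
    "resnet50": lambda: "resnet50",
}

if __name__ == "__main__":
    # e.g. python -m modeling.backbones modeling/classifier-config.yaml
    model_map[model_type_from_config(sys.argv[1])]()
```

With this, the Dockerfile line becomes simply `RUN python -m modeling.backbones modeling/classifier-config.yaml`.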
RUN wget https://download.pytorch.org/models/vgg16-397923af.pth
fixed via #48.
Because
Not sure how to replicate this, but on one of my machines I could not do a Docker build. It would fail during the pip install with a "Connection reset by peer" message, typically while installing torch. This went away once I split the requirements file into three files, the first of which installs torch, torchvision, and torchmetrics. But that is hardly a satisfactory solution.
Maybe related is that on another of my three machines I also get "Connection reset by peer" when downloading https://download.pytorch.org/models/vgg16-397923af.pth. That download should probably be made part of the image-building process.
It is also not totally clear to me what the most efficient build would be. Now we use the clams-python-opencv4-torch2 base image, but with the current requirements all of torch and CUDA will be reinstalled because that base is on torch==2.0.1, resulting in an 11.4GB image. I tried not to reinstall torch/CUDA, which should be possible by using torchvision==0.15, but that failed with obscure error messages.
This leads me to believe that I should use the clams-python-opencv4 base image instead.
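If the goal were instead to stay on the base image's torch and avoid the reinstall, the matching pins would look like the following (this assumes the usual torch↔torchvision pairing, in which torch 2.0.1 goes with torchvision 0.15.2; worth double-checking against the torchvision compatibility table):

```
torch==2.0.1
torchvision==0.15.2
```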
Finally, and this should probably also be its own issue in clams-python, the torch images (and probably some others as well) are larger than needed because a pip cache is kept in /root/.cache/pip, which holds 2.6GB of data; the image could therefore be much smaller. Using pip install --no-cache-dir (as in the Dockerfile above) does create a much smaller image.
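Besides the per-install flag, pip's cache can also be disabled image-wide via an environment variable, so no individual RUN line can forget it (a sketch, assuming standard pip behavior):

```dockerfile
# disable the wheel cache for every pip invocation in this image
ENV PIP_NO_CACHE_DIR=1
RUN pip install -r requirements.txt
```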