choosehappy / HistoQC

HistoQC is an open-source quality control tool for digital pathology slides
BSD 3-Clause Clear License

Update Dockerfile for ray-ml #281

Closed nanli-emory closed 6 months ago

jacksonjacobs1 commented 7 months ago

I tried building and running the HistoQC image with this Dockerfile, and it does not seem to install histoqc:

(base) ray@49ad5d0a932f:/data$ which python3
/home/ray/anaconda3/bin/python3
(base) ray@49ad5d0a932f:/data$ python3 -m histoqc --version
/home/ray/anaconda3/bin/python3: No module named histoqc
(base) ray@49ad5d0a932f:/data$ conda deactivate
ray@49ad5d0a932f:/data$ source /opt/HistoQC/venv/bin/activate
(venv) ray@49ad5d0a932f:/data$ python3 -m histoqc --version
/home/ray/anaconda3/bin/python3: No module named histoqc

The reason is that Python is only installed in the conda environment, so the venv's scripts point at an interpreter that does not exist:

(venv) ray@49ad5d0a932f:/data$ pip freeze
bash: /opt/HistoQC/venv/bin/pip: /opt/HistoQC/venv/bin/python: bad interpreter: No such file or directory
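
The "bad interpreter" error above means that the shebang line of /opt/HistoQC/venv/bin/pip names a python executable that is not present in the image. As a minimal sketch, this state can be checked from any working interpreter. The helper name `check_venv_interpreter` is hypothetical; only the venv path is taken from the session above.

```python
import os
from pathlib import Path


def check_venv_interpreter(venv_dir):
    """Describe the state of <venv_dir>/bin/python (hypothetical helper)."""
    python = Path(venv_dir) / "bin" / "python"
    if python.is_symlink() and not python.exists():
        # Dangling symlink: the target interpreter is absent from this image.
        return f"broken: {python} -> {os.readlink(python)} (target missing)"
    if not python.exists():
        return f"missing: {python}"
    return f"ok: {python} -> {python.resolve()}"


if __name__ == "__main__":
    # Path taken from the shell session above.
    print(check_venv_interpreter("/opt/HistoQC/venv"))
```

One plausible cause, assuming the venv was created in a separate build stage and then copied over: on Linux a venv normally symlinks bin/python to the interpreter that created it, so the link dangles in any image that lacks that interpreter at the same path.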

jacksonjacobs1 commented 6 months ago

Just got to test this out. It looks like this solution worked by chance: I was able to run python3 -m histoqc --help successfully, but there are two problems.

  1. The environment is set to conda's "base" environment, which doesn't include histoqc's dependencies.
(base) ray@39580a85d729:/data$ which python3
/home/ray/anaconda3/bin/python3
(base) ray@39580a85d729:/data$ python3
Python 3.8.18 (default, Sep 11 2023, 13:40:15) 
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import shapely
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'shapely'
>>> 
  2. More importantly, the venv is not initialized correctly. It does not appear to include its own python or pip.
    (base) ray@39580a85d729:/data$ conda deactivate
    ray@39580a85d729:/data$ source /opt/HistoQC/venv/bin/activate
    (venv) ray@39580a85d729:/data$ pip freeze
    bash: /opt/HistoQC/venv/bin/pip: /opt/HistoQC/venv/bin/python: bad interpreter: No such file or directory
    (venv) ray@39580a85d729:/data$ 
    (venv) ray@39580a85d729:/data$ python
    Python 3.8.18 (default, Sep 11 2023, 13:40:15) 
    [GCC 11.2.0] :: Anaconda, Inc. on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import shapely
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ModuleNotFoundError: No module named 'shapely'
    >>> 
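
Both sessions above end up in the same Anaconda interpreter, which cannot import shapely. A short sketch for checking this in one step, instead of importing modules by hand in the REPL, is given here. The function name `missing_modules` is hypothetical, and the list of required modules is an assumption: only shapely is named in this thread.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "shapely" comes from the session above; extend the list as needed.
    required = ["shapely"]
    print("interpreter:", sys.executable)
    print("prefix:     ", sys.prefix)
    print("missing:    ", missing_modules(required))
```

Running this with each candidate python3 shows which environment is actually active and whether it has the dependencies installed.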

nanli-emory commented 6 months ago

Hi @jacksonjacobs1, I updated the Dockerfile, setup.cfg, and setup.py. I put config_v2.1.ini and the .svs files in ./slides and tested with my local Docker installation. The pipeline runs successfully:

docker build -t ray-histoqc  -f ./Dockerfile .
docker run --name ray-histoqc-instance -it -v ./slides:/data ray-histoqc

Please review and check. Thanks.

jacksonjacobs1 commented 6 months ago

Cool! This works for me too.

  1. Quick question: why do we still need the venv (https://github.com/nanli-emory/HistoQC/blob/dc85fd0143b090e404a73e03ba23c3fb9ead1bbd/Dockerfile#L13), since we are installing HistoQC and its dependencies again in the conda environment?

  2. If we don't need to install HistoQC in the venv, we also don't need a builder stage (the python3.8 base image). What do you think? https://github.com/nanli-emory/HistoQC/blob/dc85fd0143b090e404a73e03ba23c3fb9ead1bbd/Dockerfile#L6

  3. Let's keep CMD ["bash"] at the end for now. Once setup completes, the container drops the user into a bash shell inside the container, where they can run HistoQC commands manually.

nanli-emory commented 6 months ago

Hi @jacksonjacobs1, I updated the Dockerfile and removed the first-stage build. Please check.