Closed GemmaTuron closed 10 months ago
Sure @GemmaTuron, I'll start working on it once I resolve the previous model issues. Thanks.
Hello @GemmaTuron The workflow is failing before any changes were made: https://github.com/ersilia-os/eos2re5/actions
Hi @ZakiaYahya Please do not link the action only but explain what has failed and why. In this case, I would have written: This is a large model (XX GB, please check) that uses xxx models to predict xx properties (please, do some reading on the model you are working on)
Error: The operation was canceled.
Since the model has been in principle updated to S3, we should be able to download the checkpoints from S3 - I will check that
It might also be that the Git LFS quota is over and this is failing, or that the model is too large to remain on the Git Actions cache?
In that case, our git lfs quota is not over. Please, since it is the second time you encounter this issue of model size, investigate more on the available memory for Git Actions. And check the S3 bucket to see if there are files missing.
Hello @GemmaTuron Sure, but first let me make a PR for it. I'm just doing the final commits now. After merging I'll dig into all the actions in detail and let you know why it's happening and what we should do to tackle it. Thanks. Also, my internet is causing trouble today, which is why my progress is slow. Sorry for that.
no problem let me know when the PR is ready for merging
Hello @GemmaTuron I have made a PR for it, kindly check it. Thanks.
@ZakiaYahya the Docker build has failed, can you check please?
Hello @GemmaTuron Yes, I'm going through the error log.
Hi @ZakiaYahya I went through your changes and the failing build. It seems like a lot of stuff is happening here!
CommandNotFoundError: Your shell has not been properly configured to use 'conda activate' or
CommandNotFoundError: Your shell has not been properly configured to use 'conda deactivate'. Turns out, it is not straightforward to use Conda within Docker: https://pythonspeed.com/articles/activate-conda-dockerfile/ (please go through this link, it explains many of the issues being faced in this build).
Finally, as for cleaning up the model, I don't think the files get-pip.py and pack.py belong in the repository either; they should be removed.
Hi @DhanshreeA Thanks for the details.
(1) I'm working on it, and I'll go through the link you mentioned about the conda-docker issue and let you know whether that works.
(2) The get-pip.py and pack.py files were already present in the forked repository, which is why I didn't delete them, but if removing them doesn't cause any problem for the model, I'll remove them.
Thanks.
Hi @ZakiaYahya let's do this, we can see if the files are being referenced or imported anywhere. If that's not the case, they should be safe to remove. A longer but safer approach would be to just remove them and see if the model works through the repo_path flag. If it continues working, then those files aren't needed.
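For reference, the first check (are the files referenced anywhere?) could look roughly like this. It is only a sketch: `find_references` is a hypothetical helper name, not something from Ersilia, and it assumes a grep that supports `--exclude` and `--exclude-dir` (GNU grep does). It only catches literal mentions of the file name, so an empty result is a hint rather than proof that a file is unused.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every place in a repository that mentions a
# given script name, skipping the .git directory and the script itself.
# Prints "file:line:text" rows, or nothing when there are no mentions.
find_references() {
    local repo_dir="$1" script_name="$2"
    # -r recurse, -n show line numbers, -F treat the name as a fixed string
    grep -rnF --exclude-dir=.git --exclude="$script_name" \
        -- "$script_name" "$repo_dir" || true
}

# Example (run from the model repository root):
#   find_references . pack.py
#   find_references . get-pip.py
```

If this prints nothing for a file, removing it and re-testing with the repo_path flag is still the safer confirmation.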
Also for the docker issue there are a couple of things we can try right off the bat:
Can you check if you're able to build an image from that Dockerfile locally? You'll just need to run docker build . in the repo.
If you're not able to build an image, try removing the 'conda activate' and 'conda deactivate' commands; they are likely failing because conda is out of scope at that point in the build (in that layer).
Hello @DhanshreeA
(1) Oh okay, I didn't know that we build Docker with --repo-path as well, I'll try it.
(2) Yes, but the problem is that they are using Python 2.7, which is why they create a separate conda environment. If I remove that, it will install the packages in Python 3.7, right? Let me check that.
(3) Okay, I'll remove those files and then check whether the model is working or not.
Thanks
Hi @DhanshreeA and @ZakiaYahya
This is a particularly complicated model because it requires Py2.7. Let us know about the latest changes, Zakia, and otherwise we will try to work out a solution together.
Sorry @ZakiaYahya my bad for the confusion, the pack.py file is definitely required. The get-pip.py is unnecessary.
Hello @GemmaTuron @DhanshreeA Yes, I've checked that by removing the file, but it gave me the error "python: can't open file 'pack.py': [Errno 2] No such file or directory", so it is a necessary file for the repo. Also, @GemmaTuron, I've made some changes to the Dockerfile as discussed with @DhanshreeA in today's 1:1 meeting. I'm checking the model with repo-path; once it's working on my system I'll open the PR.
Hello @GemmaTuron @DhanshreeA The model is working fine with the changes to the Dockerfile; let's check whether it works in GitHub Actions too. I've opened a PR for it. Thanks.
Hi @ZakiaYahya @GemmaTuron @miquelduranfrigola
I think we will have to strategize how to deal with this particular model's dependencies. As it is now, particularly with commands such as:
sudo apt update
sudo apt install python2.7
sudo apt install gfortran-7
Since it uses OS level commands (apt package manager in this case), which are OS specific, this would not work on anything but Debian based distributions (eg Ubuntu). We should try installing Python 2.7 through conda if possible, or look to moving this model away from Python 2.7 entirely (which might require a lot more rework). What are your thoughts?
Hello @DhanshreeA Yes, we can try, but first we need to find out why this model requires Python 2.7: is there something that is not compatible with Python 3.7? I've already tried two solutions but they didn't work. Maybe @GemmaTuron and @miquelduranfrigola can guide us better in this regard. Thanks
Hi @ZakiaYahya and @DhanshreeA I agree the py2.7 is extremely annoying, I am more inclined to drop this model if we cannot make it work, what do you think @miquelduranfrigola ?
Hi @GemmaTuron @miquelduranfrigola @DhanshreeA I'm stuck on this model; @miquelduranfrigola, maybe you can suggest a better way to resolve the problem. Here is a brief overview of what I'm encountering. Basically, this model requires package installation in Python 2.7 instead of Python 3.7, so it uses the following commands in the Dockerfile:
RUN conda activate eos2re5-py27
RUN pip install scikit-learn==0.17.1
RUN pip install scipy==1.2.3
RUN conda deactivate
It was working fine both locally and inside ersilia using --repo-path, but it fails at "Upload to DockerHub", giving CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'. So we tried various workarounds, like using the SHELL command in the Dockerfile to activate and deactivate conda, but that doesn't work. Then we tried replacing the individual RUN commands with a single one, i.e.
RUN conda activate eos2re5-py27 && pip install scikit-learn==0.17.1 && pip install scipy==1.2.3 && conda deactivate
and it gives the error 'CondaService Error: Conda has no pid' and fails at "Upload to DockerHub". I have tried a lot but didn't find any help on Google. Kindly help me in this regard.
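One part of this can be shown without Docker or conda at all. Each RUN instruction in a Dockerfile is executed in its own fresh shell, so whatever an earlier RUN did to the shell's environment is gone by the next one. The sketch below mimics that with separate `sh -c` calls; `ACTIVE_ENV` is just an illustrative variable standing in for "the activated environment", not something conda sets.

```shell
#!/usr/bin/env bash
# Two separate `sh -c` calls behave like two separate RUN lines:
# each gets a brand-new shell, so exported state does not carry over.
unset ACTIVE_ENV

sh -c 'export ACTIVE_ENV=eos2re5-py27'                 # like: RUN conda activate ...
separate_runs=$(sh -c 'echo "${ACTIVE_ENV:-unset}"')  # like: RUN pip install ...

# Chaining with && keeps everything inside one shell, so the state survives.
single_run=$(sh -c 'export ACTIVE_ENV=eos2re5-py27 && echo "${ACTIVE_ENV:-unset}"')

echo "separate RUN lines: $separate_runs"   # prints: unset
echo "single RUN line:    $single_run"      # prints: eos2re5-py27
```

Chaining alone is not enough for conda, though: `conda activate` is a shell function that only exists after `conda init` has been applied or `etc/profile.d/conda.sh` has been sourced in that same shell. Options that avoid activation entirely include `conda run -n eos2re5-py27 pip install ...` or calling the environment's own Python interpreter by its full path.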
OK @ZakiaYahya -
I have tried quite a bit too and I am experiencing errors indeed.
In particular, when inside a docker container, it doesn't seem to resolve the conda environments (for example, for openbabel). I solved this by manually downloading the tar.gz files from anaconda.
In any case, I agree we should get rid of the sudo commands as @DhanshreeA is suggesting. Let's, for now, forget about Docker and make a Dockerfile that works without sudo. Then we will tackle Docker.
@ZakiaYahya , do you think we can try this?
Hi @miquelduranfrigola
Right, I'll try to get rid of the sudo commands in the Dockerfile first. I'll let you know.
Thanks.
The AttributeError: CondaEnvironmentService object has no pid is indeed an interesting one, because it is coming from Ersilia, when autoservice.py tries to bring up a Conda environment. https://github.com/search?q=repo%3Aersilia-os%2Fersilia%20CondaEnvironmentService&type=code @miquelduranfrigola any idea why that might be happening?
Hello @GemmaTuron I've opened a PR for it. Thanks
Thanks all.
@DhanshreeA and @ZakiaYahya - all is good now? Any feedback you need from me?
Hi @miquelduranfrigola
I've made the changes you suggested in the Dockerfile and opened a PR for it. I'm waiting for the GitHub Actions to complete. The "Upload to DockerHub" step took a lot of time to complete, so I hope it will work. Let's see.
Thanks for asking.
Hello @GemmaTuron @miquelduranfrigola @DhanshreeA
The model failed again at "Upload to DockerHub", this time not due to the conda issue but due to "The job running on runner GitHub Actions 2 has exceeded the maximum execution time of 360 minutes". @GemmaTuron can you try it again by running it manually? It is taking way too long to upload the image to DockerHub, and I think that is mainly due to the conda environment and the package installation using conda-forge.
Hey @ZakiaYahya I looked through the raw logs for this job and it seems like we're still running into the same issue with dependencies and Conda:
2023-07-06T11:07:45.5349855Z #9 13714.8 Checking that autoservice works: 4.532059669494629s
2023-07-06T11:07:45.5350144Z #9 13714.8 🚨🚨🚨 Something went wrong with Ersilia 🚨🚨🚨
2023-07-06T11:07:45.5350242Z #9 13714.8
2023-07-06T11:07:45.5350366Z #9 13714.8 Error message:
2023-07-06T11:07:45.5350461Z #9 13714.8
2023-07-06T11:07:45.5350595Z #9 13714.8 Ersilia exception class:
2023-07-06T11:07:45.5350721Z #9 13714.8 EmptyOutputError
2023-07-06T11:07:45.5350814Z #9 13714.8
2023-07-06T11:07:45.5351021Z #9 13714.8 Detailed error:
2023-07-06T11:07:45.5351390Z #9 13714.8 Model API eos2re5:run did not produce an output/root/eos/repository/eos2re5/20230706110637_1B4CC2/eos2re5/artifacts/framework/run.sh: line 1: /etc/profile.d/conda.sh: No such file or directory
2023-07-06T11:07:45.5351558Z #9 13714.8
2023-07-06T11:07:45.5351938Z #9 13714.8 CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
2023-07-06T11:07:45.5352079Z #9 13714.8 To initialize your shell, run
2023-07-06T11:07:45.5352171Z #9 13714.8
2023-07-06T11:07:45.5352381Z #9 13714.8 $ conda init <SHELL_NAME>
2023-07-06T11:07:45.5352471Z #9 13714.8
2023-07-06T11:07:45.5352621Z #9 13714.8 Currently supported shells are:
2023-07-06T11:07:45.5352771Z #9 13714.8 - bash
2023-07-06T11:07:45.5352915Z #9 13714.8 - fish
2023-07-06T11:07:45.5353053Z #9 13714.8 - tcsh
2023-07-06T11:07:45.5353195Z #9 13714.8 - xonsh
2023-07-06T11:07:45.5353332Z #9 13714.8 - zsh
2023-07-06T11:07:45.5353474Z #9 13714.8 - powershell
2023-07-06T11:07:45.5353569Z #9 13714.8
2023-07-06T11:07:45.5353856Z #9 13714.8 See 'conda init --help' for more information and options.
2023-07-06T11:07:45.5353948Z #9 13714.8
2023-07-06T11:07:45.5354469Z #9 13714.8 IMPORTANT: You may need to close and restart your shell after running 'conda init'.
2023-07-06T11:07:45.5354565Z #9 13714.8
2023-07-06T11:07:45.5354659Z #9 13714.8
2023-07-06T11:07:45.5354817Z #9 13714.8 Traceback (most recent call last):
2023-07-06T11:07:45.5355098Z #9 13714.8 File "/root/eos/repository/eos2re5/20230706110637_1B4CC2/eos2re5/artifacts/framework/code/main.py", line 14, in <module>
2023-07-06T11:07:45.5355270Z #9 13714.8 from sklearn.externals import joblib
2023-07-06T11:07:45.5355546Z #9 13714.8 ModuleNotFoundError: No module named 'sklearn'
2023-07-06T11:07:45.5355640Z #9 13714.8
2023-07-06T11:07:45.5356030Z #9 13714.8 CommandNotFoundError: Your shell has not been properly configured to use 'conda deactivate'.
2023-07-06T11:07:45.5356177Z #9 13714.8 To initialize your shell, run
2023-07-06T11:07:45.5356275Z #9 13714.8
2023-07-06T11:07:45.5356417Z #9 13714.8 $ conda init <SHELL_NAME>
2023-07-06T11:07:45.5356513Z #9 13714.8
2023-07-06T11:07:45.5356806Z #9 13714.8 Currently supported shells are:
2023-07-06T11:07:45.5356969Z #9 13714.8 - bash
2023-07-06T11:07:45.5357111Z #9 13714.8 - fish
2023-07-06T11:07:45.5357253Z #9 13714.8 - tcsh
2023-07-06T11:07:45.5357399Z #9 13714.8 - xonsh
2023-07-06T11:07:45.5357540Z #9 13714.8 - zsh
2023-07-06T11:07:45.5357705Z #9 13714.8 - powershell
2023-07-06T11:07:45.5357782Z #9 13714.8
2023-07-06T11:07:45.5358330Z #9 13714.8 See 'conda init --help' for more information and options.
2023-07-06T11:07:45.5358435Z #9 13714.8
2023-07-06T11:07:45.5358857Z #9 13714.8 IMPORTANT: You may need to close and restart your shell after running 'conda init'.
I would share the raw logs here, but it appears the URL expires quickly. To inspect them yourself, click on the cog-wheel icon next to the job.
@miquelduranfrigola @ZakiaYahya if this image built locally, maybe we can manually push it to DockerHub for now? What do you think? Hopefully this is the only model with Python 2.7 and gfortran dependencies?
Hi,
I think we need to make a decision on this model, since it is being so troubling... should we restrict Ersilia to py3 models?
Hello @GemmaTuron Yes, the whole problem is caused by conda, because we need conda here mainly to create the py2.7 env, apart from installing packages. We can somehow drop the conda-forge channel and use pip, but we can't simply drop the py2.7 env.
Hello @ZakiaYahya - following up on yesterday's discussion, just as an FYI, I am trying to debug the model as follows.
Instead of relying on GitHub Actions, which is slow, I try to reproduce the procedure from docker. So I do the following:
Build a docker image of Ersilia
cd ersilia/dockerfiles/installer
docker build -t ersilia .
Run the image interactively
docker run -it --name ersilia_docker --entrypoint /bin/bash ersilia
Inside docker, I just clone the model information (slow)
git-lfs install
git clone https://github.com/ersilia-os/eos2re5
Then I fetch from the path:
ersilia -v fetch eos2re5 --repo_path eos2re5
I hope this makes sense.
Hey @miquelduranfrigola sorry for hijacking this again, when you say you're not relying on GH actions, do you mean you're running the above steps manually on your local machine?
I am running them on Codespaces, which is probably the closest we can get to GitHub Actions
Update
We have managed to make it work on Codespaces:
13:06:46 | DEBUG | Latest meta: {'outcome': ['smiles', 'BBB_label', 'BBB_prob', 'CYP1A2-inhibitor_label', 'CYP1A2-inhibitor_prob', 'CYP1A2-substrate_label', 'CYP1A2-substrate_prob', 'CYP2C9-inhibitor_label', 'CYP2C9-inhibitor_prob', 'CYP2C9-substrate_label', 'CYP2C9-substrate_prob', 'CYP2C19-inhibitor_label', 'CYP2C19-inhibitor_prob', 'CYP2C19-substrate_label', 'CYP2C19-substrate_prob', 'CYP2D6-inhibitor_label', 'CYP2D6-inhibitor_prob', 'CYP2D6-substrate_label', 'CYP2D6-substrate_prob', 'CYP3A4-inhibitor_label', 'CYP3A4-inhibitor_prob', 'CYP3A4-substrate_label', 'CYP3A4-substrate_prob', 'Pgp-inhibitor_label', 'Pgp-inhibitor_prob', 'Pgp-substrate_label', 'Pgp-substrate_prob', 'F-30_label', 'F-30_prob', 'HIA_label', 'HIA_prob', 'F-20_label', 'F-20_prob', 'SkinSen_label', 'SkinSen_prob', 'AMES_label', 'AMES_prob', 'PPB_pred', 'VD_pred', 'CL_pred', 'T-half_pred', 'hERG_label', 'hERG_prob', 'HHT_label', 'HHT_prob', 'LD50_pred', 'Papp_pred', 'logD_pred', 'logS_pred']}
13:06:46 | DEBUG | outcome : {'type': 'mixed_array', 'shape': (49,)}
13:06:46 | DEBUG | Meta k: ['smiles', 'BBB_label', 'BBB_prob', 'CYP1A2-inhibitor_label', 'CYP1A2-inhibitor_prob', 'CYP1A2-substrate_label', 'CYP1A2-substrate_prob', 'CYP2C9-inhibitor_label', 'CYP2C9-inhibitor_prob', 'CYP2C9-substrate_label', 'CYP2C9-substrate_prob', 'CYP2C19-inhibitor_label', 'CYP2C19-inhibitor_prob', 'CYP2C19-substrate_label', 'CYP2C19-substrate_prob', 'CYP2D6-inhibitor_label', 'CYP2D6-inhibitor_prob', 'CYP2D6-substrate_label', 'CYP2D6-substrate_prob', 'CYP3A4-inhibitor_label', 'CYP3A4-inhibitor_prob', 'CYP3A4-substrate_label', 'CYP3A4-substrate_prob', 'Pgp-inhibitor_label', 'Pgp-inhibitor_prob', 'Pgp-substrate_label', 'Pgp-substrate_prob', 'F-30_label', 'F-30_prob', 'HIA_label', 'HIA_prob', 'F-20_label', 'F-20_prob', 'SkinSen_label', 'SkinSen_prob', 'AMES_label', 'AMES_prob', 'PPB_pred', 'VD_pred', 'CL_pred', 'T-half_pred', 'hERG_label', 'hERG_prob', 'HHT_label', 'HHT_prob', 'LD50_pred', 'Papp_pred', 'logD_pred', 'logS_pred']
13:06:46 | DEBUG | Schema: {'input': {'key': {'type': 'string'}, 'input': {'type': 'string'}, 'text': {'type': 'string'}}, 'output': {'outcome': {'type': 'mixed_array', 'shape': (49,), 'meta': ['smiles', 'BBB_label', 'BBB_prob', 'CYP1A2-inhibitor_label', 'CYP1A2-inhibitor_prob', 'CYP1A2-substrate_label', 'CYP1A2-substrate_prob', 'CYP2C9-inhibitor_label', 'CYP2C9-inhibitor_prob', 'CYP2C9-substrate_label', 'CYP2C9-substrate_prob', 'CYP2C19-inhibitor_label', 'CYP2C19-inhibitor_prob', 'CYP2C19-substrate_label', 'CYP2C19-substrate_prob', 'CYP2D6-inhibitor_label', 'CYP2D6-inhibitor_prob', 'CYP2D6-substrate_label', 'CYP2D6-substrate_prob', 'CYP3A4-inhibitor_label', 'CYP3A4-inhibitor_prob', 'CYP3A4-substrate_label', 'CYP3A4-substrate_prob', 'Pgp-inhibitor_label', 'Pgp-inhibitor_prob', 'Pgp-substrate_label', 'Pgp-substrate_prob', 'F-30_label', 'F-30_prob', 'HIA_label', 'HIA_prob', 'F-20_label', 'F-20_prob', 'SkinSen_label', 'SkinSen_prob', 'AMES_label', 'AMES_prob', 'PPB_pred', 'VD_pred', 'CL_pred', 'T-half_pred', 'hERG_label', 'hERG_prob', 'HHT_label', 'HHT_prob', 'LD50_pred', 'Papp_pred', 'logD_pred', 'logS_pred']}}}
13:06:46 | DEBUG | {'input': {'key': {'type': 'string'}, 'input': {'type': 'string'}, 'text': {'type': 'string'}}, 'output': {'outcome': {'type': 'mixed_array', 'shape': (49,), 'meta': ['smiles', 'BBB_label', 'BBB_prob', 'CYP1A2-inhibitor_label', 'CYP1A2-inhibitor_prob', 'CYP1A2-substrate_label', 'CYP1A2-substrate_prob', 'CYP2C9-inhibitor_label', 'CYP2C9-inhibitor_prob', 'CYP2C9-substrate_label', 'CYP2C9-substrate_prob', 'CYP2C19-inhibitor_label', 'CYP2C19-inhibitor_prob', 'CYP2C19-substrate_label', 'CYP2C19-substrate_prob', 'CYP2D6-inhibitor_label', 'CYP2D6-inhibitor_prob', 'CYP2D6-substrate_label', 'CYP2D6-substrate_prob', 'CYP3A4-inhibitor_label', 'CYP3A4-inhibitor_prob', 'CYP3A4-substrate_label', 'CYP3A4-substrate_prob', 'Pgp-inhibitor_label', 'Pgp-inhibitor_prob', 'Pgp-substrate_label', 'Pgp-substrate_prob', 'F-30_label', 'F-30_prob', 'HIA_label', 'HIA_prob', 'F-20_label', 'F-20_prob', 'SkinSen_label', 'SkinSen_prob', 'AMES_label', 'AMES_prob', 'PPB_pred', 'VD_pred', 'CL_pred', 'T-half_pred', 'hERG_label', 'hERG_prob', 'HHT_label', 'HHT_prob', 'LD50_pred', 'Papp_pred', 'logD_pred', 'logS_pred']}}}
13:06:46 | DEBUG | API schema saved at /root/eos/dest/eos2re5/api_schema.json
13:06:55 | DEBUG | Fetching eos2re5 done in time: 0:11:41.832527s
13:06:55 | INFO | Fetching eos2re5 done successfully: 0:11:41.832527
👍 Model eos2re5 fetched successfully!
There was quite a lot of docker specificity here:
source $CONDA_PREFIX_1/etc/profile.d/conda.sh
CONDA_PATH=$(dirname $(dirname $(which conda)))
PYTHON_ENV_PATH="${CONDA_PATH}/envs/eos2re5-py27/bin/python"
$PYTHON_ENV_PATH $1/code/main.py $2 $3
RUN CONDA_PATH=$(dirname $(dirname $(which conda))) && PYTHON_ENV_PATH="${CONDA_PATH}/envs/eos2re5-py27/bin/python" && $PYTHON_ENV_PATH -m pip install scikit-learn==0.17.1 && $PYTHON_ENV_PATH -m pip install scipy==1.2.3
RUN sudo apt install libfontconfig1 libxrender1 -y
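The path logic used in those snippets can be isolated into a small helper, which makes it easier to see what it relies on. This is a sketch only: `env_python_path` is an illustrative name, not a function in the model repository, and it assumes the default conda layout where the executable lives at `<root>/bin/conda` (or `<root>/condabin/conda`) and environments live under `<root>/envs/<name>`.

```shell
#!/usr/bin/env bash
# Derive the Python interpreter of a named conda environment from the
# location of the conda executable, without calling `conda activate`.
# Assumed layout: <root>/bin/conda and <root>/envs/<name>/bin/python.
env_python_path() {
    local conda_exe="$1" env_name="$2"
    local conda_root
    # Strip the file name and its parent directory to reach the conda root.
    conda_root=$(dirname "$(dirname "$conda_exe")")
    echo "${conda_root}/envs/${env_name}/bin/python"
}

# Example: env_python_path "$(which conda)" eos2re5-py27
```

Calling the interpreter by its full path sidesteps shell initialisation altogether, which is why it keeps working in build layers where `conda activate` is not available.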
Please check the current version of the model, for example from this commit: 929cfbd1f391c3272c156627ff8f424d2b0cac85
Actions are still running. Fingers crossed.
OK, I think it all worked:
https://github.com/ersilia-os/eos2re5/actions/runs/5520434861/jobs/10067049437
Yes @miquelduranfrigola , i think it will work, i hope so :)
Hello @GemmaTuron @miquelduranfrigola @DhanshreeA The model is uploaded to DockerHub at last after @miquelduranfrigola did a lot of fantastic work, the details are here https://github.com/ersilia-os/eos2re5/issues/1#issuecomment-1630813026. I think after testing this model we can close the issue. Thanks.
@GemmaTuron I'm moving this issue back to In Progress (even though it mostly works), there's a slight issue with how the output is serialized (ref comment here: https://github.com/ersilia-os/eos2re5/issues/4#issuecomment-1632869397), and @ZakiaYahya is quickly looking into this.
@ZakiaYahya let's aim to close it within this week?
Hi @DhanshreeA @GemmaTuron Yes, I'm already working on it. I've explained the issue in detail here. Please have a look. Thanks
Thanks @ZakiaYahya and @DhanshreeA for the discussion
Hello @GemmaTuron @DhanshreeA I've opened the PR again. Thanks
Hi @GemmaTuron - feel free to close this issue.
Please check that the model is working and refactor it to the latest eos-template structure. The workflows have already been updated; you can start by checking whether the Actions have run successfully or changes need to be made.