Closed GemmaTuron closed 1 year ago
Hi @HellenNamulinda
I see the docker building has failed, can you please check?
Hello @GemmaTuron, the Docker build for arm64 was failing because of PyTorch. From line 8963 in the log:
#9 5242.2 File "/root/eos/repository/eos9yui/20230613074434_A4AA14/eos9yui/artifacts/framework/neural_npfp/neural_npfp/utils.py", line 3, in <module>
#9 5242.2 import torch
#9 5242.2 ModuleNotFoundError: No module named 'torch'
This model was using PyTorch 1.7:
RUN conda install -c pytorch pytorch=1.7.0
The previous model I updated (eos7a45) had PyTorch 1.8, CPU only, and its Docker builds were successful.
I updated torch for this model to PyTorch 1.8.
I hope that with this version the Docker build for arm64 will succeed.
I tested the new pytorch version and it works for this model.
I created a pull request here. I will monitor the workflow after merging.
Hi @GemmaTuron, the build has succeeded for amd64, as seen at 6455:
#9 383.4 👍 Model eos9yui fetched successfully!
#9 DONE 384.9s
However, the arm64 build has failed again. @miquelduranfrigola, the arm64 build for this model isn't failing only because of torch: most of the other packages are also reported as not available. I will look into these so that I can add commands that work for arm64. For example, at 6560:
#8 5971.5 PackagesNotFoundError: The following packages are not available from current channels:
#8 5971.5
#8 5971.5 - scipy=1.5.2
And this again, at 8584:
88%|████████▊ | 7/8 [1:41:02<13:36, 816.85s/it]17:36:44 | DEBUG | Initializing model for inferring its structure
#8 6079.1 17:36:44 | WARNING | Lake manager 'isaura' is not installed! We strongly recommend installing it to store calculations persistently
#8 6079.1 17:36:44 | ERROR | Isaura is not installed! Calculations will be done without storing and reading from the lake, unfortunately.
Finally, one thing I have failed to understand is why the previous versions are being installed. Where is this job getting the code from? For example, it installs PyTorch 1.7 instead of 1.8, at 6484:
#8 506.4 16:03:51 | DEBUG | Run commandlines on eos9yui
#8 506.4 16:03:51 | DEBUG | conda install -c rdkit rdkit=2020.09 -y
#8 506.4 conda install -c pytorch pytorch=1.7.0 -y
#8 506.4 conda install scipy=1.5.2 -y
#8 506.4 conda install seaborn=0.11.0=py_0 -y
#8 506.4 python -m pip --disable-pip-version-check install tqdm
#8 506.4 python -m pip --disable-pip-version-check install pyyml
#8 506.4 python -m pip --disable-pip-version-check install scikit-learn==0.23.2
#8 506.4 python -m pip --disable-pip-version-check install git+https://github.com/ersilia-os/bentoml-ersilia.git
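The `PackagesNotFoundError` excerpt earlier in this comment shows a single package, but a full log may list several. A minimal sketch for collecting every missing package from a pasted build log could look like the following. The helper name `missing_packages` is hypothetical, and the sample text is stitched together from conda output quoted in this thread; it assumes each log line starts with a `#<step> <seconds>` prefix like the ones shown here.

```python
import re

# Matches the build-step prefix seen in this thread, e.g. "#8 5971.5 ".
PREFIX = re.compile(r"^#\d+\s+[\d.]+\s?")

def missing_packages(log_text):
    """Return the package specs listed under each PackagesNotFoundError."""
    missing = []
    collecting = False
    for raw in log_text.splitlines():
        line = PREFIX.sub("", raw).strip()
        if "PackagesNotFoundError" in line:
            collecting = True
            continue
        if not collecting or not line:
            continue  # skip unrelated lines and blank separator lines
        if line.startswith("- "):
            missing.append(line[2:].strip())
        else:
            # Any other text (e.g. "Current channels:") ends the package list.
            collecting = False
    return missing

sample = """#8 5971.5 PackagesNotFoundError: The following packages are not available from current channels:
#8 5971.5
#8 5971.5 - scipy=1.5.2
#8 5971.5
#8 5971.5 Current channels:
#8 5971.5
#8 5971.5 - https://conda.anaconda.org/pytorch/linux-aarch64
"""

print(missing_packages(sample))  # ['scipy=1.5.2']
```

The channel URLs are skipped because collection stops at the first non-bullet line after the package list.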
Hi @HellenNamulinda !
That is interesting because I think I saw this on model eos74bo
from @emmakodes (Emma, please check and let us know if that is the case).
Is the AMD64 build using these old versions as well?
Hello @GemmaTuron
Yes, even the amd64 build is using the previous versions, at 256:
#9 11.31 15:55:39 | DEBUG | Run commands: ['conda install -c rdkit rdkit=2020.09 -y', 'conda install -c pytorch pytorch=1.7.0 -y', 'conda install scipy=1.5.2 -y', 'conda install seaborn=0.11.0=py_0 -y', 'pip install tqdm', 'pip install pyyml', 'pip install scikit-learn==0.23.2']
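To check quickly whether a build picked up the updated install commands, the `Run commands: [...]` debug line can be parsed and searched for version pins. This is only a sketch: `parse_run_commands` and `find_pins` are hypothetical helper names, and it assumes the list in the log is printed as a Python list literal, as in the line above (shortened here).

```python
import ast
import re

def parse_run_commands(log_line):
    """Return the command list from a 'Run commands: [...]' debug line."""
    match = re.search(r"Run commands: (\[.*\])", log_line)
    if match is None:
        return []
    # The list is printed as a Python literal, so literal_eval can read it.
    return ast.literal_eval(match.group(1))

def find_pins(commands, package):
    """Return the version pins found for `package` across all commands."""
    pattern = re.compile(r"\b" + re.escape(package) + r"={1,2}([\w.]+)")
    pins = []
    for command in commands:
        pins.extend(pattern.findall(command))
    return pins

line = ("#9 11.31 15:55:39 | DEBUG | Run commands: "
        "['conda install -c rdkit rdkit=2020.09 -y', "
        "'conda install -c pytorch pytorch=1.7.0 -y', "
        "'conda install scipy=1.5.2 -y']")

commands = parse_run_commands(line)
pins = find_pins(commands, "pytorch")
print(pins)  # ['1.7.0']
```

If the pin printed here does not match the one in the merged Dockerfile, the build ran against an older commit.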
Yes @GemmaTuron, it's the same as what is happening in my model: the workflow is using the previous commit to build and upload the model to DockerHub, which is failing.
@miquelduranfrigola have you seen this? The Docker run is using the older version of the Dockerfile, maybe because it starts to run before the commit is actually merged. We should check the order of actions.
@HellenNamulinda I've manually re-run the action, where the Docker build should be using the new Dockerfile, but it still seems to fail. Can you check?
Hello @GemmaTuron, I've gone through all the dependencies that failed, and they are all conda installs. I believe the conda package repositories might have better support for the AMD64 architecture than for ARM64.
I tried to analyze the build that succeeded for eos7a45, since it also required installing PyTorch, and found that installing PyTorch with conda had failed there too:
#8 11995.3 PackagesNotFoundError: The following packages are not available from current channels:
#8 11995.3
#8 11995.3 - torchvision==0.9.0
#8 11995.3 - pytorch==1.8.0
#8 11995.3 - torchaudio==0.8.0
#8 11995.3
#8 11995.3 Current channels:
#8 11995.3
#8 11995.3 - https://conda.anaconda.org/pytorch/linux-aarch64
#8 11995.3 - https://conda.anaconda.org/pytorch/noarch
#8 11995.3 - https://repo.anaconda.com/pkgs/main/linux-aarch64
#8 11995.3 - https://repo.anaconda.com/pkgs/main/noarch
#8 11995.3 - https://repo.anaconda.com/pkgs/r/linux-aarch64
#8 11995.3 - https://repo.anaconda.com/pkgs/r/noarch
However, that build was successful because another requirement (torch_geometrics) installed torch using pip. I'm making a few changes and will push again.
Hi @GemmaTuron, I have just fetched the model on Colab, and it is now working. However, the arm64 build has still failed. I hope you will try to re-run it with the new commit, since it used the previous one.
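One way to act on this observation is to choose the installer per architecture: keep conda on amd64 and fall back to pip on arm64. The sketch below is an illustration only. The function name `torch_install_command` is hypothetical, and it assumes that pip can provide a torch 1.8.0 build on aarch64, which this thread suggests (torch arrived through pip in the eos7a45 build) but which has not been verified here.

```python
import platform

def torch_install_command(machine=None, version="1.8.0"):
    """Pick an install command for torch based on the CPU architecture."""
    machine = (machine or platform.machine()).lower()
    if machine in ("aarch64", "arm64"):
        # Assumption: pip has a matching torch build for ARM64.
        return f"python -m pip install torch=={version}"
    # Same form as the conda command used in this model's Dockerfile.
    return f"conda install -c pytorch pytorch={version} -y"

print(torch_install_command("aarch64"))  # python -m pip install torch==1.8.0
print(torch_install_command("x86_64"))   # conda install -c pytorch pytorch=1.8.0 -y
```

Calling the function without arguments uses `platform.machine()`, so the same script can run unchanged in both builds.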
Hello @HellenNamulinda and @GemmaTuron.
Thanks for putting so much effort into this. I think the current workflows are now correct - please look into them carefully to appreciate the changes and updates.
I am now re-running the workflows for this model, let's see if it works. If it does, feel free to close the issue.
@GemmaTuron - This model seems to be resolved. Please check
Yes, @miquelduranfrigola and @GemmaTuron, the build was successful for both amd64 and arm64, as seen in the metadata at 22205. For amd64, at 19812:
#9 205.8 👍 Model eos9yui fetched successfully!
#9 DONE 206.3s
and for arm64 at 22174
#8 8066.3 👍 Model eos9yui fetched successfully!
#8 DONE 8067.5s
Hello @GemmaTuron, I updated this model. I tested it locally and it works well. This is the log file for fetching the model: eos9yui_fetch.log
Prediction on a single molecule: eos9yui-predict-one.log. Predictions on the eml dataset: eml_eos9yui.csv, with its log eos9yui-predict-file.log.
I pushed the changes and created a pull request here.