Clean UP & Dockerization of eos9yui

HellenNamulinda commented 1 year ago

Hello @GemmaTuron, I updated this model and tested it locally; it works well. The log file for fetching the model is eos9yui_fetch.log.

The log for the prediction on a single molecule is eos9yui-predict-one.log. The predictions on the eml dataset are in eml_eos9yui.csv, with the log in eos9yui-predict-file.log.

I pushed the changes and created a pull request here

GemmaTuron commented 1 year ago

Hi @HellenNamulinda

I see the Docker build has failed. Can you please check?

HellenNamulinda commented 1 year ago

Hello @GemmaTuron. The Docker build for arm64 was failing because of PyTorch. From line 8963 of the log:

#9 5242.2   File "/root/eos/repository/eos9yui/20230613074434_A4AA14/eos9yui/artifacts/framework/neural_npfp/neural_npfp/utils.py", line 3, in <module>
#9 5242.2     import torch
#9 5242.2 ModuleNotFoundError: No module named 'torch'

This model was using PyTorch 1.7 (RUN conda install -c pytorch pytorch=1.7.0). The previous model I updated (eos7a45) had CPU-only PyTorch 1.8, and its Docker builds were successful, so I updated this model to PyTorch 1.8 as well. I hope the arm64 Docker build will succeed with this version. I tested the new PyTorch version and it works for this model.

I created a pull request here. I will monitor the workflow after merging.

HellenNamulinda commented 1 year ago

Hi @GemmaTuron, the build has succeeded for amd64, as seen at 6455:

#9 383.4 👍 Model eos9yui fetched successfully!
#9 DONE 384.9s

However, the arm64 build has failed again. @miquelduranfrigola, the arm64 build for this model isn't failing only because of torch; most packages are also reported as not available. I will look into these so that I can use install commands that work on arm64. For example, at 6560:

#8 5971.5 PackagesNotFoundError: The following packages are not available from current channels:
#8 5971.5 
#8 5971.5   - scipy=1.5.2
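To list every package conda could not resolve without scrolling through the whole build log, something like the following could help. It is a minimal sketch that assumes the log lines carry the "#<step> <seconds>" prefix seen in the excerpts in this thread; the function name is made up for illustration.

```python
import re

# Build-log prefix as seen in this thread, e.g. "#8 5971.5 " (assumed format).
_PREFIX = re.compile(r"^#\d+\s+[\d.]+\s?")


def missing_conda_packages(log_text):
    """Return the package specs listed under each PackagesNotFoundError."""
    missing = []
    in_block = False
    for raw in log_text.splitlines():
        line = _PREFIX.sub("", raw).strip()
        if "PackagesNotFoundError" in line:
            in_block = True
            continue
        if in_block:
            if line.startswith("- "):
                missing.append(line[2:].strip())
            elif line:
                # The first non-empty, non-bullet line (for example
                # "Current channels:") ends the package list.
                in_block = False
    return missing


sample = "\n".join([
    "#8 5971.5 PackagesNotFoundError: The following packages are not "
    "available from current channels:",
    "#8 5971.5 ",
    "#8 5971.5   - scipy=1.5.2",
])
print(missing_conda_packages(sample))
```

Because the scan stops at the first non-bullet line, the channel URLs that conda prints after the package list are not picked up.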

And this again, at 8584:

 88%|████████▊ | 7/8 [1:41:02<13:36, 816.85s/it]17:36:44 | DEBUG    | Initializing model for inferring its structure
#8 6079.1 17:36:44 | WARNING  | Lake manager 'isaura' is not installed! We strongly recommend installing it to store calculations persistently
#8 6079.1 17:36:44 | ERROR    | Isaura is not installed! Calculations will be done without storing and reading from the lake, unfortunately.

Finally, one thing I don't understand is why the previous versions are being installed. Where is this job getting the code from? For example, it installs pytorch 1.7 instead of 1.8, at 6484:

#8 506.4 16:03:51 | DEBUG    | Run commandlines on eos9yui
#8 506.4 16:03:51 | DEBUG    | conda install -c rdkit rdkit=2020.09 -y
#8 506.4 conda install -c pytorch pytorch=1.7.0 -y
#8 506.4 conda install scipy=1.5.2 -y
#8 506.4 conda install seaborn=0.11.0=py_0 -y
#8 506.4 python -m pip --disable-pip-version-check install tqdm
#8 506.4 python -m pip --disable-pip-version-check install pyyml
#8 506.4 python -m pip --disable-pip-version-check install scikit-learn==0.23.2
#8 506.4 python -m pip --disable-pip-version-check install git+https://github.com/ersilia-os/bentoml-ersilia.git

GemmaTuron commented 1 year ago

Hi @HellenNamulinda !

That is interesting, because I think I saw this on model eos74bo from @emmakodes (Emma, please check and let us know if that is the case). Is the amd64 build using these old versions as well?

HellenNamulinda commented 1 year ago

Hello @GemmaTuron. Yes, even the amd64 build is using the previous versions, at 256:

#9 11.31 15:55:39 | DEBUG | Run commands: ['conda install -c rdkit rdkit=2020.09 -y', 'conda install -c pytorch pytorch=1.7.0 -y', 'conda install scipy=1.5.2 -y', 'conda install seaborn=0.11.0=py_0 -y', 'pip install tqdm', 'pip install pyyml', 'pip install scikit-learn==0.23.2']
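Since the "Run commands:" entry is printed as a Python-style list, the pinned versions can be read straight out of the log line. The sketch below assumes the list appears complete on a single line, as in the excerpt above; both helper names are hypothetical.

```python
import ast


def logged_run_commands(log_line):
    """Parse the Python-style list that follows 'Run commands:' in a log line."""
    _, found, tail = log_line.partition("Run commands:")
    if not found:
        return []
    # literal_eval only accepts literals; it raises if the list is truncated.
    return ast.literal_eval(tail.strip())


def pinned_version(commands, package):
    """Return the version pinned for `package`, or None if it is not pinned.

    Anything after the first '=' is returned, so a conda build string such as
    '0.11.0=py_0' is kept as part of the result.
    """
    for command in commands:
        for token in command.split():
            name, sep, version = token.partition("=")
            if sep and name == package:
                return version.lstrip("=")
    return None


line = (
    "#9 11.31 15:55:39 | DEBUG | Run commands: "
    "['conda install -c rdkit rdkit=2020.09 -y', "
    "'conda install -c pytorch pytorch=1.7.0 -y', "
    "'pip install scikit-learn==0.23.2']"
)
commands = logged_run_commands(line)
print(pinned_version(commands, "pytorch"))
```

Comparing the returned version with the one in the Dockerfile on the main branch is a quick way to confirm that a build picked up an old commit.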

emmakodes commented 1 year ago

Yes @GemmaTuron, it's the same as what is happening with my model: the workflow is using the previous commit to build and upload the model to DockerHub, and that is failing.

GemmaTuron commented 1 year ago

@miquelduranfrigola have you seen this? The Docker run is using the older version of the Dockerfile, maybe because it starts to run before the commit is actually merged. We should check the order of the actions.
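One way to confirm that a build ran against an outdated Dockerfile is to compare the commands recorded in the log with the RUN lines of the current file. This is only a sketch: the Dockerfile text below is a hypothetical stand-in (the updated file is not shown in this thread), it handles single-line RUN instructions only, and it ignores the "-y" flag that appears in the logged commands.

```python
def _normalize(command):
    """Drop the '-y' flag and collapse whitespace so the two forms compare equal."""
    return " ".join(tok for tok in command.split() if tok != "-y")


def dockerfile_run_commands(dockerfile_text):
    """Return the command part of each single-line RUN instruction."""
    commands = []
    for line in dockerfile_text.splitlines():
        stripped = line.strip()
        if stripped.upper().startswith("RUN "):
            commands.append(stripped[4:].strip())
    return commands


def stale_commands(dockerfile_text, logged_commands):
    """Return logged commands that do not appear in the given Dockerfile."""
    current = {_normalize(c) for c in dockerfile_run_commands(dockerfile_text)}
    return [c for c in logged_commands if _normalize(c) not in current]


# Hypothetical Dockerfile content, for illustration only.
dockerfile = "\n".join([
    "RUN conda install -c pytorch pytorch=1.8.0",
    "RUN pip install tqdm",
])
logged = ["conda install -c pytorch pytorch=1.7.0 -y", "pip install tqdm"]
print(stale_commands(dockerfile, logged))
```

A non-empty result means the build executed at least one command that the current Dockerfile no longer contains, which points to the workflow checking out an earlier commit.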

GemmaTuron commented 1 year ago

@HellenNamulinda I've manually re-run the action, so the Docker build should be using the new Dockerfile, but it still seems to fail. Can you check?

HellenNamulinda commented 1 year ago

Hello @GemmaTuron, I've gone through all the dependencies that failed, and they are all conda installs. I believe the conda package repository might have better support for the amd64 architecture than for arm64.

I tried to analyze the build that succeeded for eos7a45, since it also required installing pytorch, and I realized that installing pytorch using conda had failed:

#8 11995.3 PackagesNotFoundError: The following packages are not available from current channels:
#8 11995.3 
#8 11995.3   - torchvision==0.9.0
#8 11995.3   - pytorch==1.8.0
#8 11995.3   - torchaudio==0.8.0
#8 11995.3 
#8 11995.3 Current channels:
#8 11995.3 
#8 11995.3   - https://conda.anaconda.org/pytorch/linux-aarch64
#8 11995.3   - https://conda.anaconda.org/pytorch/noarch
#8 11995.3   - https://repo.anaconda.com/pkgs/main/linux-aarch64
#8 11995.3   - https://repo.anaconda.com/pkgs/main/noarch
#8 11995.3   - https://repo.anaconda.com/pkgs/r/linux-aarch64
#8 11995.3   - https://repo.anaconda.com/pkgs/r/noarch

However, the build was successful because another requirement (torch_geometrics) installed torch using pip. I'm making a few changes and will push again.
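The observation above suggests choosing the installer by architecture: conda where the channels carry the package, pip on arm64. The helper below is a hypothetical sketch of that idea, not the command set used in the final Dockerfile; whether a pip wheel exists for a given version on arm64 is an assumption that needs checking per package.

```python
import platform

# Values platform.machine() reports on 64-bit ARM (Linux and macOS respectively).
ARM_MACHINES = {"aarch64", "arm64"}


def install_command(conda_spec, pip_spec, channel=None, machine=None):
    """Build an install command as an argument list.

    Uses pip on 64-bit ARM and conda elsewhere. `machine` defaults to the
    current host and can be overridden for testing.
    """
    machine = (machine or platform.machine()).lower()
    if machine in ARM_MACHINES:
        return ["python", "-m", "pip", "install", pip_spec]
    command = ["conda", "install", "-y"]
    if channel:
        command += ["-c", channel]
    command.append(conda_spec)
    return command


print(install_command("pytorch=1.8.0", "torch==1.8.0",
                      channel="pytorch", machine="aarch64"))
print(install_command("pytorch=1.8.0", "torch==1.8.0",
                      channel="pytorch", machine="x86_64"))
```

The returned list can be passed to subprocess.run, or joined into a RUN line when generating a Dockerfile.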

HellenNamulinda commented 1 year ago

Hi @GemmaTuron, I have just fetched the model on Colab, and it is now working. However, the arm64 build has still failed. It used the previous commit, so I hope you can re-run it with the new commit.

miquelduranfrigola commented 1 year ago

Hello @HellenNamulinda and @GemmaTuron.

Thanks for putting so much effort into this. I think the current workflows are now correct - please look into them carefully to appreciate the changes and updates.

I am now re-running the workflows for this model, let's see if it works. If it does, feel free to close the issue.

https://github.com/ersilia-os/eos9yui/actions

miquelduranfrigola commented 1 year ago

@GemmaTuron - This model seems to be resolved. Please check

HellenNamulinda commented 1 year ago

Yes, @miquelduranfrigola and @GemmaTuron, the build was successful for both amd64 and arm64, as seen in the metadata at 22205. For amd64, at 19812:

#9 205.8 👍 Model eos9yui fetched successfully!
#9 DONE 206.3s

and for arm64 at 22174

#8 8066.3 👍 Model eos9yui fetched successfully!
#8 DONE 8067.5s
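The same success check can be done programmatically for each platform log. This sketch assumes the two line shapes shown above (the "fetched successfully!" message followed by a "DONE <seconds>s" line); the function name is made up for illustration.

```python
import re

_SUCCESS = re.compile(r"Model (\S+) fetched successfully!")
_DONE = re.compile(r"^#\d+ DONE ([\d.]+)s$")


def fetch_summary(log_text):
    """Return (model_id, seconds) if the log shows a successful fetch, else None."""
    model = None
    for line in log_text.splitlines():
        hit = _SUCCESS.search(line)
        if hit:
            model = hit.group(1)
            continue
        done = _DONE.match(line.strip())
        if model and done:
            # First DONE line after the success message gives the step duration.
            return model, float(done.group(1))
    return None


arm64_log = "\n".join([
    "#8 8066.3 Model eos9yui fetched successfully!",
    "#8 DONE 8067.5s",
])
print(fetch_summary(arm64_log))
```

Running it on both logs also makes the difference in build time between the two platforms easy to compare.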
