Clean UP & Dockerization eos43at

HellenNamulinda commented 1 year ago

Hello @GemmaTuron, this model returns the pIC50 value, a measure of the cardiotoxicity of small molecules (IC50 in hERG blockade). The pIC50 predictions made by the MPNN model (MPNNPredictor) in this code are primarily determined by the learned parameters of the model and the input features, including global features computed using RDKit functions.

While the model was trained using rdkit 2019, the RDKit version and the number of descriptors it provides do not impact the predictions, since the descriptors are not used explicitly in the model. The RDKit functions used in this code are mainly for molecular data handling, preprocessing, and calculation of global features (net_utils.py) such as molecular weight (MolWt), topological polar surface area (CalcTPSA), the logarithm of the partition coefficient, LogP (MolLogP), and the number of hydrogen bond donors (NumHDonors). These values become part of the input features of the MPNN model, which are represented as DGL graphs.

With a test file (test.csv), the predictions are the same for rdkit 2019.09.3 (Out22.csv, also on Colab) and rdkit 2022.9.5 (out19.csv).
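A comparison like this can be scripted instead of checked by eye. The sketch below is a minimal, hypothetical helper: the function name, the tolerance, and the assumption that each output file has a header row with a numeric prediction column are illustrative and not taken from the repository.

```python
import csv
import math


def predictions_match(path_a, path_b, column, abs_tol=1e-6):
    """Return True if `column` holds the same values, row by row, in both CSVs.

    Assumes both files have a header row that contains `column`.
    """
    def read_column(path):
        with open(path, newline="") as handle:
            return [float(row[column]) for row in csv.DictReader(handle)]

    values_a = read_column(path_a)
    values_b = read_column(path_b)
    # A different number of rows means the runs are not comparable.
    if len(values_a) != len(values_b):
        return False
    return all(
        math.isclose(a, b, abs_tol=abs_tol) for a, b in zip(values_a, values_b)
    )
```

It would be called with the two output files and whatever the prediction column is actually named in them, for example `predictions_match("out19.csv", "Out22.csv", column="pic50")`, where the column name is a placeholder.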

Let me tag @pittmanriley and @febielin so we can be consistent with the MolGrad models.

Since rdkit 2019 is not installable using pip, I updated it to rdkit 2022. This, plus the other changes, works using run.sh and within ersilia (43at_cli_output2.csv).

Other changes made

Updated torch from 1.4 to 1.9 in order to support arm64
Removed .dvc folder and files
Added the main license of the repository, as it didn't have one.
Defined checkpoints_dir in predict.py, and removed it from arguments
Updated the API from predict to run, added meta, and added code to remove the temp_dir in service.py
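For readers unfamiliar with these two changes, here is a rough sketch of the pattern they describe: resolving the checkpoints folder in code instead of taking it as an argument, and always cleaning up the temporary working directory. The function names and the `checkpoints` folder name are assumptions for illustration, not the actual contents of predict.py or service.py.

```python
import os
import shutil
import tempfile


def checkpoints_dir(framework_dir):
    """Resolve the checkpoints folder from a fixed location.

    The folder name "checkpoints" is an assumption made for this sketch.
    """
    return os.path.join(os.path.abspath(framework_dir), "checkpoints")


def run_in_temp_dir(work):
    """Call `work(tmp_dir)` and remove the temporary directory afterwards."""
    tmp_dir = tempfile.mkdtemp(prefix="eos-")
    try:
        return work(tmp_dir)
    finally:
        # Runs on success and on failure, so no temp folders are left behind.
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

The `try`/`finally` is the important part: the temporary directory is removed even when the model call raises.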

These changes are reflected in the PR created. However, the "Model Test on PR" check failed with "Model API eos43at:run did not produce an output" and "DGL does not detect a valid backend option. Which backend would you like to work with?", even though all packages install successfully (Successfully installed torch-1.9.0).

Detailed error:
🚨🚨🚨 Something went wrong with Ersilia 🚨🚨🚨
Model API eos43at:run did not produce an output
DGL does not detect a valid backend option. Which backend would you like to work with?

Backend choice (pytorch, mxnet or tensorflow): Traceback (most recent call last):
Error message:
  File "/home/runner/eos/repository/eos43at/20230720024131_7E6A30/eos43at/artifacts/framework/predict.py", line 14, in <module>

    from molgrad.net import MPNNPredictor
  File "/home/runner/eos/repository/eos43at/20230720024131_7E6A30/eos43at/artifacts/framework/molgrad/net.py", line 4, in <module>
    from dgl.nn.pytorch import Set2Set
  File "/usr/share/miniconda/envs/eos43at/lib/python3.7/site-packages/dgl/__init__.py", line 8, in <module>
    from .backend import load_backend, backend_name
  File "/usr/share/miniconda/envs/eos43at/lib/python3.7/site-packages/dgl/backend/__init__.py", line 74, in <module>
    load_backend(get_preferred_backend())
  File "/usr/share/miniconda/envs/eos43at/lib/python3.7/site-packages/dgl/backend/__init__.py", line 69, in get_preferred_backend
    backend_name = input("Backend choice (pytorch, mxnet or tensorflow): ").lower()
EOFError: EOF when reading a line
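
The last line of the traceback is what Python raises when `input()` is called and there is nothing to read on standard input. Whether the CI runner's stdin was closed or just empty is an assumption, but the effect can be reproduced with the standard library alone:

```python
import subprocess
import sys


def prompt_without_stdin():
    """Run `input()` in a child process that has no usable stdin."""
    return subprocess.run(
        [sys.executable, "-c", "input('Backend choice: ')"],
        stdin=subprocess.DEVNULL,  # mimics a non-interactive job
        capture_output=True,
        text=True,
    )


result = prompt_without_stdin()
print(result.returncode)               # non-zero: the child process crashed
print(result.stderr.splitlines()[-1])  # final line of the child's traceback
```

Any library that falls back to an interactive prompt at import time will fail this way in an automated job.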

What I observed is that packages were installed twice; for instance, rdkit appears both at "Collecting rdkit==2022.9.5" and at "Collecting rdkit-pypi". It looks like some issue in ersilia is causing this.

GemmaTuron commented 1 year ago

Hi @HellenNamulinda, this was on the Airtable side, sorry, nothing we could do about it. I think the service is restored. I'll rerun the workflow.

GemmaTuron commented 1 year ago

@HellenNamulinda I have updated the workflows but the model is failing at fetch time, could you please check? Thanks!

HellenNamulinda commented 1 year ago

Hi @GemmaTuron,

The error I was getting ("Model API eos43at:run did not produce an output" followed by "DGL does not detect a valid backend option. Which backend would you like to work with?") was caused by the wrong version of the Deep Graph Library (dgl): I had put RUN pip install dgl==0.4.3 instead of RUN pip install dgl==0.4.3.post2. While this worked locally, it failed in Actions (not sure why, but it could be because of root permissions).

The release notes for 0.4.3.post2 mention "Rolling back interactive backend selection because automation was crashing", which reverts to the previous behavior of assuming PyTorch when the backend is not given.
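
For DGL releases that do prompt, another way to avoid the interactive question is to state the backend explicitly before `dgl` is imported. DGL reads the `DGLBACKEND` environment variable. The helper name below is hypothetical, and this is only a sketch of that alternative, not what the PR does (the PR pins dgl==0.4.3.post2).

```python
import os


def ensure_dgl_backend(default="pytorch"):
    """Set DGLBACKEND unless the environment already names a backend."""
    return os.environ.setdefault("DGLBACKEND", default)


backend = ensure_dgl_backend()
# Only after this point would the model code run `import dgl`, so the
# library finds a backend choice and has no reason to call input().
```

In a container the same effect could be had with an `ENV DGLBACKEND=pytorch` line in the Dockerfile; this has not been tested against this model.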

However, 0.4.3.post2 is too old to be installed on arm64 Linux (ERROR: Could not find a version that satisfies the requirement dgl==0.4.3.post2 (from versions: 1.0.1, 1.1.0, 1.1.1)). Unfortunately, choosing one of these versions doesn't work for the model code, as new errors are introduced (ImportError: cannot import name 'bipartite' from 'dgl'). So, there won't be a Docker image for arm64.

A PR has been made for this update.

GemmaTuron commented 1 year ago

Thanks for the update Hellen, good job on the versioning issues. We'll go ahead without the ARM64 version then
