Closed paulinebanye closed 1 year ago
/approve
@pauline-banye The Ersilia model repository has been successfully created and is available at:
Now that your new model repository has been created, you are ready to start contributing to it!
Here are some brief starter steps for contributing to your new model repository:
Note: Many of the bullet points below will have extra links if this is your first time contributing to a GitHub repository
README.md file to accurately describe your model
If you have any questions, please feel free to open an issue and get support from the community!
Hi @pauline-banye can you update me? We are moving the solubility model to this new repo finally to bypass the git-lfs installs? Let me know when the old repo is ready for deleting to avoid duplications!
Hi @GemmaTuron Yes, you can delete the old repo. I have all the code and have forked and transferred it to the newly cloned repo.
However, I seem to be having an issue with FPSim2, which I believe is due to my conda environment. When I encountered this issue, I tested the model in a virtual environment on a different terminal (Git Bash) and it worked flawlessly.
I am currently trying to resolve the issues with the FPSim2 dependency in my conda environment.
@pauline-banye following our discussion:
My apologies @GemmaTuron , it is supposed to be high solubility and low solubility. I was remiss in editing that string in the repo but I have corrected it.
Thank you @GemmaTuron. I deleted the previous repo and all conda environments. The new fork and environment have been upgraded to Python 3.8.
Perfect, thanks @pauline-banye ! Let me know how it goes with the changes!
Hi @GemmaTuron I'm still getting issues with the checks. After some time spent debugging, I believe it could be an issue with the Python path: although the dependencies are installed, it still returns a "module not found" error.
Hi @pauline-banye Let's try to debug this today. I was able to run it successfully on my system. Can you get the following ready:
- List of dependencies you are installing - either manually or through .yml
- List of dependencies you see in your conda env (with versions)
- Python path of the conda environment
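The three items above can be gathered from inside the activated environment with a short script. This is a minimal sketch, not part of Ersilia: the helper name `describe_environment` is made up for illustration, and `FPSim2` is only used because it is the dependency mentioned earlier in this thread.

```python
import importlib.util
import sys


def describe_environment(module_names):
    """Collect interpreter details and where each module would be imported from."""
    report = {
        "executable": sys.executable,  # interpreter actually running this script
        "prefix": sys.prefix,          # root of the active environment
        "sys_path": list(sys.path),    # directories searched on import
        "modules": {},
    }
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None
        # None means the interpreter cannot find the module on its search path.
        report["modules"][name] = spec.origin if spec is not None else None
    return report


if __name__ == "__main__":
    info = describe_environment(["FPSim2"])
    print("Interpreter:", info["executable"])
    print("Environment:", info["prefix"])
    for entry in info["sys_path"]:
        print("  path entry:", entry)
    for name, origin in info["modules"].items():
        print(name, "->", origin if origin else "NOT FOUND")
```

If the interpreter printed here is not the one inside the conda environment, a "module not found" error can appear even though the package is installed. Package versions can be listed separately with `conda list` or `pip list`.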
Hi @GemmaTuron, Thank you so much! I would really appreciate it 🙏. I have pushed the current dependencies to the forked repository, but I still need to export the dependencies from the current environment. I will send an update once I have updated the repo.
Hi @GemmaTuron,
As requested, I returned the exact values from the solubility model. These tests were carried out using two different lists of SMILES.
[x] Full output without the round command eml_sol_full.csv input_sol_full.csv
[x] I repeated the model test within the Ersilia CLI using the repo_path command. eos74bo_list_run.csv eos74bo_run.csv
[x] I also compared the output received from NCATS with the output returned by the eos74bo solubility model, rounded to two decimal places.
Output from the original NCATS code: eml_sol_ADME_Predictions_2023-02-14-115722.csv input.sol_ADME_Predictions_2023-02-14-121509.csv
Output from eos74bo (with the results rounded to two decimal places)
output_df from input.csv
Solubility: 0.08900904655456543 seconds to predict 11 molecules
smiles Predicted Class (Probability) Prediction
0 CC(=O)Nc1nnc(S(N)(=O)=O)s1 0 (1.0) high solubility
1 CCCOCCCCCCC 1 (1.0) low solubility
2 CCCCOCCCC 1 (0.74) low solubility
3 CC(=O)N[C@@H](CS)C(=O)O 0 (1.0) high solubility
4 CC(=O)Oc1ccccc1C(=O)O 0 (0.98) high solubility
5 CC(=O)O 1 (0.79) low solubility
6 O=c1ncnc2[nH][nH]cc1-2 1 (0.95) low solubility
7 CCCCNCCCCC 0 (1.0) high solubility
8 Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1 0 (1.0) high solubility
9 CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1 1 (0.99) low solubility
10 C[C@]12CC[C@H]3[C@@H](CC=C4C[C@@H](O)CC[C@@]43... 0 (0.86) high solubility
[0.0, 1.0, 0.74, 0.0, 0.020000000000000018, 0.79, 0.95, 0.0, 0.0, 0.99, 0.14]
output_df from eml.csv
Solubility: 0.5354588031768799 seconds to predict 10 molecules
smiles Predicted Class (Probability) Prediction
0 Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1 0 (1.0) high solubility
1 CC(=O)Nc1nnc(S(N)(=O)=O)s1 0 (1.0) high solubility
2 CC(=O)O 1 (0.79) low solubility
3 CC(=O)N[C@@H](CS)C(=O)O 0 (1.0) high solubility
4 CC(=O)Oc1ccccc1C(=O)O 0 (0.98) high solubility
5 Nc1nc(=O)c2ncn(COCCO)c2[nH]1 0 (1.0) high solubility
6 O=C(O[C@H]1C[N+]2(CCCOc3ccccc3)CCC1CC2)C(O)(c1... 0 (0.99) high solubility
7 CN(C)C/C=C/C(=O)Nc1cc2c(Nc3ccc(F)c(Cl)c3)ncnc2... 0 (0.74) high solubility
8 CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1 1 (0.99) low solubility
9 O=c1ncnc2[nH][nH]cc1-2 1 (0.95) low solubility
[0.0, 0.0, 0.79, 0.0, 0.020000000000000018, 0.0, 0.010000000000000009, 0.26, 0.99, 0.95]
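The relationship between the "Predicted Class (Probability)" column and the list printed under each table can be sketched as follows. This is inferred from the printed output only and is not the model's own code: the helper `proba1_from_label` is hypothetical. It assumes the value in parentheses is the probability of the predicted class, so the probability of class 1 is that value for class 1 and its complement for class 0.

```python
import re

# Matches labels such as "0 (0.98)" or "1 (0.79)".
LABEL_PATTERN = re.compile(r"^\s*([01])\s*\(([0-9.eE+-]+)\)\s*$")


def proba1_from_label(label):
    """Return the probability of class 1 from a 'class (probability)' label."""
    match = LABEL_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unexpected label format: {label!r}")
    predicted_class = int(match.group(1))
    probability = float(match.group(2))
    # For class 0 the label holds P(class 0), so P(class 1) is its complement.
    return probability if predicted_class == 1 else 1 - probability


# Labels copied from the eml.csv table above.
labels = [
    "0 (1.0)", "0 (1.0)", "1 (0.79)", "0 (1.0)", "0 (0.98)",
    "0 (1.0)", "0 (0.99)", "0 (0.74)", "1 (0.99)", "1 (0.95)",
]
values = [proba1_from_label(label) for label in labels]
print(values)
```

Values such as 0.020000000000000018 are floating-point artifacts: the probability was rounded to 0.98 first and then subtracted from 1. Rounding after the subtraction, or not rounding at all, avoids them.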
Hi @pauline-banye!
Thanks for this. I am a bit confused because each file has a different name, so I don't know which result corresponds to what. I want to make sure that we are always returning the probability of class 1.
Can you let me know what you get when predicting the molecules CC(=O)Oc1ccccc1C(=O)O and CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1 with the Ersilia repo and with the original code?
Thanks!
Looking a bit more into the results, I now understand that we are giving as output the last line I see in the printed results, for example [0.0, 0.0, 0.79, 0.0, 0.020000000000000018, 0.0, 0.010000000000000009, 0.26, 0.99, 0.95] in the last one, right?
So that should be fine.
Two comments just to close this: can we name the column "proba1" instead of "value" in the Ersilia Model Hub output? And could the output have a SMILES column plus a proba1 column, for easier interpretation of the results?
Hi @GemmaTuron, I noticed that the results in the printed output differed from what was returned as the probability. I felt this was misleading, so I decided to resolve it and return the exact figures in the output. To do this, I had to extract the data, and these were the steps I performed:
Solubility: 0.11705160140991211 seconds to predict 10 molecules
smiles Predicted Class (Probability) Prediction proba1
0 Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1 0 (0.9992302207974717) high solubility 0.000770
1 CC(=O)Nc1nnc(S(N)(=O)=O)s1 0 (0.9997545259248) high solubility 0.000245
2 CC(=O)O 1 (0.7889353036880493) low solubility 0.788935
3 CC(=O)N[C@@H](CS)C(=O)O 0 (0.9989036553306505) high solubility 0.001096
4 CC(=O)Oc1ccccc1C(=O)O 0 (0.9783817026764154) high solubility 0.021618
5 Nc1nc(=O)c2ncn(COCCO)c2[nH]1 0 (0.9998792609330849) high solubility 0.000121
6 O=C(O[C@H]1C[N+]2(CCCOc3ccccc3)CCC1CC2)C(O)(c1... 0 (0.9916039435192943) high solubility 0.008396
7 CN(C)C/C=C/C(=O)Nc1cc2c(Nc3ccc(F)c(Cl)c3)ncnc2... 0 (0.7405502796173096) high solubility 0.259450
8 CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1 1 (0.9934565424919128) low solubility 0.993457
9 O=c1ncnc2[nH][nH]cc1-2 1 (0.9541448950767517) low solubility 0.954145
[eml_output.csv](https://github.com/ersilia-os/ersilia/files/10743474/eml_output.csv)
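A two-column layout with SMILES and proba1, as requested earlier in the thread, could look like the sketch below. It uses only the Python standard library and two rows copied from the table above. The function name `to_proba1_csv` is hypothetical, and this does not go through Ersilia's own output handling.

```python
import csv
import io

# (smiles, predicted class, probability of the predicted class), from the table above.
rows = [
    ("CC(=O)Oc1ccccc1C(=O)O", 0, 0.9783817026764154),
    ("CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1", 1, 0.9934565424919128),
]


def to_proba1_csv(predictions):
    """Serialize predictions as CSV text with 'smiles' and 'proba1' columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["smiles", "proba1"])
    for smiles, predicted_class, probability in predictions:
        # Convert the predicted-class probability into the probability of class 1.
        proba1 = probability if predicted_class == 1 else 1.0 - probability
        writer.writerow([smiles, f"{proba1:.6f}"])
    return buffer.getvalue()


print(to_proba1_csv(rows))
```

The six-decimal formatting mirrors the proba1 column shown in the table and is only a display choice.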
@GemmaTuron I'm so sorry I made it confusing, I was a bit overzealous and tried to account for all the possible test cases. Let me attempt to clarify.
This is the output from Ersilia when tested with CC(=O)Oc1ccccc1C(=O)O and CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1.
eos74bo_run.csv
@GemmaTuron Ersilia actually already includes the SMILES in the output it returns. I did try it out as you suggested, but it returned errors that would possibly require editing the Ersilia codebase. Below is the error I encountered when I added the SMILES as part of the result.
File "/home/pauline/ersilia/ersilia/cli/commands/api.py", line 37, in api
api_name=api_name, input=input, output=output, batch_size=batch_size
File "/home/pauline/ersilia/ersilia/core/model.py", line 343, in api
api_name=api_name, input=input, output=output, batch_size=batch_size
File "/home/pauline/ersilia/ersilia/core/model.py", line 357, in api_task
for r in result:
File "/home/pauline/ersilia/ersilia/core/model.py", line 184, in _api_runner_iter
for result in api.post(input=input, output=output, batch_size=batch_size):
File "/home/pauline/ersilia/ersilia/serve/api.py", line 330, in post
results, output, model_id=self.model_id, api_name=self.api_name
File "/home/pauline/ersilia/ersilia/io/output.py", line 283, in adapt
df = self._to_dataframe(result)
File "/home/pauline/ersilia/ersilia/io/output.py", line 229, in _to_dataframe
output_keys_expanded = self.__expand_output_keys(vals, output_keys)
File "/home/pauline/ersilia/ersilia/io/output.py", line 197, in __expand_output_keys
t = self._guess_pure_dtype_if_absent(v)
File "/home/pauline/ersilia/ersilia/io/output.py", line 181, in _guess_pure_dtype_if_absent
return dtype["type"]
TypeError: 'NoneType' object is not subscriptable
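The last frame of the traceback fails on `return dtype["type"]`, which means `dtype` was `None` at that point. The sketch below reproduces that failure mode in isolation. It is not Ersilia's code: `guess_dtype` and both wrappers are hypothetical stand-ins, and the idea that a string value such as a SMILES goes unrecognized is an assumption, not something the traceback confirms.

```python
def guess_dtype(value):
    """Hypothetical stand-in: describe numeric values, return None for anything else."""
    if isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return {"type": "numeric"}
    return None


def dtype_name_unsafe(value):
    """Index the result directly, as in the failing frame; breaks when it is None."""
    dtype = guess_dtype(value)
    return dtype["type"]


def dtype_name_checked(value):
    """Same lookup, but with an explicit error for unrecognized values."""
    dtype = guess_dtype(value)
    if dtype is None:
        raise ValueError(f"unsupported output value: {value!r}")
    return dtype["type"]


if __name__ == "__main__":
    print(dtype_name_unsafe(0.021618))  # a numeric result is handled
    try:
        dtype_name_unsafe("CC(=O)O")    # a string result is not
    except TypeError as error:
        print("TypeError:", error)
```

An explicit check like the one in `dtype_name_checked` would at least report which value was rejected instead of failing on the subscript.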
Thanks for testing! @miquelduranfrigola what do you think? Should we just give the number as output?
This model is completed.
Model Name
Aqueous Kinetic Solubility
Model Description
Prediction of aqueous solubility, one of the most important properties in drug discovery, as it has a profound impact on various drug properties, including biological activity, pharmacokinetics (PK), toxicity, and in vivo efficacy.
Slug
aqueous-kinetic-solubility
Tag
solubility, ADME
Publication
https://pubmed.ncbi.nlm.nih.gov/31176566/
Source Code
https://github.com/ncats/ncats-adme
License
None