ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

🦠 Model Request: Prediction of Aqueous Kinetic Solubility #589

Closed paulinebanye closed 1 year ago

paulinebanye commented 1 year ago

Model Name

Aqueous Kinetic Solubility

Model Description

Aqueous solubility is one of the most important properties in drug discovery, as it has a profound impact on various drug properties, including biological activity, pharmacokinetics (PK), toxicity, and in vivo efficacy. This model predicts aqueous kinetic solubility.

Slug

aqueous-kinetic-solubility

Tag

solubility, ADME

Publication

https://pubmed.ncbi.nlm.nih.gov/31176566/

Source Code

https://github.com/ncats/ncats-adme

License

None

GemmaTuron commented 1 year ago

/approve

github-actions[bot] commented 1 year ago

New Model Repository Created! 🎉

@pauline-banye, the Ersilia model repository has been successfully created and is available at:

🔗 ersilia-os/eos74bo

Next Steps ⭐

Now that your new model repository has been created, you are ready to start contributing to it!

Here are some brief starter steps for contributing to your new model repository:

Note: Many of the bullet points below will have extra links if this is your first time contributing to a GitHub repository

Additional Resources 📚

If you have any questions, please feel free to open an issue and get support from the community!

GemmaTuron commented 1 year ago

Hi @pauline-banye, can you give me an update? We are finally moving the solubility model to this new repo to bypass the git-lfs installs, right? Let me know when the old repo is ready to be deleted, to avoid duplication!

paulinebanye commented 1 year ago

Hi @GemmaTuron, yes, you can delete the old repo. I have all the code, and I have forked the new repo and transferred the code to the newly cloned copy.

However, I seem to be having an issue with FPSim2, but I believe it is due to my conda environment. When I encountered this issue, I tested the model with a virtual environment in a different terminal (Git Bash), and it worked flawlessly.

I am currently trying to resolve the issues with the FPSim2 dependency in my conda environment.

GemmaTuron commented 1 year ago

@pauline-banye following our discussion:

paulinebanye commented 1 year ago

My apologies, @GemmaTuron: it is supposed to be "high solubility" and "low solubility". I neglected to edit that string in the repo, but I have corrected it now.

paulinebanye commented 1 year ago

Thank you, @GemmaTuron. I deleted the previous repo and all conda environments. The new fork and environment have been upgraded to Python 3.8.

GemmaTuron commented 1 year ago

Perfect, thanks @pauline-banye ! Let me know how it goes with the changes!

paulinebanye commented 1 year ago

Hi @GemmaTuron, I'm still having issues with the checks. After some time spent debugging, I believe it could be a Python path issue: although the dependencies are installed, it still returns a "module not found" error.
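One way to narrow down a "module not found" error when the package appears to be installed is to check which interpreter is actually running and whether that interpreter can locate the module. Below is a minimal standard-library sketch; the helper name `diagnose_import` is hypothetical, and it assumes the missing module is FPSim2, the dependency mentioned earlier in this thread.

```python
import importlib.util
import sys


def diagnose_import(module_name: str) -> dict:
    """Report which interpreter is running and whether it can locate a module.

    Hypothetical helper: a "module not found" error despite a successful install
    often means the package went into a different environment than the one
    whose interpreter is running.
    """
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,            # the python binary actually running
        "prefix": sys.prefix,                     # root of the active environment
        "found": spec is not None,                # can this interpreter import it?
        "origin": spec.origin if spec else None,  # file it would be loaded from
        "search_path": list(sys.path),            # directories searched for imports
    }


if __name__ == "__main__":
    # Assumption: FPSim2 is the module that fails to import; swap in any name.
    report = diagnose_import("FPSim2")
    for key, value in report.items():
        print(f"{key}: {value}")
```

Running it once inside the conda environment and once inside the working virtual environment, then comparing `interpreter` and `found`, should help show whether the two setups resolve different Python installations.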

GemmaTuron commented 1 year ago

Hi @pauline-banye, let's try to debug this today. I was able to run it successfully on my system. Can you have the following ready?

  • List of dependencies you are installing, either manually or through .yml
  • List of dependencies you see in your conda env (with versions)
  • Python path of the conda environment

paulinebanye commented 1 year ago

Hi @GemmaTuron, thank you so much! I would really appreciate it 🙏. I have pushed the current dependencies to the forked repository, but I still need to export the dependencies from the current environment. I will send an update once I have updated the repo.
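For the dependency list and Python path requested above, `conda env export` and `conda list` are the usual conda commands. As an interpreter-level alternative, here is a minimal standard-library sketch (Python 3.8+, matching the environment mentioned in this thread); `installed_packages` and `write_requirements` are hypothetical helper names.

```python
import sys
from importlib.metadata import distributions


def installed_packages() -> dict:
    """Map each installed distribution name to its version for this interpreter."""
    packages = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs that carry no name in their metadata
            packages[name] = dist.version
    # Case-insensitive alphabetical order makes two environments easy to diff.
    return dict(sorted(packages.items(), key=lambda kv: kv[0].lower()))


def write_requirements(path: str) -> None:
    """Write name==version lines, similar in spirit to `pip freeze`."""
    with open(path, "w") as fh:
        # Record which interpreter produced the list (the "Python path").
        fh.write(f"# interpreter: {sys.executable}\n")
        for name, version in installed_packages().items():
            fh.write(f"{name}=={version}\n")
```

For example, `write_requirements("requirements.txt")` writes one `name==version` line per package, preceded by a comment with the interpreter path. Unlike `conda env export`, it only sees Python distributions, not non-Python conda packages.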

paulinebanye commented 1 year ago

Hi @GemmaTuron,

As requested, I returned the exact values from the solubility model. These tests were carried out using two different lists of SMILES.

GemmaTuron commented 1 year ago

Hi @pauline-banye! Thanks for this. I am a bit confused because each file has a different name, so I don't know which result corresponds to what. I want to make sure that we are always returning the probability of class 1. Can you let me know what you get when predicting the following molecules, CC(=O)Oc1ccccc1C(=O)O and CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1, with the Ersilia repo and with the original code?

Thanks!

GemmaTuron commented 1 year ago

Looking a bit more into the results, I now understand that we are giving as output the last line I see in the code, for example [0.0, 0.0, 0.79, 0.0, 0.020000000000000018, 0.0, 0.010000000000000009, 0.26, 0.99, 0.95] in the last one, right?

So that should be fine.
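Values such as 0.020000000000000018 and 0.010000000000000009 in the list above look like ordinary binary floating-point residue: in Python, `1 - 0.98` and `1 - 0.99` evaluate to exactly those numbers. Whether the model actually computes them as complements is an assumption. A minimal sketch of rounding for display only, with a hypothetical helper name:

```python
def tidy_probabilities(probs, ndigits=2):
    """Hypothetical helper: return rounded copies for display; inputs are untouched."""
    return [round(p, ndigits) for p in probs]


raw = [0.0, 0.0, 0.79, 0.0, 0.020000000000000018, 0.0,
       0.010000000000000009, 0.26, 0.99, 0.95]
print(1 - 0.98)                 # 0.020000000000000018
print(tidy_probabilities(raw))  # [0.0, 0.0, 0.79, 0.0, 0.02, 0.0, 0.01, 0.26, 0.99, 0.95]
```

Rounding only changes how the numbers are shown. The unrounded values are still the most faithful record of what the model returned.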

GemmaTuron commented 1 year ago

Two comments, just to close this: can we name the column "proba1" instead of "value" in the Ersilia Model Hub output? And could the output have a SMILES column plus a proba1 column, for easier interpretation of the results?
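A minimal sketch of the layout suggested here, with one row per input molecule, a SMILES column and a proba1 column, using Python's `csv` module. The function name and the lowercase `smiles` header are assumptions, and the probabilities in the usage example are placeholders, not model output.

```python
import csv
import io


def write_predictions(smiles, proba1, handle):
    """Hypothetical writer: one row per molecule with smiles and proba1 columns."""
    if len(smiles) != len(proba1):
        raise ValueError("need exactly one probability per SMILES")
    writer = csv.DictWriter(handle, fieldnames=["smiles", "proba1"])
    writer.writeheader()
    for smi, p in zip(smiles, proba1):
        writer.writerow({"smiles": smi, "proba1": p})


buffer = io.StringIO()
write_predictions(
    ["CC(=O)Oc1ccccc1C(=O)O", "CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1"],
    [0.5, 0.5],  # placeholder values, NOT actual model predictions
    buffer,
)
print(buffer.getvalue())
```

Keeping the input string next to its probability means each row can be read on its own, without cross-referencing the input file.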

paulinebanye commented 1 year ago

Hi @GemmaTuron, I noticed that the results in the printed output differed from what was returned as the probability. I felt this was misleading, so I decided to resolve it and return the exact figures in the output. To do this, I had to extract the data, and these were the steps I performed:


[eml_output.csv](https://github.com/ersilia-os/ersilia/files/10743474/eml_output.csv)
paulinebanye commented 1 year ago

@GemmaTuron, I'm so sorry for the confusion. I was a bit overzealous and tried to account for all possible test cases. Let me attempt to clarify.

This is the output from Ersilia when tested with CC(=O)Oc1ccccc1C(=O)O and CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1. eos74bo_run.csv

paulinebanye commented 1 year ago

@GemmaTuron, Ersilia actually already includes the SMILES in the returned output. I did try it as you suggested, but it raised errors that would possibly require editing the Ersilia codebase. Below is the error I encountered when I added the SMILES as part of the result:

  File "/home/pauline/ersilia/ersilia/cli/commands/api.py", line 37, in api
    api_name=api_name, input=input, output=output, batch_size=batch_size
  File "/home/pauline/ersilia/ersilia/core/model.py", line 343, in api
    api_name=api_name, input=input, output=output, batch_size=batch_size
  File "/home/pauline/ersilia/ersilia/core/model.py", line 357, in api_task
    for r in result:
  File "/home/pauline/ersilia/ersilia/core/model.py", line 184, in _api_runner_iter
    for result in api.post(input=input, output=output, batch_size=batch_size):
  File "/home/pauline/ersilia/ersilia/serve/api.py", line 330, in post
    results, output, model_id=self.model_id, api_name=self.api_name
  File "/home/pauline/ersilia/ersilia/io/output.py", line 283, in adapt
    df = self._to_dataframe(result)
  File "/home/pauline/ersilia/ersilia/io/output.py", line 229, in _to_dataframe
    output_keys_expanded = self.__expand_output_keys(vals, output_keys)
  File "/home/pauline/ersilia/ersilia/io/output.py", line 197, in __expand_output_keys
    t = self._guess_pure_dtype_if_absent(v)
  File "/home/pauline/ersilia/ersilia/io/output.py", line 181, in _guess_pure_dtype_if_absent
    return dtype["type"]
TypeError: 'NoneType' object is not subscriptable
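The last frame fails on `return dtype["type"]`, and subscripting `None` raises exactly this `TypeError`, so the dtype lookup for the added field presumably came back empty. The standalone sketch below reproduces that failure mode and shows one defensive pattern. The function names and the `"string"` fallback are hypothetical; this is not Ersilia's actual code.

```python
def guess_type_unguarded(dtype):
    # Mirrors the failing pattern: subscripting without checking for None.
    return dtype["type"]


def guess_type_guarded(dtype, default="string"):
    # Hypothetical guard: fall back to a default when no dtype was inferred.
    if dtype is None:
        return default
    return dtype["type"]


try:
    guess_type_unguarded(None)
except TypeError as exc:
    print(exc)  # 'NoneType' object is not subscriptable

print(guess_type_guarded(None))               # string
print(guess_type_guarded({"type": "float"}))  # float
```
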

GemmaTuron commented 1 year ago

Thanks for testing! @miquelduranfrigola, what do you think? Should we just give the number as output?

GemmaTuron commented 1 year ago

This model is completed.