ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

🦠 Model Request: PhaKinPro Incorporation #1114

Closed (sucksido closed this issue 2 weeks ago)

sucksido commented 4 months ago

Model Name

Pharmacokinetics Profiler (PhaKinPro)

Model Description

Pharmacokinetics Profiler (PhaKinPro) is a recently developed web-based tool that helps predict the pharmacokinetic (PK) properties of drug candidates. In essence, it assists scientists in determining how a drug will behave within the body. Pharmacokinetics refers to the processes by which a drug is absorbed, distributed, metabolized, and excreted (ADME). Understanding these processes is critical in drug development, as they can affect a drug's efficacy and safety. For example, a drug that is rapidly metabolized may not be effective in the body, while a drug that is slowly excreted may accumulate to toxic levels.

Slug

PhaKinPro

Tag

Microsomal stability, ADME, Metabolism

Publication

https://pubs.acs.org/doi/10.1021/acs.jmedchem.3c02446

Source Code

https://github.com/molecularmodelinglab/PhaKinPro

License

MIT

miquelduranfrigola commented 4 months ago

Hi @sucksido, before approving the model we need to add a bit more information to it. You can look at any other working model as an example. Feel free to edit the first comment of the issue for a first attempt!

sucksido commented 4 months ago

Hi @miquelduranfrigola, I have added more details. Please let me know if that is enough.

miquelduranfrigola commented 4 months ago

Hi @sucksido, the description should be longer and the tags need to be correct. In this case, this model is not related to malaria, for example. @GemmaTuron @Zainab-ik could you offer assistance/guidelines on writing a more complete and correct annotation for the model? Then I will be happy to approve it.

GemmaTuron commented 4 months ago

Hi @sucksido

Please use what is set in other models as guidance. The description must have a minimum of 250 characters, which you can get by reading the abstract of the paper and summarising it. I do not understand the links to the publication, the source code or the tags: you are pointing to the MAIP model, not PhaKinPro. Please change the links to the correct ones. Also make sure to use the right license.

GemmaTuron commented 4 months ago

Hi @sucksido

The information approved for incorporation in the metadata is listed here. For example, ADME and Metabolism would be appropriate tags in this case. In short, use the approved tags, and since Python is case sensitive we need, for example, the MIT License written as MIT only. For the description, remove the blank line between paragraphs and make it a single paragraph, as otherwise it might cause errors. Once these are done we can approve the model.
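The checks described above (minimum description length, single paragraph, approved tags, exact license spelling) can be sketched roughly as follows. This is only an illustration: the field names `Description`, `Tag` and `License` are assumed from the issue template headings, and the approved sets below contain only the values mentioned in this thread, not the full approved list.

```python
# Hypothetical subsets: only values mentioned in this thread, not the full approved lists.
APPROVED_TAGS = {"ADME", "Metabolism"}
APPROVED_LICENSES = {"MIT"}

MIN_DESCRIPTION_LENGTH = 250  # minimum number of characters stated in the thread


def check_metadata(meta):
    """Return a list of human-readable problems found in a metadata dict."""
    problems = []
    description = meta.get("Description", "")
    if len(description) < MIN_DESCRIPTION_LENGTH:
        problems.append(
            f"Description has {len(description)} characters; "
            f"at least {MIN_DESCRIPTION_LENGTH} are required"
        )
    # A line break inside the text means it is not a single paragraph.
    if "\n" in description.strip():
        problems.append("Description must be a single paragraph")
    for tag in meta.get("Tag", []):
        # Comparison is case sensitive on purpose: 'adme' is not 'ADME'.
        if tag not in APPROVED_TAGS:
            problems.append(f"Tag not in approved list: {tag!r}")
    if meta.get("License") not in APPROVED_LICENSES:
        problems.append(f"License not in approved list: {meta.get('License')!r}")
    return problems
```

With this sketch, a license written as "MIT License" or a tag such as "Malaria" would each be reported as a problem, while an empty list means all four checks passed.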

sucksido commented 4 months ago

Hi @GemmaTuron, I have updated it.

GemmaTuron commented 4 months ago

@sucksido I listed the appropriate tags for the model. I don't know why you did not add those as well, but I have added them.

GemmaTuron commented 4 months ago

/approve

github-actions[bot] commented 4 months ago

New Model Repository Created! 🎉

@sucksido, an Ersilia model repository has been successfully created and is available at:

🔗 ersilia-os/eos39dp

Next Steps ⭐

Now that your new model repository has been created, you are ready to start contributing to it!

Here are some brief starter steps for contributing to your new model repository:

Note: Many of the bullet points below will have extra links if this is your first time contributing to a GitHub repository

Additional Resources 📚

If you have any questions, please feel free to open an issue and get support from the community!

sucksido commented 4 months ago

@GemmaTuron @miquelduranfrigola @JHlozek @DhanshreeA

Findings on PhaKinPro incorporation: The GitHub repository for PhaKinPro (https://github.com/molecularmodelinglab/PhaKinPro) has seen minimal updates or support in the past two years, and many of its functionalities appear to be broken. Significant modifications to the code were necessary to get it working. For instance, even the basic command python phakinpro.py --help did not work until I manually adjusted the file PhaKinPro/PhaKinPro/phakinpro.py. The paths referencing the models were also incorrect. The main issue at the moment is incompatible data types in the pickled model data, which results in failures to load the data or in missing model files. These issues block testing and prevent generation of the desired CSV output file. Both the original and modified versions of phakinpro.py are attached for comparison. Any assistance in resolving these issues would be greatly appreciated.
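One way to narrow down which pickled model files fail to load is to try each file in turn and record the error type. The sketch below is an assumption-laden illustration: the function name `check_model_files`, the `.pkl` extension and the flat directory layout are hypothetical and not taken from the PhaKinPro repository. Unpickling executes code from the file, so this should only be run on files from a trusted source.

```python
import pickle
from pathlib import Path


def check_model_files(model_dir):
    """Try to unpickle every .pkl file in model_dir and report the outcome.

    Returns a dict mapping file name to "ok" or "failed: <ErrorType>".
    """
    report = {}
    for path in sorted(Path(model_dir).glob("*.pkl")):
        try:
            with open(path, "rb") as handle:
                pickle.load(handle)
            report[path.name] = "ok"
        except Exception as error:  # unpickling can raise many different error types
            report[path.name] = f"failed: {type(error).__name__}"
    return report
```

The error type usually hints at the cause: a missing module or attribute often points to a library version mismatch between the environment that saved the model and the one loading it, while an unpickling error suggests a corrupt or incomplete file.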

sucksido commented 4 months ago

It seems I can't attach .py files here, but I have shared them on Slack.

sucksido commented 4 months ago

Current output: [['SMILES', 'Hepatic Stability', 'Microsomal Half-life Sub-cellular', 'Microsomal Half-life Tissue', 'Renal Clearance', 'BBB Permeability', 'CNS Activity', 'CACO2', 'Plasma Protein Binding', 'Plasma Half-life', 'Microsomal Intrinsic Clearance', 'Oral Bioavailability'], ['CCCCCCCC', '', '', '', '', '', '', '', '', '', '', ''], ['CCCCOCNC', '', '', '', '', '', '', '', '', '', '', ''], ['CCCCNCCCOC', '', '', '', '', '', '', '', '', '', '', '']]

@GemmaTuron does this look right?

GemmaTuron commented 4 months ago

Hi @sucksido

This is looking good, but I think you might be missing the model checkpoints, as all the values are empty. This is the format we want to get from the phakinpro.py function, but make sure the results are actually calculated.
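A quick way to confirm that results are actually being calculated is to flag rows whose prediction columns are all empty. This is a minimal sketch that assumes the list-of-rows layout shown in the output above (header row first, SMILES in the first column); the function name is hypothetical.

```python
def rows_without_predictions(table):
    """Return the SMILES of rows whose prediction columns are all empty.

    `table` is a list of rows with the header as the first row and the
    SMILES string as the first element of every data row.
    """
    empty = []
    for row in table[1:]:  # skip the header row
        smiles, values = row[0], row[1:]
        if all(value is None or value == "" for value in values):
            empty.append(smiles)
    return empty
```

If every input SMILES comes back in the returned list, the model checkpoints are probably not being loaded at all.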

sucksido commented 4 months ago

@GemmaTuron I have set up a new Ubuntu environment with VSCode and Miniconda 3 installed. I'm busy with the model incorporation and am currently debugging: I am still getting some errors and want to make sure we are getting results.

sucksido commented 4 months ago

Output:

[['SMILES', 'Hepatic Stability', 'Microsomal Half-life Sub-cellular', 'Microsomal Half-life Tissue', 'Renal Clearance', 'BBB Permeability', 'CNS Activity', 'CACO2', 'Plasma Protein Binding', 'Plasma Half-life', 'Microsomal Intrinsic Clearance', 'Oral Bioavailability'], ['COc1c(/C=N/NC(N)=S)c(CC(C)Cl)c(OC)c2c1OCO2', 0.6733, 0.71, 0.56, None, 0.632, 0.8320000000000001, 0.6679999999999999, 0.596, 0.637, 0.54, 0.5413], ['O=[N+]([O-])c1ccc(N/N=C/c2ccc([N+](=O)[O-])s2)cc1', 0.6347, 0.74, 0.52, None, 0.72, 0.828, 0.604, 0.524, 0.5820000000000001, 0.66, 0.58], ['CCCCOc1ccc(N2CC(C(=O)NCCc3ccc(OC)c(OC)c3)CC2=O)cc1', None, 0.53, 0.53, None, 0.604, 0.88, 0.584, 0.516, 0.624, 0.564, 0.54], ['O=C(Nc1cc(Cl)ccc1O)c1ccco1', 0.5533, 0.7, 0.8, None, 0.54, 0.892, 0.688, 0.7559999999999999, None, 0.708, 0.536]]

sucksido commented 4 months ago

Command to run from inside eos39dp/model/framework/: bash run.sh . ~/test.csv ~/out.csv

sucksido commented 4 months ago

test.csv Test SMILES

GemmaTuron commented 3 months ago

Hi @sucksido

I've provided feedback on the PR. The automated tests should pass; they give detailed information on why they are failing, so have a look and amend accordingly. While looking, I also noticed that most of the metadata file is empty. You need to fill in all the required fields before pushing the code, or the model cannot be appropriately documented and the tests will fail. On GitHub there is a detailed section on what each metadata entry means, and you can also look at other models for inspiration. Thanks.

sucksido commented 3 months ago

Hi @GemmaTuron, noted, thanks. I will fix it.

GemmaTuron commented 2 months ago

I have fixed the model dependencies in https://github.com/ersilia-os/eos39dp/commits/main/

But there is still work to be done:

miquelduranfrigola commented 2 months ago

Thanks @GemmaTuron - do we have an assignee?

kurysauce commented 3 weeks ago

CLOSING COMMENT: Verified the structural integrity of the model repository and the main.py script, along with its function calls. Concerns were raised about the model's output: confidence percentages vs. classification levels. From the #internship Slack thread:

The outputs captured are confidence percentages for the classification levels of 11 pharmacokinetic properties. The original phakinpro.py file contains a CLASSIFICATION_DICT dictionary that holds each integer classification_level and its string interpretation. A confidence percentage then describes how confident the model is in that classification level. The raw Ersilia output captures the confidence percentage rather than the classification level. The problem I see is that users will not find the confidence percentage useful without seeing the classification level and knowing its string interpretation.

@miquelduranfrigola suggested the following:

If so, we could do some kind of transformation whereby, for example, a 90% confidence for a 0 label would be returned, by convention, as -0.9 and a 90% confidence for a label 1 would be returned as 0.9.
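The convention suggested here can be sketched as a pair of small functions. This is an illustration of the idea only, not code from the model repository; the function names are hypothetical, and the encoding only makes sense for binary labels.

```python
def signed_confidence(label, confidence):
    """Encode a binary label and its confidence as a single signed number.

    Label 1 keeps the confidence as is; label 0 is returned as its negative.
    """
    if label not in (0, 1):
        raise ValueError("only binary labels (0 or 1) are supported")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return confidence if label == 1 else -confidence


def decode_signed_confidence(value):
    """Recover (label, confidence) from a signed confidence value."""
    label = 1 if value > 0 else 0
    return label, abs(value)
```

For example, `signed_confidence(0, 0.9)` gives -0.9 and `signed_confidence(1, 0.9)` gives 0.9. A confidence of exactly 0 cannot be decoded back to its label, and the scheme does not extend to properties with more than two classes.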

The classifications are both binary and multiclass. I've attached an example of a multiclass PK parameter here (screenshot dated 2024-08-22). The interpretations for this PK parameter are time periods relating to stability, so I'm not sure how this could be concisely represented in the Ersilia output.

miquelduranfrigola commented 3 weeks ago

I understand, this is very useful. We will take it from here!

GemmaTuron commented 2 weeks ago

Thanks @kurysauce for the nice summary. Following your recommendation, I've completed the model incorporation and kept the multiclass explanations to make the output easier for users to understand. The output is a bit verbose but easier to interpret.
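As a rough illustration of what a more verbose but interpretable output could look like, the sketch below pairs each classification level with its string interpretation and keeps the confidence alongside it. The dictionary here is a hypothetical stand-in: the property names, class labels and nesting are invented for the example and do not reproduce the real CLASSIFICATION_DICT in phakinpro.py or the final output format of the Ersilia model.

```python
# Hypothetical stand-in for CLASSIFICATION_DICT; the real keys and
# interpretation strings in phakinpro.py are not reproduced here.
EXAMPLE_CLASSIFICATION_DICT = {
    "Example Binary Property": {0: "negative", 1: "positive"},
    "Example Multiclass Property": {0: "short", 1: "medium", 2: "long"},
}


def describe_prediction(prop, level, confidence):
    """Return the interpretation and confidence for one predicted property."""
    interpretation = EXAMPLE_CLASSIFICATION_DICT[prop][level]
    return {
        f"{prop} (class)": interpretation,
        f"{prop} (confidence)": round(confidence, 3),
    }
```

Returning two columns per property doubles the width of the output, but it works the same way for binary and multiclass properties and avoids overloading a single number with two meanings.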