ersilia-os / eos8osp

GNU General Public License v3.0

Model incorporation steps #3

Open paulinebanye opened 1 year ago

paulinebanye commented 1 year ago

@GemmaTuron

Steps to create and test the HLM Model.

I outlined the problem I encountered while incorporating and testing this model in this issue.

Incorporating and testing the model with the RLM gcnn_model.pt file worked, but I found the output quite confusing. In the codebase, a probability >=0.5 is considered unstable and <0.5 is considered stable. However, the Probability column shows values greater than 0.5 for molecules labelled stable, and I also saw outputs as high as >7.

100%|███████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:02<00:00,  3.74it/s]
HLM: 3.005901575088501 seconds to predict 10 molecules
                                              smiles Prediction Predicted Class         Probability
0      Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1     stable              0   0.9513722583651543
1                         CC(=O)Nc1nnc(S(N)(=O)=O)s1     stable              0   0.9999999879992938
2                                            CC(=O)O     stable              0   0.9999999999999524
3                            CC(=O)N[C@@H](CS)C(=O)O     stable              0      0.9999260790064
4                              CC(=O)Oc1ccccc1C(=O)O     stable              0   0.8703196793794632
5                       Nc1nc(=O)c2ncn(COCCO)c2[nH]1     stable              0   0.9993419629754499
6  O=C(O[C@H]1C[N+]2(CCCOc3ccccc3)CCC1CC2)C(O)(c1...   unstable              1   0.9830060601234436
7  CN(C)C/C=C/C(=O)Nc1cc2c(Nc3ccc(F)c(Cl)c3)ncnc2...     stable              0   0.6951150000095367
8                     CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1   unstable              1    0.541957437992096
9                             O=c1ncnc2[nH][nH]cc1-2     stable              0   0.9999941727937767
[0.048627741634845734, 1.2000706206549694e-08, 4.7628567756419216e-14, 7.392099360004067e-05, 0.1296803206205368, 0.0006580370245501399, 0.9830060601234436, 0.30488499999046326, 0.541957437992096, 5.827206223329995e-06]
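Comparing the table with the raw list, the Probability column appears to hold the probability of the predicted class, while the list appears to hold the probability of class 1 (unstable). The sketch below checks that reading against the values above. It assumes the 0.5 threshold described earlier; the names `raw_p1` and `describe` are illustrative and do not come from the original codebase.

```python
# Raw values from the last output line, assumed to be P(class 1 = unstable).
raw_p1 = [
    0.048627741634845734, 1.2000706206549694e-08, 4.7628567756419216e-14,
    7.392099360004067e-05, 0.1296803206205368, 0.0006580370245501399,
    0.9830060601234436, 0.30488499999046326, 0.541957437992096,
    5.827206223329995e-06,
]

THRESHOLD = 0.5  # >= 0.5 is unstable, < 0.5 is stable


def describe(p1, threshold=THRESHOLD):
    """Return (label, predicted class, probability of the predicted class)."""
    predicted_class = 1 if p1 >= threshold else 0
    label = "unstable" if predicted_class == 1 else "stable"
    # For class 0 the table seems to show 1 - p1, for class 1 it shows p1.
    shown = p1 if predicted_class == 1 else 1.0 - p1
    return label, predicted_class, shown


rows = [describe(p) for p in raw_p1]
for p1, (label, predicted_class, shown) in zip(raw_p1, rows):
    print(f"{label:>8}  {predicted_class}  shown={shown:.6f}  p1={p1:.6f}")
```

If this reading is right, a stable molecule with a displayed probability of 0.95 has only a 0.05 probability of being unstable, which would explain why values above 0.5 appear next to the stable label.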

I decided to return the probability, and I took a few steps to extract the probability values and generate a CSV for a list of SMILES, which involved:

When tested in the local repo and within the Ersilia CLI, the results are similar to the values returned when the model was tested with the original NCATS code.
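For illustration only, writing the extracted values to a CSV could look like the sketch below. This is not the code used in the repository: the helper `write_probabilities`, the file name `hlm_output.csv`, and the column name `hlm_proba1` are all hypothetical, and the values are assumed to be probabilities of class 1.

```python
import csv


def write_probabilities(path, smiles, prob_class1):
    """Write one row per molecule: the SMILES and its probability of class 1."""
    if len(smiles) != len(prob_class1):
        raise ValueError("smiles and prob_class1 must have the same length")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["smiles", "hlm_proba1"])  # hypothetical column names
        for smi, prob in zip(smiles, prob_class1):
            writer.writerow([smi, prob])


if __name__ == "__main__":
    # Two molecules and their raw values taken from the run above.
    write_probabilities(
        "hlm_output.csv",
        ["CC(=O)O", "CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1"],
        [4.7628567756419216e-14, 0.541957437992096],
    )
```

The length check is there so that a mismatch between the input list and the model output fails loudly instead of silently pairing the wrong values.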

GemmaTuron commented 1 year ago

@pauline-banye Please do not mix models. This is quite dangerous, as at some point you might swap models completely without realizing it. We troubleshot this together for the first model, do you recall? We need to convert all the outputs to probability of 1; you now have a mix of probability of 0 and probability of 1. This is precisely what I was concerned about in the solubility model.
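As a rough sketch of that conversion: if the reported value is the probability of the predicted class (an assumption based on the output posted above, not something confirmed from the model code), the probability of 1 can be recovered as follows. `to_prob_class1` is a hypothetical helper name.

```python
def to_prob_class1(predicted_class, reported_probability):
    """Convert a probability of the predicted class into a probability of class 1.

    Assumes a binary model where class 1 is "unstable" and the reported value
    is the probability of whichever class was predicted.
    """
    if predicted_class not in (0, 1):
        raise ValueError(f"unexpected class: {predicted_class!r}")
    if not 0.0 <= reported_probability <= 1.0:
        # A value outside [0, 1] is not a probability, so stop here
        # rather than convert it.
        raise ValueError(f"not a probability: {reported_probability!r}")
    if predicted_class == 1:
        return reported_probability
    return 1.0 - reported_probability


# Row 0 of the table above: stable (class 0) with 0.9513722583651543 reported.
print(to_prob_class1(0, 0.9513722583651543))
# Row 8 of the table above: unstable (class 1), value is kept as-is.
print(to_prob_class1(1, 0.541957437992096))
```

Rejecting values outside [0, 1] is a deliberate choice here: it surfaces outputs that are not probabilities at all before they get written to the final results.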

paulinebanye commented 1 year ago

Hi @GemmaTuron, the problem is actually with the model file itself: there seems to be something wrong with the gcnn_model.pt file.