ersilia-os / sars-cov-2-chemspace

This repository explores the chemical space associated with SARS-CoV-2 according to manually curated data at UB-CeDD
GNU General Public License v3.0

Figure 1: chemical space analysis of manually curated compounds #2

Closed miquelduranfrigola closed 3 weeks ago

miquelduranfrigola commented 1 month ago

Hi @GemmaTuron,

In the data folder there are all_molecules_$MODEL_ID.csv files that I calculated.

I suggest we do the following plots using a Jupyter Notebook:

  1. A histogram or density plot showing the NP-likeness score of NP vs SD compounds (two distributions). There are several scores; use the one that best discriminates the two groups.
  2. A histogram or density plot showing the synthetic accessibility (SA) of NP vs SD compounds (two distributions). There are several scores; use the one that best discriminates the two groups.
  3. In the ADME properties results, take the DrugBank-scaled results and show a few representative columns. There are many outputs (~40), so we'll have to filter. We can think of an ADME panel inspired by this one.
  4. A UMAP or t-SNE plot of one of the descriptors (WHALES?), potentially coloring by (a) NP vs SD or (b) target categories as provided by Fidele.
  5. As you will see, there is also a MOE file with calculated properties by Fidele. In that case, I would use the PCA components to provide a PCA plot (2D or 3D).
  6. Fidele will calculate scaffolds using MOE. We may want to include a representation of the most popular scaffolds.
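One way to decide which score "gives a more discriminative result" in plots 1 and 2 is to compare how well each score separates the two groups before plotting it. The sketch below is only an illustration: it uses the rank-based AUC (the probability that a random NP compound scores higher than a random SD compound), which is an assumed criterion and not something fixed in this issue. The score names and values are made up, and the real columns in the all_molecules_$MODEL_ID.csv files may be named differently.

```python
from itertools import product


def auc_separation(group_a, group_b):
    """Probability that a random value from group_a exceeds a random value
    from group_b, counting ties as half a win (rank-based AUC)."""
    wins = 0.0
    for a, b in product(group_a, group_b):
        if a > b:
            wins += 1.0
        elif a == b:
            wins += 0.5
    return wins / (len(group_a) * len(group_b))


def most_discriminative(scores_by_name, labels):
    """Return (name, auc) for the score whose AUC is furthest from 0.5,
    i.e. the one that best separates 'NP' from 'SD' compounds."""
    best = None
    for name, values in scores_by_name.items():
        np_vals = [v for v, lab in zip(values, labels) if lab == "NP"]
        sd_vals = [v for v, lab in zip(values, labels) if lab == "SD"]
        auc = auc_separation(np_vals, sd_vals)
        if best is None or abs(auc - 0.5) > abs(best[1] - 0.5):
            best = (name, auc)
    return best


# Toy data: hypothetical score names and values, not taken from the repository.
labels = ["NP", "NP", "NP", "SD", "SD", "SD"]
scores = {
    "score_a": [1.2, 0.8, 1.5, 0.9, 1.1, 0.7],
    "score_b": [2.0, 1.8, 2.5, -0.5, 0.1, -1.0],
}
best_name, best_auc = most_discriminative(scores, labels)
```

The selected score would then be the one drawn as two overlaid histograms or density curves. The pairwise loop is quadratic in the number of compounds, which should be acceptable for a manually curated set but would need a rank-based implementation for large libraries.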

⚠️ Please note that Fidele has updated some files and, therefore, I will have to recalculate everything. Files will remain in the same format, so in principle this should not be a big problem.

GemmaTuron commented 1 month ago

Hi @miquelduranfrigola

I have updated the molecules to the latest version and recalculated the Ersilia features. In the notebook you can find the NP score plots as well as some UMAP and t-SNE plots. I've only left the best ones. I am now playing with the ADME panel, but will not have anything until Wednesday or Thursday. I have not plotted the PCA because we need the updated MOE files.

miquelduranfrigola commented 1 month ago

Fantastic, this is great @GemmaTuron

GemmaTuron commented 1 month ago

I have made a first attempt at the ADME panel. I think 49 properties are too many; should we reduce to 20 max? The percentile relative to DrugBank approved drugs is the easiest measure to plot for all of them.

The list is:

    all_props = [
        "molecular_weight", "logP", "hydrogen_bond_acceptors", "hydrogen_bond_donors",
        "Lipinski", "QED", "stereo_centers", "tpsa", "AMES", "BBB_Martins",
        "Bioavailability_Ma", "CYP1A2_Veith", "CYP2C19_Veith",
        "CYP2C9_Substrate_CarbonMangels", "CYP2C9_Veith",
        "CYP2D6_Substrate_CarbonMangels", "CYP2D6_Veith",
        "CYP3A4_Substrate_CarbonMangels", "CYP3A4_Veith", "Carcinogens_Lagunin",
        "ClinTox", "DILI", "HIA_Hou", "NR-AR-LBD", "NR-AR", "NR-AhR", "NR-Aromatase",
        "NR-ER-LBD", "NR-ER", "NR-PPAR-gamma", "PAMPA_NCATS", "Pgp_Broccatelli",
        "SR-ARE", "SR-ATAD5", "SR-HSE", "SR-MMP", "SR-p53", "Skin_Reaction", "hERG",
        "Caco2_Wang", "Clearance_Hepatocyte_AZ", "Clearance_Microsome_AZ",
        "Half_Life_Obach", "HydrationFreeEnergy_FreeSolv", "LD50_Zhu",
        "Lipophilicity_AstraZeneca", "PPBR_AZ", "Solubility_AqSolDB", "VDss_Lombardo",
    ]
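For readers unfamiliar with the percentile measure, here is a minimal sketch of how a property value can be expressed as a percentile of a reference distribution such as the DrugBank approved drugs. This is an assumption about the general idea only; the thread does not say how the DrugBank-scaled outputs were actually computed, and the reference values and compound names below are invented for illustration.

```python
from bisect import bisect_left, bisect_right


def percentile_vs_reference(value, sorted_reference):
    """Percentile (0-100) of `value` within a sorted reference sample.
    Reference values equal to `value` count as half below, half above."""
    below = bisect_left(sorted_reference, value)
    at_or_below = bisect_right(sorted_reference, value)
    ties = at_or_below - below
    return 100.0 * (below + 0.5 * ties) / len(sorted_reference)


# Made-up reference logP values standing in for the DrugBank approved set.
reference_logp = sorted([-1.0, 0.5, 1.2, 2.3, 2.3, 3.1, 4.0, 5.2])

# Hypothetical compounds and their predicted logP.
compound_logp = {"cpd_1": 2.3, "cpd_2": 6.0, "cpd_3": -2.0}

percentiles = {
    name: percentile_vs_reference(value, reference_logp)
    for name, value in compound_logp.items()
}
```

Because every property ends up on the same 0 to 100 scale, the same kind of plot can be reused for all panel entries regardless of their original units.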

miquelduranfrigola commented 1 month ago

Sure. I suggest, at least:

GemmaTuron commented 1 month ago

You can find all the properties in the figures/adme_props folder. By Monday we should have the list of the 20 selected ones.

miquelduranfrigola commented 1 month ago

These ones I like. Please add yours:

  1. BBB Martins
  2. Bioavailability
  3. CYP2C9 Veith
  4. CYP3A4 Veith
  5. Carcinogens Lagunin
  6. Clearance Hepatocyte
  7. DILI
  8. HIA Hou (absorption in the intestine)
  9. NR-AR-LBD
  10. NR-PPAR-gamma
  11. SR-ARE
  12. Skin reaction
  13. Solubility AqSolDB
  14. hERG
  15. logP
  16. Molecular weight

Also, I am surprised about the stereocenters distribution...

miquelduranfrigola commented 1 month ago

Hi @GemmaTuron, I would like to try this model for chemical space visualization: eos39co

miquelduranfrigola commented 3 weeks ago

@GemmaTuron can we close this issue?

GemmaTuron commented 3 weeks ago

Yes, this is complete. I need to rerun everything with the new datasets, though.