Hi @GemmaTuron,
In the data folder there are all_molecules_$MODEL_ID.csv files that I calculated. I suggest we make the following plots in a Jupyter Notebook:
A histogram or density plot of the NP-likeness score for NP vs SD compounds (two distributions). Several scores are available; use whichever separates the two groups best.
A histogram or density plot of the synthetic accessibility (SA) score for NP vs SD compounds (two distributions). Several scores are available here too; again, use whichever separates the two groups best.
From the ADME property results, take the DrugBank-scaled values and show a few representative columns. There are about 40 outputs, so we will have to filter. We can think of an ADME panel inspired by this one.
A UMAP or t-SNE plot of one of the descriptors (WHALES?), potentially colored by (a) NP vs SD or (b) the target categories provided by Fidele.
As you will see, there is also a MOE file with properties calculated by Fidele. For that one, I would use the PCA components to make a PCA plot (2D or 3D).
Fidele will calculate scaffolds using MOE. We may want to include a representation of the most frequent scaffolds.
⚠️ Please note that Fidele has updated some files and, therefore, I will have to recalculate everything. Files will remain in the same format, so in principle this should not be a big problem.
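For the "use the most discriminative score" step in the NP-likeness and SA plots, one option is to rank the candidate score columns by how far their AUROC (NP vs SD) is from 0.5. The sketch below is only a suggestion: the column names `category`, `score_a` and `score_b` and the toy values are hypothetical, not taken from the actual CSV files, and AUROC is just one possible separation measure.

```python
from typing import Dict, List, Sequence


def auroc(np_scores: Sequence[float], sd_scores: Sequence[float]) -> float:
    """Probability that a random NP compound scores higher than a random
    SD compound (ties count as 0.5). Pairwise version, O(n * m)."""
    wins = 0.0
    for a in np_scores:
        for b in sd_scores:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(np_scores) * len(sd_scores))


def most_discriminative(
    rows: List[Dict[str, object]],
    score_cols: Sequence[str],
    label_col: str = "category",  # hypothetical column holding "NP" or "SD"
) -> str:
    """Return the score column whose AUROC is farthest from 0.5."""
    best_col, best_sep = "", -1.0
    for col in score_cols:
        np_vals = [float(r[col]) for r in rows if r[label_col] == "NP"]
        sd_vals = [float(r[col]) for r in rows if r[label_col] == "SD"]
        sep = abs(auroc(np_vals, sd_vals) - 0.5)
        if sep > best_sep:
            best_col, best_sep = col, sep
    return best_col


# Toy rows with made-up values, standing in for one all_molecules_*.csv file.
toy_rows = [
    {"category": "NP", "score_a": 1.2, "score_b": 0.1},
    {"category": "NP", "score_a": 0.8, "score_b": 0.9},
    {"category": "NP", "score_a": 1.5, "score_b": 0.4},
    {"category": "SD", "score_a": -0.5, "score_b": 0.3},
    {"category": "SD", "score_a": 0.1, "score_b": 0.5},
    {"category": "SD", "score_a": -1.0, "score_b": 0.2},
]

best = most_discriminative(toy_rows, ["score_a", "score_b"])
```

The selected column can then be passed to the histogram or density plot in the notebook. The pairwise loop is fine for a few thousand molecules; for larger sets a rank-based AUROC would be faster.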