JJanowiak closed this issue 3 years ago
Suggested categories for features:
Two CSVs:
Feature calculations: one SMILES string per row, one feature per column
Feature categories: one feature per row, one category per column
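A minimal sketch of the two layouts, using only the standard library. The feature names, category names, and the 0/1 membership encoding are assumptions made for illustration; the issue does not fix any of them.

```python
import csv
import io

# Hypothetical contents of the feature-calculation CSV:
# one SMILES per row, one feature per column (values are illustrative).
FEATURES_CSV = """smiles,MolWt,NumHDonors,TPSA
CCO,46.07,1,20.23
c1ccccc1,78.11,0,0.0
"""

# Hypothetical contents of the feature-category CSV:
# one feature per row, one category per column, 1 = feature belongs to category.
CATEGORIES_CSV = """feature,size,hydrogen_bonding,polarity
MolWt,1,0,0
NumHDonors,0,1,0
TPSA,0,0,1
"""


def load_category_map(text):
    """Map each category name to the features flagged with 1 in its column."""
    reader = csv.DictReader(io.StringIO(text))
    categories = {}
    for row in reader:
        feature = row.pop("feature")
        for category, flag in row.items():
            categories.setdefault(category, [])
            if flag.strip() == "1":
                categories[category].append(feature)
    return categories


category_map = load_category_map(CATEGORIES_CSV)
feature_rows = list(csv.DictReader(io.StringIO(FEATURES_CSV)))
```

Keeping the categories in a separate file means the (expensive) descriptor table never has to be regenerated when the grouping changes.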
This page contains a very poor image of potential categories: http://datascience.unm.edu/biomed505/Course/Cheminformatics/basic/descs_fingers/molec_descs_fingerprints.htm
categories are:
We could use CDK with a Python wrapper to calculate a set of descriptors.
SCINE - Molassembler can handle molecular graphs and a number of other descriptors. It is written in C++; are Python bindings available?
Descriptors to be saved as .csv
Calculated descriptors using RDKit's built-in class for calculating all descriptors: rdkit.ML.Descriptors
Left the computer running overnight to calculate 208 descriptors for 274,978 SMILES.
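For reference, a rough sketch of how such a run could look. It assumes the calculator meant here is `MolecularDescriptorCalculator` from `rdkit.ML.Descriptors.MoleculeDescriptors`, fed with the names in `rdkit.Chem.Descriptors._descList`; the function names, the `smiles` column header, and the choice to skip unparsable SMILES are my own placeholders, not necessarily what was actually run. The number of descriptors in `_descList` depends on the RDKit version.

```python
import csv


def compute_descriptor_rows(smiles_list):
    """Return (descriptor_names, rows); each row is [smiles, value1, value2, ...]."""
    # Imported here so this module can be loaded where RDKit is not installed.
    from rdkit import Chem
    from rdkit.Chem import Descriptors
    from rdkit.ML.Descriptors import MoleculeDescriptors

    # All descriptor names known to the installed RDKit version.
    names = [name for name, _ in Descriptors._descList]
    calculator = MoleculeDescriptors.MolecularDescriptorCalculator(names)

    rows = []
    for smiles in smiles_list:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            # Assumption: silently skip SMILES that RDKit cannot parse.
            continue
        rows.append([smiles, *calculator.CalcDescriptors(mol)])
    return names, rows


def write_descriptor_csv(path, names, rows):
    """Write one SMILES per row and one descriptor per column."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["smiles", *names])
        writer.writerows(rows)
```

Calling `compute_descriptor_rows(["CCO", "c1ccccc1"])` and passing the result to `write_descriptor_csv` would give a small file in the "SMILES per row, feature per column" layout. For the full set it may be worth writing rows as they are computed rather than holding them all in memory.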
Which features to include is still to be decided. We will need to look at MP prediction papers and at what is freely and easily available. Try grouping the features into categories based on what they are intended to capture, so that specific features can be "turned off" to show the functionality of the library.
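One possible way to "turn off" a category, sketched with plain dictionaries. The category map, the feature names, and both helper functions are hypothetical. This version drops a feature if any of its categories is disabled, which is a design choice rather than something decided in this issue.

```python
# Hypothetical grouping: category name -> features it contains.
EXAMPLE_CATEGORIES = {
    "size": ["MolWt"],
    "hydrogen_bonding": ["NumHDonors"],
    "polarity": ["TPSA"],
}


def active_features(category_map, disabled=()):
    """Return features, in order, that belong to no disabled category."""
    disabled = set(disabled)
    dropped = {
        feature
        for category, features in category_map.items()
        if category in disabled
        for feature in features
    }
    kept = []
    for features in category_map.values():
        for feature in features:
            if feature not in dropped and feature not in kept:
                kept.append(feature)
    return kept


def filter_row(row, keep, id_column="smiles"):
    """Keep the identifier column plus the selected feature columns."""
    return {key: value for key, value in row.items()
            if key == id_column or key in keep}


kept = active_features(EXAMPLE_CATEGORIES, disabled=["polarity"])
example_row = {"smiles": "CCO", "MolWt": "46.07", "NumHDonors": "1", "TPSA": "20.23"}
filtered = filter_row(example_row, kept)
```

Here `filtered` keeps `smiles`, `MolWt`, and `NumHDonors` and drops `TPSA`, so a model can be retrained without the polarity features to see how much they contribute.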
Will probably split this into multiple issues later.