OpenSourceMalaria / Series4_PredictiveModel

Can we Predict Active Compounds in OSM Series 4?

Add submission by jonjoncardoso #10

Closed: jonjoncardoso closed this 5 years ago

jonjoncardoso commented 5 years ago

Hi!

Here is my contribution to the competition.

I have used the following approach:

  1. Acquired the master training data compiled by wvanhoorn (https://github.com/OpenSourceMalaria/Series4_PredictiveModel/issues/1#issuecomment-523037719)
  2. Calculated molecular descriptors using CDK
  3. Filtered out uninformative descriptors (those with near-zero variance or highly correlated with another descriptor)
  4. Handled duplicated entries (removed a compound if the standard deviation of its activity values was too high; kept the median value otherwise)
  5. Represented the compounds as a network and separated them into groups/modules
  6. Applied machine learning models to each module
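
Steps 3 and 4 above could be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code used for the submission: the function names, the thresholds (`min_variance`, `max_abs_corr`, `max_sd`) and the use of a plain variance cutoff as a stand-in for a "near zero variance" test are all assumptions made for the example.

```python
from collections import defaultdict
from math import sqrt
from statistics import median, pvariance, stdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def filter_descriptors(columns, min_variance=1e-8, max_abs_corr=0.95):
    """Step 3 (sketch): drop near-constant descriptors, then drop any descriptor
    that is highly correlated with one already kept.

    `columns` maps a descriptor name to its list of values (one per compound).
    Both thresholds are hypothetical defaults, not values from the submission.
    """
    kept = {}
    for name, values in columns.items():
        if pvariance(values) < min_variance:
            continue  # (almost) constant, carries no information
        if any(abs(pearson(values, other)) > max_abs_corr for other in kept.values()):
            continue  # redundant with a descriptor we already keep
        kept[name] = values
    return kept


def collapse_duplicates(records, max_sd=0.5):
    """Step 4 (sketch): merge repeated measurements of the same compound.

    `records` is an iterable of (compound_id, activity) pairs. A compound is
    removed when the sample standard deviation of its activities exceeds
    `max_sd` (a hypothetical cutoff); otherwise its median activity is kept.
    """
    groups = defaultdict(list)
    for compound_id, activity in records:
        groups[compound_id].append(activity)

    merged = {}
    for compound_id, activities in groups.items():
        if len(activities) > 1 and stdev(activities) > max_sd:
            continue  # measurements disagree too much to trust
        merged[compound_id] = median(activities)
    return merged
```

The correlation filter here is greedy, so which member of a correlated pair survives depends on column order; a real pipeline might instead drop the descriptor with the larger mean correlation to all others.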