WGLab / Project_Belka


RDKit generation of new non-SMILES representation #3

Open kaichop opened 3 weeks ago

kaichop commented 3 weeks ago

Assess the different features that can be generated with RDKit.

For example, convert the SMILES strings to Morgan fingerprints as features, then train a simple neural network for prediction. Assess the performance on the test data and compare it with the scores currently reported on Kaggle, so we know how much we need to improve.
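As a starting point for the featurization step, here is a minimal sketch of turning SMILES strings into Morgan fingerprint bit vectors with RDKit. The function name `smiles_to_morgan`, the radius of 2, and the 2048-bit length are assumptions chosen for illustration, not settings taken from this thread. The resulting matrix could then be passed to whichever simple neural network is chosen.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator


def smiles_to_morgan(smiles_list, radius=2, n_bits=2048):
    """Convert SMILES strings to a (n_molecules, n_bits) 0/1 matrix.

    radius and n_bits are illustrative defaults (assumptions).
    Rows for SMILES that RDKit cannot parse are left as all zeros,
    and the returned boolean mask marks which rows are valid.
    """
    gen = rdFingerprintGenerator.GetMorganGenerator(radius=radius, fpSize=n_bits)
    features = np.zeros((len(smiles_list), n_bits), dtype=np.uint8)
    valid = np.zeros(len(smiles_list), dtype=bool)
    for i, smi in enumerate(smiles_list):
        mol = Chem.MolFromSmiles(smi)
        if mol is None:  # unparsable SMILES
            continue
        features[i] = gen.GetFingerprintAsNumPy(mol)
        valid[i] = True
    return features, valid


if __name__ == "__main__":
    X, ok = smiles_to_morgan(["CCO", "c1ccccc1"])
    print(X.shape, ok)
```

Keeping a validity mask rather than dropping rows keeps the feature matrix aligned with the label array, which may be convenient when the labels are stored separately.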

Paste the code here.

wangwpi commented 2 weeks ago

I have generated Morgan fingerprints, protein names (one-hot encoded), and binds (labels) for all training and validation data as NumPy arrays, split into chunks of 500,000 rows each. The data are located in "/mnt/isilon/wang_lab/shared/Belka/analysis/morgan" and "/mnt/isilon/wang_lab/shared/Belka/analysis/morgan_validation".
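For anyone reproducing this layout, here is a rough sketch of one way to one-hot encode protein names and write the arrays out in fixed-size pieces. The `.npz` format, the `chunk_NNNN.npz` file names, the array keys, and all function names are hypothetical. The files in the directories above may be organized differently.

```python
import os
import numpy as np


def one_hot(names, categories):
    """One-hot encode names against a fixed, ordered list of categories."""
    index = {name: i for i, name in enumerate(categories)}
    cols = np.array([index[n] for n in names], dtype=int)
    encoded = np.zeros((len(names), len(categories)), dtype=np.uint8)
    encoded[np.arange(len(names)), cols] = 1
    return encoded


def save_in_chunks(fingerprints, proteins, binds, out_dir, chunk_size=500_000):
    """Write aligned arrays to out_dir, chunk_size rows per file.

    File naming and the .npz container are assumptions for this sketch.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for k, start in enumerate(range(0, len(binds), chunk_size)):
        stop = start + chunk_size
        path = os.path.join(out_dir, f"chunk_{k:04d}.npz")
        np.savez_compressed(
            path,
            fingerprints=fingerprints[start:stop],
            proteins=proteins[start:stop],
            binds=binds[start:stop],
        )
        paths.append(path)
    return paths


def iter_chunks(out_dir):
    """Yield (fingerprints, proteins, binds) one file at a time, in name order."""
    for name in sorted(os.listdir(out_dir)):
        if name.endswith(".npz"):
            with np.load(os.path.join(out_dir, name)) as data:
                yield data["fingerprints"], data["proteins"], data["binds"]
```

Iterating one file at a time means a model can be trained without holding the full dataset in memory.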