WGLab / Project_Belka


Evaluate RDKit and chemprop models #2

Open kaichop opened 3 months ago

kaichop commented 3 months ago

Start from the antibiotic prediction paper and see whether we can reproduce a similar approach: build a prediction model with chemprop, then add features extracted with RDKit for all molecules.

wangwpi commented 3 months ago

I'm going to experiment with the Morgan fingerprint parameters (radius and number of bits) to find the best combination. This is a good tutorial on fingerprints from the Kaggle challenge: https://www.kaggle.com/code/towardsentropy/fingerprint-tips-and-tricks
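A sweep like this can be organized as a plain grid search over (radius, bits). The sketch below is a hypothetical helper, not code from this repo: `evaluate` is an assumed callback that builds the fingerprints (for example with RDKit's `rdFingerprintGenerator.GetMorganGenerator(radius=..., fpSize=...)`), trains the model, and returns a validation score. The default grid values are only illustrative.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple


def sweep_morgan_params(
    evaluate: Callable[[int, int], float],
    radii: Iterable[int] = (2, 3, 4),
    bit_sizes: Iterable[int] = (1024, 2048, 4096),
) -> Tuple[Tuple[int, int], Dict[Tuple[int, int], float]]:
    """Score every (radius, n_bits) pair and return the best pair plus all scores.

    `evaluate` is a hypothetical callback supplied by the caller: it should build
    Morgan fingerprints with the given radius and bit length, train the model,
    and return a validation score where higher is better.
    """
    scores: Dict[Tuple[int, int], float] = {}
    for radius, n_bits in product(radii, bit_sizes):
        scores[(radius, n_bits)] = evaluate(radius, n_bits)
    # Pick the key (radius, n_bits) with the highest stored score.
    best = max(scores, key=scores.get)
    return best, scores
```

With a toy `evaluate` that peaks at radius 3 and 2048 bits, the helper returns `(3, 2048)` along with all nine scores. Scoring each pair on a fixed held-out split, rather than through leaderboard submissions, keeps the comparison cheap and consistent.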

Euchiz commented 3 months ago

I've submitted results from a chemprop-base-1v5 model with a 0.388 public score. Still running a similar version that adds building blocks as features. Code uploaded.
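The comment does not say how the building blocks are encoded, and the sketch below is not necessarily what the uploaded code does. One common option is a multi-hot vector over the distinct building blocks. The column names `buildingblock1_smiles`, `buildingblock2_smiles`, and `buildingblock3_smiles` are an assumption about the training table. Sharing one vocabulary across the three positions is also a design choice; a separate vocabulary per position is the obvious alternative.

```python
from typing import Dict, List, Sequence

# Assumed column names for the three building blocks of each molecule.
BB_COLUMNS = ("buildingblock1_smiles", "buildingblock2_smiles", "buildingblock3_smiles")


def build_bb_vocab(rows: Sequence[Dict[str, str]]) -> Dict[str, int]:
    """Map every distinct building-block SMILES (any position) to a column index."""
    vocab: Dict[str, int] = {}
    for row in rows:
        for col in BB_COLUMNS:
            smiles = row[col]
            if smiles not in vocab:
                vocab[smiles] = len(vocab)
    return vocab


def encode_bb(row: Dict[str, str], vocab: Dict[str, int]) -> List[float]:
    """Multi-hot vector: 1.0 at the index of each building block in this molecule."""
    vec = [0.0] * len(vocab)
    for col in BB_COLUMNS:
        idx = vocab.get(row[col])
        if idx is not None:  # building blocks unseen during vocab building are skipped
            vec[idx] = 1.0
    return vec
```

The resulting vectors could be written out one row per molecule and supplied as extra molecule-level features, for example through chemprop 1.x's `--features_path` option.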

wangwpi commented 3 months ago

I tested the simple NN model with a new Morgan fingerprint (radius=4, bits=2048), but it only achieved a 0.343 public score, which is even lower than with the simpler Morgan fingerprint (radius=3, bits=1024). We are hitting the ceiling of the simple NN model with Morgan fingerprints, so we could move on to a GNN or transformer model.
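The two settings differ in two ways: a larger radius adds identifiers for larger atom environments, and the bit length controls how often distinct identifiers collide once folded into the fixed-length vector. The sketch below isolates only the folding part, modeled as `identifier % n_bits`, using synthetic random identifiers instead of real RDKit output. It does not by itself explain why the larger setting scored lower.

```python
import random
from typing import Iterable


def folded_collisions(identifiers: Iterable[int], n_bits: int) -> int:
    """Count identifiers whose folded bit was already set by an earlier identifier.

    Folding is modeled as `identifier % n_bits`, i.e. mapping a hashed
    substructure identifier into a fixed-length bit vector.
    """
    seen = set()
    collisions = 0
    for ident in identifiers:
        bit = ident % n_bits
        if bit in seen:
            collisions += 1
        else:
            seen.add(bit)
    return collisions


if __name__ == "__main__":
    # Synthetic stand-ins for distinct environment identifiers (not real fingerprints).
    rng = random.Random(0)
    ids = rng.sample(range(2**32), 3000)
    for n_bits in (1024, 2048, 4096):
        print(n_bits, folded_collisions(ids, n_bits))
```

Because any two identifiers that share a bit at 2048 also share one at 1024, the collision count can only stay the same or drop as the bit length doubles.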

Euchiz commented 3 months ago

Just submitted a 1:1 model with building-block features and got a 0.410 score. Still trying a higher ratio, which is a bit difficult because it requires more RAM in a single run. I'm considering training on small subsets consecutively.
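Training on consecutive subsets mainly requires reading the data in bounded pieces so that only one subset is in memory at a time. A minimal sketch, assuming the training rows have been exported to a CSV file (for Parquet input, `pyarrow.parquet.ParquetFile.iter_batches(batch_size=...)` plays the same role):

```python
import csv
from itertools import islice
from typing import Dict, Iterator, List


def iter_csv_chunks(path: str, chunk_size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield successive lists of at most `chunk_size` rows from a CSV file.

    Only the current chunk is held in memory, so a model can be trained on
    consecutive subsets instead of loading the full table in one run.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        while True:
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                break
            yield chunk
```

Each chunk would then go to a hypothetical `train_on_chunk(model, chunk)` that keeps updating the same model rather than re-initializing it. Shuffling the rows before splitting helps keep each chunk representative, so the model does not drift toward whatever the last chunk happens to contain.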