WGLab / Project_Belka


The inconsistency of the batch files #9

Open wangwpi opened 1 week ago

wangwpi commented 1 week ago

An inconsistent positive binding percentage was found in our previous NumPy array file; the large discrepancy might lead to poor model performance. Umair has helped reshuffle the original dataset, and I'm now splitting the shuffled data into a new train file and a new validation file. Then I will regenerate the batch files containing the Morgan fingerprints. TODO: after the new batch files are generated, we need to check them and make sure the inconsistency does not occur again. (A little variation is fine and probably good for the model, but large variation is not preferred.) We should also see whether more balanced training batch files improve model performance.
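One possible way to do the check in the TODO is sketched below. It is not the project's actual script. It assumes each batch's labels are available as a binary (0/1) array, and the function names and the `rel_tol` threshold are hypothetical. It computes the positive fraction per batch and flags batches that deviate strongly from the pooled fraction across all batches.

```python
import numpy as np


def positive_fractions(label_batches):
    """Fraction of positive (== 1) labels in each non-empty batch."""
    return np.array([np.mean(np.asarray(labels) == 1) for labels in label_batches])


def flag_inconsistent_batches(label_batches, rel_tol=0.5):
    """Return (fractions, flagged_indices).

    A batch is flagged when its positive fraction differs from the pooled
    positive fraction by more than rel_tol, measured relative to the pooled
    value. rel_tol is a hypothetical threshold, not a value from the project.
    """
    fractions = positive_fractions(label_batches)
    total_pos = sum(int(np.sum(np.asarray(b) == 1)) for b in label_batches)
    total_rows = sum(np.asarray(b).size for b in label_batches)
    pooled = total_pos / total_rows
    if pooled == 0:
        # No positives anywhere, so there is nothing to compare against.
        return fractions, np.array([], dtype=int)
    rel_dev = np.abs(fractions - pooled) / pooled
    return fractions, np.flatnonzero(rel_dev > rel_tol)


# Toy label arrays standing in for the labels of three batch files.
demo_batches = [
    np.array([1, 0, 0, 0]),
    np.array([0, 1, 0, 0]),
    np.array([1, 1, 1, 0]),
]
demo_fractions, demo_flagged = flag_inconsistent_batches(demo_batches)
print("positive fraction per batch:", demo_fractions)
print("flagged batch indices:", demo_flagged)
```

In practice the label arrays would come from the regenerated batch files (for example via `np.load`), and `rel_tol` would be set loosely enough to allow the small variation mentioned above.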

wangwpi commented 1 week ago