ersilia-os / zaira-chem

Automated QSAR based on multiple small molecule descriptors
GNU General Public License v3.0

Minimum number of molecules #3

Open GemmaTuron opened 2 years ago

GemmaTuron commented 2 years ago

Is your feature request related to a problem? Please describe. ZairaChem cannot run with fewer than about 60 molecules.

Describe the solution you'd like Remove some steps for small datasets.

Describe alternatives you've considered Add a requirement for a minimum number of molecules to train a ZairaChem model.

miquelduranfrigola commented 2 years ago

OK, this should clearly be a parameter. How many cases do we have, at the moment, that are affected by this constraint?
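To make the "parameter" idea concrete, here is a minimal sketch of a configurable minimum-size check that could run before training. It is not ZairaChem code: the names `DEFAULT_MIN_MOLECULES`, `TooFewMoleculesError`, `count_molecules`, and `check_min_molecules`, as well as the `smiles` column name, are all hypothetical, and the default of 60 is only the approximate figure reported in this issue.

```python
import csv
import io

# Assumption: default taken from the ~60 molecule figure reported in this issue.
DEFAULT_MIN_MOLECULES = 60


class TooFewMoleculesError(ValueError):
    """Raised when a training set is smaller than the configured minimum."""


def count_molecules(handle, smiles_column="smiles"):
    """Count rows with a non-empty value in the (hypothetical) SMILES column."""
    reader = csv.DictReader(handle)
    if reader.fieldnames is None or smiles_column not in reader.fieldnames:
        raise KeyError(f"column '{smiles_column}' not found in input")
    return sum(1 for row in reader if (row[smiles_column] or "").strip())


def check_min_molecules(n_molecules, min_molecules=DEFAULT_MIN_MOLECULES):
    """Fail early with a clear message instead of crashing mid-pipeline."""
    if n_molecules < min_molecules:
        raise TooFewMoleculesError(
            f"{n_molecules} molecules provided, at least {min_molecules} required"
        )
    return n_molecules


if __name__ == "__main__":
    # Toy input with 3 molecules; the threshold is lowered via the parameter.
    data = io.StringIO("smiles,activity\nCCO,1\nCCN,0\nCCC,1\n")
    n = count_molecules(data)
    check_min_molecules(n, min_molecules=3)
    print(f"{n} molecules accepted")
```

Exposing the threshold as an argument means small datasets could still be allowed explicitly once the pipeline supports them.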

miquelduranfrigola commented 2 years ago

A while ago I started to work on this problem, addressing it with data augmentation. There are a few tools already implemented in ZairaChem that allow us to do data augmentation, but I haven't incorporated them in the pipeline yet. You can find them in the augmentation folder.

Overall, I would be happy to explore this possibility, as I think it can be a key aspect of our tool.

JHlozek commented 2 years ago

I did some testing as to the minimum number of molecules needed to be predictive for a new chemical series. It crashed for 10 and 30 molecules (log attached) and worked for a size of 60 and up.
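A size sweep like this can be reproduced by drawing fixed-size random subsets from a larger training set. The sketch below is only an illustration of that subsampling step, not the script used for the test above; the function name `subsample_by_size` and the default sizes are assumptions, and running ZairaChem on each subset is left out.

```python
import random


def subsample_by_size(molecules, sizes=(10, 30, 60), seed=0):
    """Return {size: random subset} for each size that fits the dataset.

    A fixed seed keeps the subsets reproducible between runs.
    """
    pool = list(molecules)
    rng = random.Random(seed)
    subsets = {}
    for size in sizes:
        if size > len(pool):
            # Skip sizes larger than the available data.
            continue
        subsets[size] = rng.sample(pool, size)
    return subsets


if __name__ == "__main__":
    # Placeholder identifiers stand in for real SMILES strings.
    toy = [f"mol_{i}" for i in range(100)]
    for size, subset in subsample_by_size(toy).items():
        print(size, len(subset))
```

Each subset can then be written to its own input file and passed to a separate training run to see at which size the pipeline starts to complete.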

30_train_log.txt