Open GemmaTuron opened 2 years ago
OK, this should clearly be a parameter. How many cases do we have, at the moment, that are affected by this constraint?
A while ago I started to work on this problem, addressing it with data augmentation. There are a few tools already implemented in ZairaChem that allow us to do data augmentation, but I haven't incorporated them in the pipeline yet. You can find them in the augmentation folder.
Overall, I would be happy to explore this possibility, as I think it can be a key aspect of our tool.
I did some testing to find the minimum number of molecules needed to be predictive for a new chemical series. ZairaChem crashed with 10 and with 30 molecules (log attached) and worked with 60 or more.
Is your feature request related to a problem? Please describe. ZairaChem cannot run with fewer than about 60 molecules.
Describe the solution you'd like Remove some steps for small datasets.
Describe alternatives you've considered Add a requirement of the minimum number of molecules to train a ZairaChem model.
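To make the "minimum number of molecules" alternative concrete, here is a rough sketch of what a configurable size check could look like. This is not existing ZairaChem code: the names `MIN_MOLECULES`, `DatasetTooSmallError`, `count_molecules` and `check_dataset_size`, as well as the `smiles` column name, are all hypothetical, and the default of 60 only reflects the test reported in this issue (crashes at 10 and 30, works from 60 up).

```python
import csv
import io

# Hypothetical default, taken from the test reported in this issue
# (crashes with 10 and 30 molecules, works with 60 or more).
MIN_MOLECULES = 60


class DatasetTooSmallError(ValueError):
    """Raised when the training set has fewer molecules than required."""


def count_molecules(csv_text, smiles_column="smiles"):
    """Count rows with a non-empty value in the (assumed) SMILES column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(1 for row in reader if (row.get(smiles_column) or "").strip())


def check_dataset_size(n_molecules, min_molecules=MIN_MOLECULES):
    """Fail early with a clear message instead of crashing mid-pipeline."""
    if n_molecules < min_molecules:
        raise DatasetTooSmallError(
            f"Found {n_molecules} molecules, but at least {min_molecules} "
            "are required to train a model."
        )
    return n_molecules


if __name__ == "__main__":
    example = "smiles,activity\nCCO,1\nCCN,0\n"
    n = count_molecules(example)
    try:
        check_dataset_size(n)
    except DatasetTooSmallError as err:
        print(err)
```

Keeping the threshold as an argument rather than a hard-coded constant means it could be lowered later, for example if some steps are skipped for small datasets or if augmentation is added to the pipeline.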