ersilia-os / zaira-chem

Automated QSAR based on multiple small molecule descriptors
GNU General Public License v3.0

Empty individual descriptors during training #9

Closed (JHlozek closed this issue 1 year ago)

JHlozek commented 2 years ago

Describe the bug

After the most recent commit (ac52f5e), training a new model leaves some of the individual descriptor folders in /descriptors/<...> empty. Training still completes, but at predict time the pipeline looks for files that were never produced and crashes.

For one continuous dataset, only the molbert and mordred descriptors were calculated and the remaining folders were empty. For a binary dataset, none of the descriptors were calculated.

To Reproduce

Steps to reproduce the behavior:

  1. Train a model.

Expected behavior

All descriptors are produced at training time, so that the predict pipeline doesn't crash.
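A quick way to confirm the symptom after training is to list which descriptor folders ended up without any files. The following is a minimal sketch, not part of zaira-chem: the function name `find_empty_descriptor_dirs` and the `model_dir` argument are hypothetical, and it only assumes that the trained model directory contains a `descriptors` folder with one subfolder per descriptor, as described above.

```python
from pathlib import Path


def find_empty_descriptor_dirs(model_dir, subdir="descriptors"):
    """Return the names of descriptor subfolders that contain no files.

    Hypothetical helper: `model_dir` is the output folder of a training run,
    and `subdir` is the folder holding one subfolder per descriptor.
    """
    root = Path(model_dir) / subdir
    if not root.is_dir():
        raise FileNotFoundError(f"No descriptors folder found at {root}")

    empty = []
    for child in sorted(root.iterdir()):
        # A descriptor folder counts as empty if no file exists anywhere below it.
        if child.is_dir() and not any(p.is_file() for p in child.rglob("*")):
            empty.append(child.name)
    return empty
```

Running this on a freshly trained model folder and getting a non-empty list back would reproduce the state reported here, before the crash at predict time.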


JHlozek commented 2 years ago

The same seems to happen when I roll back to bdc0a44, at least for the binary dataset case.

At f9d5a40, the descriptors seem to be calculated and stored correctly.

GemmaTuron commented 1 year ago

This was caused by a problem when calculating chunked inputs with the Ersilia Descriptors. Solved by https://github.com/ersilia-os/ersilia/commit/f005ce1eca2463b890ce1d96c2e5eb09a9fe4b0d
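For readers unfamiliar with this class of bug, here is a generic sketch of chunked descriptor calculation with a guard against silently dropped outputs. It does not reproduce the linked commit or the Ersilia API: `chunked`, `describe_in_chunks`, and the `describe_chunk` callable are hypothetical names used only for illustration.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def describe_in_chunks(inputs, describe_chunk, chunk_size=1000):
    """Apply `describe_chunk` to `inputs` chunk by chunk and collect the results.

    `describe_chunk` is a hypothetical callable that takes a list of inputs
    and returns one descriptor per input, in the same order.
    """
    results = []
    for chunk in chunked(inputs, chunk_size):
        out = list(describe_chunk(chunk))
        # Fail loudly instead of writing an empty or partial result.
        if len(out) != len(chunk):
            raise RuntimeError(
                f"Expected {len(chunk)} descriptors for this chunk, got {len(out)}"
            )
        results.extend(out)
    return results
```

A check of this kind turns an empty descriptor folder into an error at training time, rather than a crash later at predict time.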