ersilia-os / zaira-chem

Automated QSAR based on multiple small molecule descriptors
GNU General Public License v3.0

Anonymization of models for deployment #22

Closed: GemmaTuron closed this issue 1 year ago

GemmaTuron commented 1 year ago

Is your feature request related to a problem? Please describe.
Eliminate all information related to the training set, so that models trained on proprietary data can be released.
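As a rough illustration of what eliminating training-set information can mean at the file level, the sketch below walks a model directory and deletes files whose names match a set of patterns. The function `anonymize_dir` and the patterns in `SENSITIVE_PATTERNS` are hypothetical; this is not how ZairaChem's `Anonymizer` works, and a real anonymizer has to know exactly which artifacts embed training data.

```python
import fnmatch
import os

# Hypothetical filename patterns assumed to hold training-set information.
# Illustrative only: these are NOT ZairaChem's actual file names.
SENSITIVE_PATTERNS = ["*train*.csv", "*train*.npy"]


def anonymize_dir(model_dir, patterns=SENSITIVE_PATTERNS):
    """Delete every file under model_dir whose name matches one of `patterns`.

    Returns the sorted list of removed paths, relative to model_dir.
    """
    removed = []
    # os.walk lists a directory's files before yielding them,
    # so deleting files inside the loop is safe.
    for root, _dirs, files in os.walk(model_dir):
        for name in files:
            if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
                full_path = os.path.join(root, name)
                os.remove(full_path)
                removed.append(os.path.relpath(full_path, model_dir))
    return sorted(removed)
```

Returning the list of removed files makes it easy to log or audit what the anonymization step actually touched.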

Additional context: We are working on this in the dev branch. The latest commit ensures that the anonymization process works when run at the end of model training, and that new molecules can still be predicted afterwards: both test sets (with associated data, so reports are created) and de novo molecules.

We only need to add a CLI flag that runs the anonymization at the end of training. Currently, I simply run:

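```python
from zairachem.finish.finisher import Anonymizer

path = "path_to_model"  # placeholder: directory of the trained model
an = Anonymizer(path=path)
an.run()  # remove training-set information from the model directory
```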

The dev branch will be merged once these changes are incorporated.
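A CLI flag of this kind could be wired up roughly as follows. This is a sketch using Python's standard `argparse`, assuming a `fit` subcommand with hypothetical `-i/--input-file` and `-m/--model-dir` arguments; `train` and `anonymize` are placeholder functions, not ZairaChem code.

```python
import argparse


def train(input_file, model_dir):
    # Placeholder for model training (hypothetical, not ZairaChem's code).
    return f"trained {model_dir} from {input_file}"


def anonymize(model_dir):
    # Placeholder for an anonymization step, e.g. Anonymizer(path=model_dir).run().
    return f"anonymized {model_dir}"


def build_parser():
    parser = argparse.ArgumentParser(prog="zairachem-sketch")
    sub = parser.add_subparsers(dest="command", required=True)
    fit = sub.add_parser("fit", help="train a model")
    fit.add_argument("-i", "--input-file", required=True)
    fit.add_argument("-m", "--model-dir", required=True)
    fit.add_argument(
        "--anonymize",
        action="store_true",
        help="remove training-set information after training",
    )
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    steps = [train(args.input_file, args.model_dir)]
    if args.anonymize:
        # Anonymization runs only after training has finished.
        steps.append(anonymize(args.model_dir))
    return steps
```

The design point is ordering: anonymization is an optional final step, so a model trained without the flag keeps all its training artifacts for internal use.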

GemmaTuron commented 1 year ago

We need to revise the UMAP and PCA outputs, because the training set still appears in them: there is one file that is not deleted.
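One way to catch a leftover file like this is to scan the model directory for training-set SMILES after anonymization. The `find_leaks` helper below is a hypothetical sketch, assuming training molecules are available as SMILES strings; it is not part of ZairaChem.

```python
import os


def find_leaks(model_dir, training_smiles):
    """Return sorted relative paths of files containing any training SMILES verbatim."""
    # Compare raw bytes so binary files can be scanned without decoding errors.
    needles = [s.encode("utf-8") for s in training_smiles if s]
    leaks = []
    for root, _dirs, files in os.walk(model_dir):
        for name in files:
            path = os.path.join(root, name)
            with open(path, "rb") as fh:
                content = fh.read()
            if any(needle in content for needle in needles):
                leaks.append(os.path.relpath(path, model_dir))
    return sorted(leaks)
```

This only detects verbatim strings. It will not find training information encoded numerically (for example, UMAP or PCA coordinates of training molecules), and very short SMILES such as `C` can produce false positives, so it complements, rather than replaces, knowing which files each pipeline step writes.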

GemmaTuron commented 1 year ago

PR #23 solves these issues and provides an --anonymize option at fit time.