ATOMScience-org / AMPL

The ATOM Modeling PipeLine (AMPL) is an open-source, modular, extensible software pipeline for building and sharing models to advance in silico drug discovery.
MIT License

Fixing is_class Determination in featurization.py #354

Closed rwilfong closed 3 months ago

rwilfong commented 3 months ago

Hi, when running a multitask classification model, the NaNs in out_vals were not being replaced with zeros in the make_weights function in featurization.py. The root cause was that is_class was determined from params.model_type instead of params.prediction_type, so it always evaluated to False and the NaNs in out_vals were never replaced.

I corrected the is_class assignments in featurization.py so that they reflect prediction_type.

I ran the multitask classification tests in atomsci/ddm/test/integrative/delaney_Panel, which passed even without the changes. On closer inspection, the response columns of the test data contained no NaNs, which is why the tests passed. When I introduced NaNs into the response columns, the models failed to train with the old code. After applying the is_class fix, the models trained successfully.
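To illustrate the failure mode described above, here is a minimal sketch. It is not the AMPL code: `make_weights_sketch` is a hypothetical stand-in for make_weights, the `params` object is a plain namespace, and the values `"NN"` and `"classification"` as well as the exact form of the old comparison are assumptions made for illustration.

```python
from types import SimpleNamespace

import numpy as np


def make_weights_sketch(vals, is_class=False):
    """Hypothetical stand-in for make_weights (not the AMPL implementation).

    Returns (out_vals, weights): weights are 0 where a label is missing
    and 1 elsewhere; NaNs in out_vals are zero-filled only if is_class.
    """
    vals = np.asarray(vals, dtype=float)
    missing = np.isnan(vals)
    weights = np.where(missing, 0.0, 1.0)
    out_vals = vals.copy()
    if is_class:
        # Missing class labels become 0; their zero weight masks them out.
        out_vals[missing] = 0.0
    return out_vals, weights


# Assumed parameter values: model_type names the model family,
# prediction_type names the task.
params = SimpleNamespace(model_type="NN", prediction_type="classification")

# Assumed form of the old check: model_type never equals "classification".
is_class_old = params.model_type == "classification"
# Fixed check: derive is_class from prediction_type.
is_class_new = params.prediction_type == "classification"

# Two-task label matrix with missing responses.
labels = np.array([[1.0, np.nan], [0.0, 1.0], [np.nan, 0.0]])

old_vals, old_weights = make_weights_sketch(labels, is_class_old)
new_vals, new_weights = make_weights_sketch(labels, is_class_new)

print("NaNs remain with old check:", bool(np.isnan(old_vals).any()))
print("NaNs remain with new check:", bool(np.isnan(new_vals).any()))
```

In this sketch the weights are identical in both cases; only the label array differs, which matches the report that NaNs survived in out_vals under the old check.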