ATOMScience-org / AMPL

The ATOM Modeling PipeLine (AMPL) is an open-source, modular, extensible software pipeline for building and sharing models to advance in silico drug discovery.
MIT License
136 stars 68 forks source link

Correcting code after using Ruff linter #348

Closed stewarthe6 closed 2 months ago

stewarthe6 commented 3 months ago

Herman and his team at the hackathon found many logical bugs and instances of un-executable code along with unused variables and imports. This PR address those changes, adds new tests, and adds the Ruff linter to our workflow.

mauvais2 commented 3 months ago

From Kevin's Google Sheet comments:

  1. chem_diversity.py: a. Function _get_descriptors is deprecated and "is guaranteed not to work"; it should be deleted. b. The only logic change from Ruff is in function upload_distmatrix_to_DS, which could never have run, so clearly it has not been used in the 5+ years since it was written. Recommend we delete it.

  2. temporal_splitter.py a. "self.date_col not in attr_df.columns.values" is logically equivalent to "not (self.date_col in attr_df.columns.values), because of parentheses. So no actual logic changes in this file.

  3. transformations.py a. Two errors related to UMAPTransformer:

    1. SimpleImputer imported but not used
    2. Imputer is not defined

Note:: There used to be an sklearn,preprocessing.Imputer class, but it was replaced by sklearn.impute.SimpleImputer in sklearn 0.20. So it appears that we changed the import but forgot to change the code where the imputer gets invoked. Therefore UMAPTransformer would not have worked at all since these sklearn changes took place.

Recommend we deprecate UMAPTransformer now and remove it entirely in a future release.

The other logic error is in the NormalizationTransformerMissingData class:

  • "get_grad_statistics" is not defined:

This function is only called when the class is instantiated with transform_gradients=True, which never happens. transform_gradients is a parameter from the init method of the parent DeepChem NormalizationTransformer class. From the comments in the current DeepChem code, it looks like this parameter was designed for a special case that is deprecated in the current version and will be removed in a future version. So we should also remove this parameter and the associated code block.

  1. data_curation_functions.py a. Only potential logic change is in "is_organometallic" function; Ruff flags this as "Test for membership should be 'not in'". b. In this case the boolean expression to the right of the "not" is in parentheses, so there is no actual logic change.

Raised issue for fixes: https://github.com/ATOMScience-org/AMPL/issues/351. Issue fixed and closed.

mauvais2 commented 3 months ago

When running unit tests, found these:

================================================ short test summary info ================================================= FAILED test_DatasetManager.py::test_DatasetManager_doesnot_needs_smiles - AssertionError: Expecting [[1 1 0 0] FAILED test_DatasetManager.py::test_DatasetManager_needs_smiles - AssertionError: Expecting [[1 1 0 0] FAILED test_DatasetManager.py::test_DatasetManager_many_to_one - AssertionError: Expecting [[1 1 1 0] ================================= 3 failed, 91 passed, 39 warnings in 2782.66s (0:46:22) =================================

The issue was from splitting.py - lambda-assignment (E731).

spliting_finding

However, the new w_agg_func output result is not what’s expected. The compact_dataset.w became

E AssertionError: Expecting [[1 1 0 0] E [0 0 1 0] E [0 0 0 1]]. Instead generated [[None None None None] E [None None None None] E [None None None None]]

We either change the function or modify the ruff.toml to ignore lambda-assignment (E731),

stewarthe6 commented 3 months ago

I fixed the issue here https://github.com/ATOMScience-org/AMPL/pull/348#issuecomment-2274092727.

The w_agg_function didn't return the result. After returning the result it passed the test_DatasetManager.py test.

paulsonak commented 3 months ago

@stewarthe6 @mcloughlin2 I added individual comments at the relevant line of code to logic changes that might need tests. Hopefully you didn't receive a million individual notifications for those.