Closed stewarthe6 closed 2 months ago
From Kevin's Google Sheet comments:
chem_diversity.py: a. Function _get_descriptors is deprecated and "is guaranteed not to work"; it should be deleted. b. The only logic change from Ruff is in function upload_distmatrix_to_DS, which could never have run, so clearly it has not been used in the 5+ years since it was written. Recommend we delete it.
temporal_splitter.py a. "self.date_col not in attr_df.columns.values" is logically equivalent to "not (self.date_col in attr_df.columns.values), because of parentheses. So no actual logic changes in this file.
transformations.py a. Two errors related to UMAPTransformer:
Note:: There used to be an sklearn,preprocessing.Imputer class, but it was replaced by sklearn.impute.SimpleImputer in sklearn 0.20. So it appears that we changed the import but forgot to change the code where the imputer gets invoked. Therefore UMAPTransformer would not have worked at all since these sklearn changes took place.
Recommend we deprecate UMAPTransformer now and remove it entirely in a future release.
The other logic error is in the NormalizationTransformerMissingData class:
- "get_grad_statistics" is not defined:
This function is only called when the class is instantiated with transform_gradients=True
, which never happens. transform_gradients is a parameter from the init method of the parent DeepChem NormalizationTransformer class. From the comments in the current DeepChem code, it looks like this parameter was designed for a special case that is deprecated in the current version and will be removed in a future version. So we should also remove this parameter and the associated code block.
Raised issue for fixes: https://github.com/ATOMScience-org/AMPL/issues/351. Issue fixed and closed.
When running unit tests, found these:
================================================ short test summary info ================================================= FAILED test_DatasetManager.py::test_DatasetManager_doesnot_needs_smiles - AssertionError: Expecting [[1 1 0 0] FAILED test_DatasetManager.py::test_DatasetManager_needs_smiles - AssertionError: Expecting [[1 1 0 0] FAILED test_DatasetManager.py::test_DatasetManager_many_to_one - AssertionError: Expecting [[1 1 1 0] ================================= 3 failed, 91 passed, 39 warnings in 2782.66s (0:46:22) =================================
The issue was from splitting.py - lambda-assignment (E731).
However, the new w_agg_func output result is not what’s expected. The compact_dataset.w became
E AssertionError: Expecting [[1 1 0 0] E [0 0 1 0] E [0 0 0 1]]. Instead generated [[None None None None] E [None None None None] E [None None None None]]
We either change the function or modify the ruff.toml to ignore lambda-assignment (E731),
I fixed the issue here https://github.com/ATOMScience-org/AMPL/pull/348#issuecomment-2274092727.
The w_agg_function didn't return the result. After returning the result it passed the test_DatasetManager.py test.
@stewarthe6 @mcloughlin2 I added individual comments at the relevant line of code to logic changes that might need tests. Hopefully you didn't receive a million individual notifications for those.
Herman and his team at the hackathon found many logical bugs and instances of un-executable code along with unused variables and imports. This PR address those changes, adds new tests, and adds the Ruff linter to our workflow.