beringresearch / ivis

Dimensionality reduction in very large datasets using Siamese Networks
https://beringresearch.github.io/ivis/
Apache License 2.0
330 stars 43 forks source link

Modifications to ensure adherence to sklearn API #95

Closed imatheussm closed 3 years ago

imatheussm commented 3 years ago

The issues

On ivis.Ivis.__init__(), some logic is present to handle the callbacks parameter. This resulted in a parameter change upon initialization of the transformer which might cause errors when using certain sklearn tools (e.g., sklearn.base.clone()). Moreover, the sklearn.base.TransformerMixin mixin was not inherited by Ivis despite the presence of ivis.Ivis.transform().

Minimal reproducible example

Environment

A virtual environment was created specifically for this project, wherein all modules described in requirements.txt were installed. My setup runs an up-to-date version of Windows 10 (no WSL).

Runtime

python=3.8.4

Relevant modules

ivis=2.0.3
tensorflow=2.5.0

Code (executed in interpreter)

>>> import ivis
>>> ivis.Ivis()  # Expected to see `Ivis()`, but that is not the case, as `Ivis.callbacks` is changed in `Ivis.__init__()`.
Ivis(callbacks=[])

The proposed changes

According to [the sklearn documentation on developing a custom estimator](), in the constructor,

there should be no logic, not even input validation, and the parameters should not be changed. The corresponding logic should be put where the parameters are used, typically in fit.

To ensure a greater adherence to the sklearn API, any logic present in the constructor should be moved elsewhere. One strategy, which I implemented herein and submit to your scrutiny, is to create a method dedicated to validating the parameters (Ivis._validate_parameters()).

Since Ivis is a transformer, I believe its inheritance of sklearn.base.TransformerMixin is pertinent as well, unless there is some motivation behind not inheriting TransformerMixin that I could not find.

Szubie commented 3 years ago

Thanks for raising this. We definitely need to do more to ensure full compliance with the sklearn API, and these changes should serve as a good first step.

Thanks for your contribution.