EpistasisLab / Aliro

Aliro: AI-Driven Data Science
https://epistasislab.github.io/Aliro
GNU General Public License v3.0

Add OneHotEncoder and OrdinalEncoder #588

Open jay-m-dev opened 1 year ago

jay-m-dev commented 1 year ago

Add a OneHotEncoder and an OrdinalEncoder. Both were removed in a previous commit (9edbbce). We should put the encoders back in the validateDataset step so that datasets with ordinal and categorical columns can be uploaded.

This functionality was part of Aliro before commit 9edbbce, but experiments could not be run. It seems the encoding was never implemented for running experiments. This issue will therefore require 2 steps:

  1. Put back the encoders in the validateDataset step
  2. Perform encoding when a new experiment is launched.
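As a rough illustration of step 2, here is a minimal sketch of how the encoding could be applied with scikit-learn before a model is fit. It is not Aliro's actual code: the helper name `build_encoder`, its arguments (a list of categorical column indices and a dict mapping ordinal column indices to their ordered categories), and the toy data are all assumptions made for the example.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder


def build_encoder(categorical_cols, ordinal_cols):
    """Hypothetical helper: build a transformer that one-hot encodes the
    categorical columns and ordinal-encodes the ordinal columns.

    categorical_cols: list of column indices to one-hot encode.
    ordinal_cols: dict mapping a column index to its ordered category list.
    """
    transformers = []
    if categorical_cols:
        transformers.append(
            ("onehot", OneHotEncoder(handle_unknown="ignore"), list(categorical_cols))
        )
    if ordinal_cols:
        cols = list(ordinal_cols)
        # OrdinalEncoder expects one category list per encoded column,
        # in the same order as the columns it receives.
        categories = [list(ordinal_cols[c]) for c in cols]
        transformers.append(("ordinal", OrdinalEncoder(categories=categories), cols))
    # sparse_threshold=0 forces a dense array; any remaining columns pass through.
    return ColumnTransformer(transformers, remainder="passthrough", sparse_threshold=0)


# Toy data (made up): column 0 is categorical, column 1 is ordinal.
X = np.array(
    [["red", "low"], ["blue", "high"], ["red", "medium"]],
    dtype=object,
)

encoder = build_encoder(
    categorical_cols=[0],
    ordinal_cols={1: ["low", "medium", "high"]},
)
encoded = encoder.fit_transform(X)
print(encoded)
```

The output has the one-hot columns first, then the ordinal column. In a real experiment the encoder would presumably be fit on the training data only and reused for prediction, for example as the first step of a scikit-learn `Pipeline`.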

Testing of this feature can be done with the datasets in data/datasets/test/integration/

Once this is implemented, we'll need to re-enable the skipped unit tests in learn_tests.py and test_validateDataset.py.