alteryx / evalml

EvalML is an AutoML library written in Python.
https://evalml.alteryx.com
BSD 3-Clause "New" or "Revised" License

Add `DropNaNRowsTransformer` #2705

Open angela97lin opened 3 years ago

angela97lin commented 3 years ago

https://github.com/alteryx/evalml/pull/2692 introduced a generic `DropRowsTransformer`, but based on the thread here, it'd be a good idea to introduce a data-agnostic component that detects and drops NaN rows.

This could probably be done by subclassing `DropRowsTransformer` and adding NaN detection logic!
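
A rough sketch of what that subclass might look like, using pandas only. The `DropRowsTransformer` below is a simplified stand-in written for this example and is not evalml's actual class; the `indices_to_drop` attribute and the `transform(X, y)` signature are assumptions.

```python
import pandas as pd


class DropRowsTransformer:
    """Simplified stand-in for the generic transformer (NOT evalml's real class)."""

    def __init__(self, indices_to_drop=None):
        # Assumed attribute: index labels of the rows to remove.
        self.indices_to_drop = indices_to_drop

    def transform(self, X, y=None):
        if self.indices_to_drop is None:
            return X, y
        X_t = X.drop(index=self.indices_to_drop)
        y_t = y.drop(index=self.indices_to_drop) if y is not None else None
        return X_t, y_t


class DropNaNRowsTransformer(DropRowsTransformer):
    """Hypothetical subclass: detects NaN rows itself, then reuses the parent drop logic."""

    def __init__(self):
        super().__init__(indices_to_drop=None)

    def transform(self, X, y=None):
        # A row counts as a NaN row if any feature is missing...
        nan_mask = X.isna().any(axis=1)
        # ...or, when a target is given, if the target is missing.
        if y is not None:
            nan_mask = nan_mask | y.isna()
        self.indices_to_drop = X.index[nan_mask]
        return super().transform(X, y)
```

Whether a NaN in the target should also drop the row is a design choice this sketch makes for illustration; the issue itself doesn't settle it.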

angela97lin commented 2 years ago

One pro of this approach is that we don't need to store the indices to remove in the output of the data check, since the component recalculates them itself. The flip side is that the data check and the component must detect NaN rows in exactly the same way at all times; if they ever diverge, the rows the data check reports won't match the rows that actually get dropped.
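
One way to guard that parity would be a test that runs both detection paths on the same data and compares the results. A minimal sketch follows; both functions are hypothetical stand-ins written for this example, not evalml code, and they are deliberately implemented differently to mimic two independent code paths.

```python
import pandas as pd


def nan_rows_from_data_check(X, y=None):
    """Hypothetical stand-in for the data check's NaN-row detection."""
    mask = X.isna().any(axis=1)
    if y is not None:
        mask = mask | y.isna()
    return sorted(X.index[mask])


def nan_rows_from_component(X, y=None):
    """Hypothetical stand-in for the component's detection, written independently."""
    # Attach the target as a temporary column so one dropna() covers both.
    combined = X if y is None else X.assign(_target=y)
    return sorted(combined.index.difference(combined.dropna().index))


def assert_detection_parity(X, y=None):
    # Fails loudly if the two code paths ever disagree on which rows are NaN rows.
    assert nan_rows_from_data_check(X, y) == nan_rows_from_component(X, y)
```

Having both the data check and the component call a single shared helper would be another option, and would make the parity hold by construction.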