Closed alessiamarcolini closed 4 years ago
> `least_nan_cols` should return the complement of the columns returned by `many_nan_columns` (given the same threshold value). Is that right?
Yes, that is correct. But in contrast to `least_nan_cols`, `many_nan_columns` should be used only to identify trivial columns that usually should not be considered when developing models. The idea behind `least_nan_cols` is instead to have a method that returns the columns selected by a fixed ratio of NaNs. For example, it was useful when I needed rows with no NaN values (as for the UMAP algorithm) and had to discard columns with a certain ratio of NaNs to avoid losing too many rows.
Anyway, the function is quite simple, so for these special situations the same code could be rewritten in a script, or the function could be moved to the UMAP scripts where it is required.
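As a rough illustration of the use case described above, the following sketch drops columns with too many NaNs before dropping incomplete rows, so that fewer rows are lost. It uses plain pandas only; `columns_below_nan_ratio` and the `max_nan_ratio` parameter are hypothetical names chosen for this example, not the `DataFrameWithInfo` implementation.

```python
import numpy as np
import pandas as pd


def columns_below_nan_ratio(df, max_nan_ratio):
    """Hypothetical helper: names of columns whose NaN ratio is below max_nan_ratio."""
    nan_ratio = df.isna().mean()  # fraction of NaN values per column
    return nan_ratio[nan_ratio < max_nan_ratio].index.tolist()


df = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0],           # no NaN
        "b": [1.0, np.nan, 3.0, 4.0],        # 25% NaN
        "c": [np.nan, np.nan, np.nan, 4.0],  # 75% NaN
    }
)

# Discard the NaN-heavy column first, then keep only complete rows.
kept = columns_below_nan_ratio(df, max_nan_ratio=0.5)
complete_rows = df[kept].dropna()

# Dropping rows directly, without filtering columns, loses more rows.
complete_rows_all_columns = df.dropna()
```

Here filtering the columns first keeps three complete rows, while calling `dropna()` on the full frame keeps only one.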
fixed by #66
The `least_nan_cols` method of `DataFrameWithInfo` accepts an external `threshold` parameter to get the features with a count of NaN values lower than the `threshold` argument.
`DataFrameWithInfo` has a `nan_percentage_threshold` attribute (default 0.999) used only in the `many_nan_columns` method, which returns the names of the columns containing many NaNs (over the threshold).
Based on my understanding, `least_nan_cols` should return the complement of the columns returned by `many_nan_columns` (given the same threshold value). Is that right?
Is there any reason to use an external parameter for the threshold in the `least_nan_cols` method?
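The complementarity asked about here can be sketched with two standalone functions. This is only an illustration under stated assumptions: both functions are hypothetical stand-ins written with plain pandas, both compare the per-column NaN ratio (not a count) against the same threshold, and the boundary case is assigned to the "least NaN" side so that the two sets partition the columns.

```python
import numpy as np
import pandas as pd


def many_nan_columns_sketch(df, threshold):
    """Hypothetical stand-in: columns whose NaN ratio is over the threshold."""
    nan_ratio = df.isna().mean()
    return set(nan_ratio[nan_ratio > threshold].index)


def least_nan_cols_sketch(df, threshold):
    """Hypothetical stand-in: columns whose NaN ratio is at or below the threshold."""
    nan_ratio = df.isna().mean()
    return set(nan_ratio[nan_ratio <= threshold].index)


df = pd.DataFrame(
    {
        "full": [1.0, 2.0, 3.0, 4.0],
        "partial": [1.0, np.nan, 3.0, 4.0],
        "empty": [np.nan, np.nan, np.nan, np.nan],
    }
)

threshold = 0.999  # same value as the default mentioned in the issue
many = many_nan_columns_sketch(df, threshold)
least = least_nan_cols_sketch(df, threshold)

# With a shared threshold the two results are disjoint and cover all columns.
is_partition = (many | least == set(df.columns)) and not (many & least)
```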