business-science / correlationfunnel

Speed Up Exploratory Data Analysis (EDA)
https://business-science.github.io/correlationfunnel/

Option for NAs #1

Open: ddalevi opened this issue 5 years ago

ddalevi commented 5 years ago

It would be nice to have something similar to the standard cor() function, where users can specify how they would like to deal with NAs (via its use argument). I do agree with the comment "Missing values and cleaning data are critical to getting great correlations", but a function like this is very convenient when only a few columns contain a few NAs.

mdancho84 commented 5 years ago

Let me look into this. Several students have contacted me about it, and I will consider it.

GitHunter0 commented 3 years ago

For sure. In binarize(), simply having an option to put NAs into a separate bin would minimize this issue, and it would be easy to implement.

Otherwise, for a numeric variable for example, you would have to either drop the missing observations or replace the NAs with an arbitrary value (e.g. zero). Both approaches can distort the analysis (statistical bias) in unpredictable ways.
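Here is a minimal sketch of the separate-NA-bin idea. It is written in plain Python rather than R, and correlationfunnel is an R package. binarize_with_na_bin and is_missing are hypothetical names and are not part of correlationfunnel. The sketch assumes the cut points are supplied explicitly and that bins are right-closed. Missing values land in their own indicator column instead of being dropped or imputed, so that column can be correlated with a target like any other binary feature.

```python
import bisect
import math
from typing import Dict, List, Optional, Sequence


def is_missing(x: Optional[float]) -> bool:
    # Treat both None and NaN as missing, similar to R's NA in a numeric column.
    return x is None or (isinstance(x, float) and math.isnan(x))


def binarize_with_na_bin(
    values: Sequence[Optional[float]], cut_points: Sequence[float], name: str = "x"
) -> Dict[str, List[int]]:
    """Hypothetical sketch: bin a numeric column at the given cut points and
    one-hot encode it, sending missing values to their own "<name>__NA" column."""
    cuts = sorted(set(cut_points))  # duplicate cut points would create clashing labels
    # Right-closed intervals: (-inf, c1], (c1, c2], ..., (ck, inf), plus one NA bin.
    bounds = [-math.inf] + cuts + [math.inf]
    labels = [f"{name}__{lo}_{hi}" for lo, hi in zip(bounds[:-1], bounds[1:])]
    na_label = f"{name}__NA"
    columns = {label: [0] * len(values) for label in labels + [na_label]}
    for row, x in enumerate(values):
        if is_missing(x):
            columns[na_label][row] = 1
        else:
            # bisect_left puts a value equal to a cut point in the lower bin.
            columns[labels[bisect.bisect_left(cuts, x)]][row] = 1
    return columns


cols = binarize_with_na_bin([1.0, None, 3.5, float("nan"), 8.0], cut_points=[2.0, 5.0])
for label, indicator in cols.items():
    print(label, indicator)
# x__-inf_2.0 [1, 0, 0, 0, 0]
# x__2.0_5.0 [0, 0, 1, 0, 0]
# x__5.0_inf [0, 0, 0, 0, 1]
# x__NA [0, 1, 0, 1, 0]
```

Every row still gets exactly one 1 across the indicator columns, so no observations are lost. The design choice is to make missingness an explicit category rather than hide it.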

An option to bin numeric variables by a specified criterion (e.g. frequency or interval length) instead of by number of bins would be useful too.
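The two binning criteria mentioned here, a fixed interval length and a target number of observations per bin, can be sketched in plain Python rather than R. cuts_by_width and cuts_by_frequency are hypothetical helpers and not correlationfunnel arguments. The frequency version relies on statistics.quantiles from the Python standard library (3.8+). Both functions only return cut points, and the sketch assumes right-closed bins.

```python
import math
import statistics
from typing import List, Sequence


def cuts_by_width(values: Sequence[float], width: float) -> List[float]:
    """Hypothetical helper: cut points spaced `width` apart, anchored at the
    minimum observed value, so every bin spans at most `width` units."""
    lo, hi = min(values), max(values)
    n_intervals = max(1, math.ceil((hi - lo) / width))
    return [lo + k * width for k in range(1, n_intervals)]


def cuts_by_frequency(values: Sequence[float], per_bin: int) -> List[float]:
    """Hypothetical helper: cut points chosen so each bin holds roughly
    `per_bin` observations (per_bin must be a positive integer)."""
    n_bins = len(values) // per_bin
    if n_bins < 2:
        return []  # too few observations to split into more than one bin
    cuts = statistics.quantiles(values, n=n_bins, method="inclusive")
    # Tied values can yield repeated cut points; drop them (bins may become uneven).
    return sorted(set(cuts))


data = [float(v) for v in range(1, 13)]
print(cuts_by_width(data, width=2.5))        # [3.5, 6.0, 8.5, 11.0]
print(cuts_by_frequency(data, per_bin=3))    # [3.75, 6.5, 9.25]
```

With right-closed bins, the frequency-based cut points above split the twelve values into four bins of three observations each. The width-based cut points instead give bins of equal span regardless of how many observations fall into each.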

Despite that, it's another great package overall, so thanks @mdancho84!