business-science / correlationfunnel

Speed Up Exploratory Data Analysis (EDA)
https://business-science.github.io/correlationfunnel/

Is it valid to use a continuous target variable? #8

Open bokov opened 1 year ago

bokov commented 1 year ago

Let's say I have a continuous response and I'm interested in correlations all along its range, not just the top or bottom quartile.

Instead of running a separate correlation funnel for each bin, is there any reason I cannot add the continuous column back in between running binarize() and correlate()?

Reprex:

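library(dplyr)
library(correlationfunnel)

# Binarize everything except the response, then re-attach the unbinned response
foo <- select(survival::veteran, -time) %>% binarize() %>%
    cbind(time = survival::veteran$time) %>% correlate(target = time)

# Give the first row a bin label so it can be plotted
foo$bin[1] <- 'time'

plot_correlation_funnel(foo)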

The above runs with one warning and no errors, and produces a plot in which, presumably, every correlation is computed against the unbinned outcome variable, which is what I want.
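For context on what such a correlation would mean: if correlate() computes Pearson correlations (an assumption here, not something confirmed in this thread), then each value is the correlation between a 0/1 indicator column and the continuous response, which is the point-biserial correlation. A minimal base-R sketch of that equivalence, using the built-in mtcars data as a stand-in rather than the veteran data from the reprex:

```r
# Illustrative only: `am` stands in for a binarized 0/1 feature column and
# `mpg` for the unbinned continuous target. Neither comes from the reprex.
x <- mtcars$am
y <- mtcars$mpg

# Pearson correlation as computed by stats::cor()
r_pearson <- cor(x, y)

# Point-biserial correlation from its textbook formula
n  <- length(y)
n1 <- sum(x == 1)                      # observations in group 1
n0 <- sum(x == 0)                      # observations in group 0
m1 <- mean(y[x == 1])                  # mean of y in group 1
m0 <- mean(y[x == 0])                  # mean of y in group 0
s_pop <- sqrt(mean((y - mean(y))^2))   # population SD of y (divides by n)
r_pb <- (m1 - m0) / s_pop * sqrt(n1 * n0 / n^2)

# The two quantities agree up to floating-point tolerance
print(all.equal(r_pearson, r_pb))
```

Under that assumption, each bar in the funnel would summarize how far the mean of the response in one bin sits from the mean outside that bin, scaled by the spread of the response and by the bin's size.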

My question: Is it valid to use correlationfunnel this way?

Thanks.