boxuancui / DataExplorer

Automate Data Exploration and Treatment
http://boxuancui.github.io/DataExplorer/

plot_histogram throws exception when no continuous columns are present #155

Closed SoccerGeekPhD closed 3 years ago

SoccerGeekPhD commented 3 years ago

I am using DataExplorer_0.8.2 to analyze data sets from other groups of analysts in my company. Many sets have hundreds of columns, where the first part of each column name is a prefix identifying a group of columns. I then pass a set of column names to plot_histogram.

plot_histogram throws an error if there are no continuous columns. Should this be a warning instead of an error? It seems excessive to halt the program over this.

boxuancui commented 3 years ago

plot_histogram has one job: to produce histograms for continuous features. If it fails to do that because of the input data, I believe the owner should be alerted that something is wrong. During exploratory analysis, many things can be overlooked, and it is better to alert than to fail silently, IMO at least.

With that said, I believe most of the use cases shouldn't be impacted by this (correct me if I am wrong?). Here are a few I can think of:

  1. If you use plot_histogram() stand-alone in a REPL (e.g., plot_histogram(letters)), the program stops with either a warning or an error, so there is not much difference to the end user.
  2. create_report will automatically skip if there are no continuous columns, so it won't break.
  3. If you are analyzing each group of columns in a for loop, this might break your loop. However, why not pass all the columns to plot_histogram directly? It is built to handle all columns at the same time.

Finally, if there are use cases I can't foresee, you can always use something like tryCatch(plot_histogram(letters), error = function(e) e, finally = cat("Warning! ")) to let it run like a warning.
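
One detail about that one-liner: in R, the finally expression runs whether or not an error occurred, so the "Warning! " text would print on successful calls too. A minimal sketch of an alternative, which raises a real warning only when the call fails, might look like the following. The helper name warn_on_error and its label argument are hypothetical and not part of DataExplorer, and the stop() call is a stand-in for a failing plot_histogram() call.

```r
# Hypothetical helper (not part of DataExplorer): evaluate a call and turn
# any error into a warning, returning NULL so the caller can continue.
warn_on_error <- function(expr, label = "call") {
  tryCatch(
    expr,  # the lazily evaluated argument is forced here, inside tryCatch
    error = function(e) {
      warning(sprintf("%s skipped: %s", label, conditionMessage(e)), call. = FALSE)
      NULL
    }
  )
}

# Stand-in for a failing plot call. With DataExplorer attached, the first
# argument would be something like plot_histogram(letters).
out <- warn_on_error(stop("no continuous columns"), label = "group_a")

# A call that succeeds passes through untouched and emits no warning.
ok <- warn_on_error(1 + 1)
```

Inside a for loop over column groups, wrapping each plot_histogram() call this way would let the loop continue past groups with no continuous columns while still recording which groups were skipped.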