boxuancui / DataExplorer

Automate Data Exploration and Treatment
http://boxuancui.github.io/DataExplorer/

Better handling of missing values in GenerateReport #16

Closed djhurio closed 8 years ago

djhurio commented 8 years ago

The example looks very nice. I am trying to run it on my own data, but I am getting the following error:

label: correlation_continuous
Quitting from lines 51-52 (report.rmd) 
Error in seq.default(from = best$lmin, to = best$lmax, by = best$lstep) : 
  'from' must be of length 1

boxuancui commented 8 years ago

Could you run the PlotMissing function first and check whether certain features are mostly NA? If so, that could be the reason.

As a quick fix, I would remove those features and run GenerateReport again.

I plan to add a missing-value scan before plotting. Please confirm this is the actual cause, and I will use this issue to track the enhancement.
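
A minimal sketch of that quick fix in base R, not the package's own implementation. The data frame `df`, its columns, and the 50% cutoff in `max_na_rate` are all hypothetical and chosen only for illustration:

```r
# Hypothetical example data: `mostly_na` is 75% missing.
df <- data.frame(
  id = 1:4,
  score = c(1.5, NA, 3.2, 4.8),
  mostly_na = c(NA, NA, NA, 7)
)

# Share of missing values in each column.
na_rate <- colMeans(is.na(df))

# Assumed cutoff: keep columns with at most 50% missing values.
max_na_rate <- 0.5
df_clean <- df[, na_rate <= max_na_rate, drop = FALSE]

# Columns kept: id, score
names(df_clean)
```

The reduced data frame can then be passed to GenerateReport in place of the original. The right cutoff depends on the data, so it is worth inspecting `na_rate` before dropping anything.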

djhurio commented 8 years ago

Yes, I confirm this. Removing variables with an NA rate of more than 50% resolved the first error. But now I am stopped by the next error:

label: correlation_discrete
Quitting from lines 63-64 (report.rmd) 
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

boxuancui commented 8 years ago

It is probably because you have some problematic discrete features too. Could you update the package to the latest develop branch? I have pushed some bug fixes and your issues should be addressed. Please let me know otherwise.

# Install devtools if it is not already available
if (!require(devtools)) install.packages("devtools")
library(devtools)
# Install DataExplorer from the develop branch
install_github("boxuancui/DataExplorer", ref = "develop")
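
If updating is not an option, one possible workaround is sketched below. It assumes the error comes from discrete features with fewer than two distinct observed values, which is the usual trigger for this base R message. The data frame `df` and its columns are hypothetical:

```r
# Hypothetical example data: `constant` has only one distinct value.
df <- data.frame(
  group = factor(c("a", "b", "a", "b")),
  constant = factor(c("x", "x", "x", "x")),
  value = c(1, 2, 3, 4)
)

# Count distinct non-missing values in each column.
n_distinct <- vapply(
  df,
  function(col) length(unique(col[!is.na(col)])),
  integer(1)
)

# Keep only columns with at least two distinct values.
df_clean <- df[, n_distinct >= 2, drop = FALSE]

# Columns kept: group, value
names(df_clean)
```

Note that this sketch also drops constant continuous columns, which carry no information for a correlation plot either.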

djhurio commented 8 years ago

I have installed the development version. I am no longer getting errors. The report is generated with a warning:

Warning message:
In writeLines(if (encoding == "") res else native_encode(res, to = encoding),  :
  invalid char string in output conversion

And the report is unreadable.

boxuancui commented 8 years ago

I believe it is due to non-ASCII characters in the data. I have created #19 to address this. For now, this behavior is inherited from the default rmarkdown settings.
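
To check whether non-ASCII characters are indeed present, a rough base R sketch like the following can list the affected columns. It does not fix the report encoding. The data frame `df` and the helper `has_non_ascii` are hypothetical names introduced for illustration:

```r
# Hypothetical example data: `city` contains non-ASCII characters.
df <- data.frame(
  name = c("Anna", "Janis"),
  city = c("R\u012bga", "Liep\u0101ja"),
  stringsAsFactors = FALSE
)

# TRUE if any non-missing value contains a byte outside the ASCII range.
has_non_ascii <- function(x) {
  x <- as.character(x[!is.na(x)])
  any(vapply(
    x,
    function(s) any(as.integer(charToRaw(s)) > 127L),
    logical(1)
  ))
}

# Names of the columns that contain non-ASCII characters.
non_ascii_cols <- names(df)[vapply(df, has_non_ascii, logical(1))]
non_ascii_cols
```

Checking raw bytes works regardless of how the strings are encoded, since every non-ASCII character is stored with at least one byte above 127.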

boxuancui commented 8 years ago

I will close this ticket since it is a bug about missing values.