daya6489 / SmartEDA

An R package for exploratory data analysis
https://daya6489.github.io/SmartEDA/

Inconsistent treatment of percentages in ExpData #2

Closed jpiversen closed 4 years ago

jpiversen commented 5 years ago

Hi,

I love your package, but I was surprised to find that ExpReport reported "% of Missing" as 0 when I knew that the variable contained a small number of NAs.

I checked, and the issue seems to be in ExpData (with type=2). On line 67 of fn_Overview_data.R, "% of Missing" is calculated as:

Per_missing <- round(length(Xvar[is.na(Xvar)]) / length(Xvar), 2)

Here, the proportion is rounded to 2 digits without first being multiplied by 100. So, for example, 99% will be reported as 0.99, and 0.5% will be reported as 0 (or at best 0.01) instead of 0.5.
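To make the effect concrete, here is a minimal sketch that applies the expression quoted above to a made-up vector. The data (1000 values, 4 of them NA) is an assumption chosen for illustration and does not come from the package or its tests.

```r
# Illustrative vector (not from the package): 1000 values, 4 of them NA.
Xvar <- c(rep(1, 996), rep(NA, 4))

# Calculation as quoted above: the proportion is rounded to 2 digits
# before any scaling to a percentage.
Per_missing <- round(length(Xvar[is.na(Xvar)]) / length(Xvar), 2)

Per_missing  # 0, although 0.4% of the values are missing
```

Any share of missing values below half a percent disappears this way, which would match the 0 I saw in the report.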

A few lines above, in the same R-file, the percentages for ExpData with type=1 are calculated as:

p4 <- paste0(round(length(dd[dd == 1]) / length(dd) * 100, 2), "%", " (", length(dd[dd == 1]), ")")

So, since the decimals are multiplied by 100, 99% will be reported as 99%, and 0.5% as 0.5%.

Both names are similar (e.g. "%. of variables having complete cases", and "% of Missing"), so maybe it would be a good idea to calculate/report the percentages in a similar fashion?

This could easily be fixed with: Per_missing <- round(length(Xvar[is.na(Xvar)]) / length(Xvar) * 100, 2)
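As a rough check of the proposed change, the sketch below wraps the corrected expression in a helper and runs it on made-up data. The name `pct_missing`, its `digits` argument, and the example vector are hypothetical and are not part of SmartEDA.

```r
# Hypothetical helper, not part of SmartEDA: percent of missing values,
# scaled to the 0-100 range before rounding, as in the proposed fix.
pct_missing <- function(x, digits = 2) {
  round(length(x[is.na(x)]) / length(x) * 100, digits)
}

# Illustrative vector: 1000 values, 4 of them NA.
Xvar <- c(rep(1, 996), rep(NA, 4))

pct_missing(Xvar)  # about 0.4 instead of 0

# Same label style as the type = 1 rows: "<percent>% (<count>)"
paste0(pct_missing(Xvar), "%", " (", sum(is.na(Xvar)), ")")
```

With the multiplication by 100 in place, small missing rates survive the rounding, and the value can be formatted the same way as the type=1 output if that is preferred.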

daya6489 commented 5 years ago

Thank you @jpiversen for identifying this error. I will include this fix in the upcoming version of the SmartEDA package.