Closed ScaonE closed 3 years ago
It seems that map() (which package is it from?) does not retain the class of the object. You could try setting the class explicitly, as in this example:
library(dplyr)
library(summarytools)
iris_summary <-
  iris %>%
  group_by(Species) %>%
  dfSummary()
class(iris_summary) # stby
iris_summary <- lapply(iris_summary, function(x) x[-5,])
class(iris_summary) # list
class(iris_summary) <- "stby"
view(iris_summary)
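To illustrate why the class disappears, here is a minimal base-R sketch. It uses a toy list tagged with class "stby" rather than a real summarytools object, and the "note" attribute is purely hypothetical. Copying all attributes back is shown as one possible approach; the thread only confirms that resetting the class was enough for the real object.

```r
# Toy object standing in for a grouped dfSummary() result (class "stby").
# The "note" attribute is hypothetical, added only for illustration.
toy <- structure(
  list(a = data.frame(x = 1:6), b = data.frame(x = 7:12)),
  class = "stby",
  note  = "extra attribute"
)

# lapply() returns a fresh plain list: element names survive,
# but the class and any other attributes do not.
trimmed <- lapply(toy, function(d) d[-5, , drop = FALSE])
class(trimmed)         # "list"
attr(trimmed, "note")  # NULL

# Copy every attribute (names, class, note) back from the original object.
attributes(trimmed) <- attributes(toy)
class(trimmed)         # "stby"
```

The same reasoning should apply to purrr's map(), which also returns a plain list.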
Grouping variables are now excluded by default from dfSummaries. They can be retained using keep.grp.vars = TRUE, either in dfSummary() or in print/view, as the masking occurs in the printing phase.
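As a rough sketch of how that option might be used, assuming a summarytools version that includes keep.grp.vars and that dplyr is installed (the requireNamespace() guard is only there so the snippet does nothing when either package is missing):

```r
with_grp <- NULL

if (requireNamespace("summarytools", quietly = TRUE) &&
    requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  library(summarytools)

  # Default behaviour described above: Species is masked in each group's summary.
  without_grp <- iris %>%
    group_by(Species) %>%
    dfSummary()

  # Ask for the grouping variable to be retained.
  with_grp <- iris %>%
    group_by(Species) %>%
    dfSummary(keep.grp.vars = TRUE)
}
```

Printing or viewing with_grp should then show the Species row in each group's table.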
It can be tested by installing the dev-current branch.
Thanks for your input
class(iris_summary) <- "stby"
This did the trick for me. I didn't spend much time trying to use the dev-current version as I encountered an installation error (this is on my end)
Allow me to ask another question in this thread: is it possible to display Q1 and Q3 instead of IQR (CV) in the dfSummary output?
Glad it worked out.
For the dfSummary stats, the problem is that everyone has their preferences... I might at some point include an optional "additional row" of stats, which could include Q1/Q3. But feel free to fork the package and modify it to suit your needs if you feel like experimenting a bit :)
Curious to know: is the map() function you used part of the purrr package?
One last question, if I may: can you point me to a description or explanation of IQR (CV) versus Q1 & Q3? I am so used to being asked to report Q1 & Q3, but rarely the IQR, so a good explanation might help me convince people of the usefulness of the IQR.
Well, I wouldn't say one set is clearly "superior" to the other... It is true that, knowing Q1 and Q3, we're only one quick subtraction away from knowing the IQR. On the other hand, having both the IQR and the CV in the summary seems like a good compromise, given the space available: it gives you a robust measure of dispersion (IQR), plus a "standardized" measure of dispersion (CV; "relative" is the more precise term) that allows you to compare variability across variables that are on totally different scales.
Already knowing the min, median and max (from the summary table), the IQR provides a faster way to mentally picture the kurtosis, while having Q1 and Q3 would shift the focus towards the skewness. And this is arguably the downside of the IQR as opposed to Q1 & Q3: it doesn't give you information about skewness. However, the histograms are there to help us with that.
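The trade-off can be sketched in base R on iris$Sepal.Length. This assumes the CV is computed as sd / mean and that quartiles use R's default quantile type; neither is confirmed in this thread as what dfSummary does internally.

```r
x <- iris$Sepal.Length

q   <- quantile(x, probs = c(0.25, 0.75), names = FALSE)  # Q1 and Q3
iqr <- IQR(x)
cv  <- sd(x) / mean(x)  # assumed definition of the CV

# The IQR is one subtraction away from the quartiles.
all.equal(q[2] - q[1], iqr)  # TRUE

# Changing the unit (cm to mm) rescales the IQR but leaves the CV unchanged,
# which is what makes the CV comparable across variables on different scales.
x_mm <- x * 10
all.equal(IQR(x_mm), 10 * iqr)         # TRUE
all.equal(sd(x_mm) / mean(x_mm), cv)   # TRUE

# What the quartiles add: the two half-spreads around the median hint at
# skewness, which the IQR alone cannot show.
med <- median(x)
half_spreads <- c(lower = med - q[1], upper = q[2] - med)
half_spreads
```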
The visual aspect is always on my mind when deciding what to include; the amount of space it takes, but also the way it integrates with the rest; for instance, including a fivenum would make sense, theoretically speaking, but it would look quite odd.
Fixed & added optional additional stats to show -- example for Q1 & Q3 provided here: https://raw.githubusercontent.com/dcomtois/summarytools/master/doc/Custom-Statistics-in-dfSummary.pdf
Dear all,
I find the columns used for grouping pretty uninformative in the dfSummary output, given that they will always contain a single value with 100% frequency.
I didn't find an option that would allow me to remove them, so I tried it myself using:
So far it seems good, I still have a summarytools output, without the grouping variable in each group
But this fails. Any tips or suggestions?