Closed EBukin closed 1 year ago
Just for the record, grouped data frames do not yet work with report_sample(); the syntax would be:
palmerpenguins::penguins %>%
select(species, body_mass_g, ends_with("_mm")) %>%
drop_na() %>%
report_sample(group_by = "species")
@DominiqueMakowski I committed a hotfix, which should fix the bug; however, the issue is a bit more complicated.
At some point, report() calls report_table.numeric(), which removes either the column n_missing or percentage_missing:
missing_percentage is determined here:
That function, in turn, checks the length of a vector or data frame and, if it is > 100, returns TRUE:
Not sure why this is done? In the above example, the different species have different lengths, so for one group only the column n_Missing is removed, and for the other only percentage_Missing is removed, although neither has any missing values. This results in different column names, and thus rbind() fails:
I just don't know why you define missing_percentage this way?
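The rbind() failure can be reproduced without report at all. This is a minimal sketch, assuming two hand-made tables that stand in for the per-group output (the variable name and values are made up for illustration); only the differing column names matter:

```r
# Hypothetical stand-ins for two per-group tables: one kept n_Missing,
# the other kept percentage_Missing.
small_group <- data.frame(Variable = "body_mass_g", n_Missing = 0)
large_group <- data.frame(Variable = "body_mass_g", percentage_Missing = 0)

# rbind() on data frames requires matching column names, so this errors;
# tryCatch() captures the condition instead of stopping the script.
result <- tryCatch(
  rbind(small_group, large_group),
  error = function(e) e
)

inherits(result, "error")
```

If both tables keep the same column, the same rbind() call goes through.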
weird lemme take a look
Okay, so the n >= 100 was a pure heuristic decision for the default: it displays the n for small-ish numbers, but the percentage for bigger data frames (if n > 100). But indeed, it should be checked across the whole, ungrouped data frame, and not on the subgroups 😓
Do you have in mind a smarter way of dealing with that? Essentially, report.data.frame currently has levels_percentage = "auto" and missing_percentage = "auto", and the auto rule was to put TRUE for bigger numbers and FALSE for smaller numbers.
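To make the difference concrete, here is a rough sketch of the "auto" rule resolved per subgroup versus once on the ungrouped data. resolve_auto() is a hypothetical helper, not a function from report, and the > 100 cut-off is taken from the description in this thread; the exact threshold in the package may differ.

```r
# Hypothetical helper mirroring the "auto" rule described in this thread:
# percentages for more than 100 observations, counts otherwise.
resolve_auto <- function(n_obs) n_obs > 100

# Group sizes from the penguins example in this thread (after drop_na()).
group_sizes <- c(Adelie = 151, Chinstrap = 68, Gentoo = 123)

# Resolved per subgroup, the rule disagrees between groups ...
per_group <- vapply(group_sizes, resolve_auto, logical(1))

# ... resolved once on the ungrouped data, every group gets the same answer.
overall <- resolve_auto(sum(group_sizes))
```

Passing the single overall value down to each group would keep the column names identical across groups, which is what rbind() needs.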
It doesn't fail if missing_percentage is specified. Maybe we should set it to just TRUE / FALSE by default and remove the "auto"? Though in principle it was nice: for smaller numbers (up to 100) counts are easier to read, while beyond that we switch to percentages, which brings it back to a 0-100 scale in a way.
library(palmerpenguins)
library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.0.5
library(report)
palmerpenguins::penguins %>%
select(species, body_mass_g, ends_with("_mm")) %>%
drop_na() %>%
group_by(species) %>%
report(missing_percentage = FALSE)
#> The data contains 342 observations, grouped by species, of the following 5 variables:
#>
#> - Adelie (n = 151):
#> - body_mass_g: n = 151, Mean = 3700.66, SD = 458.57, Median = 3700.00, MAD = 444.78, range: [2850, 4775], Skewness = 0.29, Kurtosis = -0.57, 0 missing
#> - bill_length_mm: n = 151, Mean = 38.79, SD = 2.66, Median = 38.80, MAD = 2.97, range: [32.10, 46], Skewness = 0.16, Kurtosis = -0.16, 0 missing
#> - bill_depth_mm: n = 151, Mean = 18.35, SD = 1.22, Median = 18.40, MAD = 1.19, range: [15.50, 21.50], Skewness = 0.32, Kurtosis = -0.06, 0 missing
#> - flipper_length_mm: n = 151, Mean = 189.95, SD = 6.54, Median = 190.00, MAD = 7.41, range: [172, 210], Skewness = 0.09, Kurtosis = 0.33, 0 missing
#>
#> - Chinstrap (n = 68):
#> - body_mass_g: n = 68, Mean = 3733.09, SD = 384.34, Median = 3700.00, MAD = 370.65, range: [2700, 4800], Skewness = 0.25, Kurtosis = 0.59, 0 missing
#> - bill_length_mm: n = 68, Mean = 48.83, SD = 3.34, Median = 49.55, MAD = 3.63, range: [40.90, 58], Skewness = -0.09, Kurtosis = 0.04, 0 missing
#> - bill_depth_mm: n = 68, Mean = 18.42, SD = 1.14, Median = 18.45, MAD = 1.41, range: [16.40, 20.80], Skewness = 6.88e-03, Kurtosis = -0.87, 0 missing
#> - flipper_length_mm: n = 68, Mean = 195.82, SD = 7.13, Median = 196.00, MAD = 7.41, range: [178, 212], Skewness = -9.47e-03, Kurtosis = 0.05, 0 missing
#>
#> - Gentoo (n = 123):
#> - body_mass_g: n = 123, Mean = 5076.02, SD = 504.12, Median = 5000.00, MAD = 555.98, range: [3950, 6300], Skewness = 0.07, Kurtosis = -0.72, 0 missing
#> - bill_length_mm: n = 123, Mean = 47.50, SD = 3.08, Median = 47.30, MAD = 3.11, range: [40.90, 59.60], Skewness = 0.65, Kurtosis = 1.30, 0 missing
#> - bill_depth_mm: n = 123, Mean = 14.98, SD = 0.98, Median = 15.00, MAD = 1.19, range: [13.10, 17.30], Skewness = 0.32, Kurtosis = -0.58, 0 missing
#> - flipper_length_mm: n = 123, Mean = 217.19, SD = 6.48, Median = 216.00, MAD = 5.93, range: [203, 231], Skewness = 0.39, Kurtosis = -0.58, 0 missing
Created on 2021-04-23 by the reprex package (v1.0.0)
Somehow, one group is omitted with report_table after the hot fix 4ce4f01...
library(tidyverse)
library(report)
palmerpenguins::penguins %>%
select(species, body_mass_g, ends_with("_mm")) %>%
group_by(species) %>%
report_table()
#> Group | Variable | n_Obs | Mean | SD | Median | MAD | Min | Max | Skewness | Kurtosis | percentage_Missing | n_Missing
#> -------------------------------------------------------------------------------------------------------------------------------------------------------
#> Chinstrap | body_mass_g | 68 | 3733.09 | 384.34 | 3700.00 | 370.65 | 2700.00 | 4800.00 | 0.25 | 0.59 | | 0
#> Chinstrap | bill_length_mm | 68 | 48.83 | 3.34 | 49.55 | 3.63 | 40.90 | 58.00 | -0.09 | 0.04 | | 0
#> Chinstrap | bill_depth_mm | 68 | 18.42 | 1.14 | 18.45 | 1.41 | 16.40 | 20.80 | 6.88e-03 | -0.87 | | 0
#> Chinstrap | flipper_length_mm | 68 | 195.82 | 7.13 | 196.00 | 7.41 | 178.00 | 212.00 | -9.47e-03 | 0.05 | | 0
#> Gentoo | body_mass_g | 124 | 5076.02 | 504.12 | | 555.98 | 3950.00 | 6300.00 | 0.07 | -0.72 | 0.81 |
#> Gentoo | bill_length_mm | 124 | 47.50 | 3.08 | | 3.11 | 40.90 | 59.60 | 0.65 | 1.30 | 0.81 |
#> Gentoo | bill_depth_mm | 124 | 14.98 | 0.98 | | 1.19 | 13.10 | 17.30 | 0.32 | -0.58 | 0.81 |
#> Gentoo | flipper_length_mm | 124 | 217.19 | 6.48 | | 5.93 | 203.00 | 231.00 | 0.39 | -0.58 | 0.81 |
palmerpenguins::penguins %>%
select(species, body_mass_g, ends_with("_mm")) %>%
group_by(species) %>%
report_table(missing_percentage = FALSE)
#> Group | Variable | n_Obs | Mean | SD | Median | MAD | Min | Max | Skewness | Kurtosis | n_Missing
#> ----------------------------------------------------------------------------------------------------------------------------------
#> Chinstrap | body_mass_g | 68 | 3733.09 | 384.34 | 3700.00 | 370.65 | 2700.00 | 4800.00 | 0.25 | 0.59 | 0
#> Chinstrap | bill_length_mm | 68 | 48.83 | 3.34 | 49.55 | 3.63 | 40.90 | 58.00 | -0.09 | 0.04 | 0
#> Chinstrap | bill_depth_mm | 68 | 18.42 | 1.14 | 18.45 | 1.41 | 16.40 | 20.80 | 6.88e-03 | -0.87 | 0
#> Chinstrap | flipper_length_mm | 68 | 195.82 | 7.13 | 196.00 | 7.41 | 178.00 | 212.00 | -9.47e-03 | 0.05 | 0
#> Gentoo | body_mass_g | 124 | 5076.02 | 504.12 | | 555.98 | 3950.00 | 6300.00 | 0.07 | -0.72 | 1
#> Gentoo | bill_length_mm | 124 | 47.50 | 3.08 | | 3.11 | 40.90 | 59.60 | 0.65 | 1.30 | 1
#> Gentoo | bill_depth_mm | 124 | 14.98 | 0.98 | | 1.19 | 13.10 | 17.30 | 0.32 | -0.58 | 1
#> Gentoo | flipper_length_mm | 124 | 217.19 | 6.48 | | 5.93 | 203.00 | 231.00 | 0.39 | -0.58 | 1
Created on 2021-04-23 by the reprex package (v2.0.0)
Somehow, one group is omitted with report_table
Hi - I'm running into this issue as well, maybe duplicated in #246?
System: Analyses were conducted using the R Statistical language (version 4.1.2; R Core Team, 2021) on Pop!_OS 22.04 LTS
Packages:
When I add print(table) inside the for loop of report_table.grouped_df in report.data.frame (line 390) and provide iris %>% group_by(Species), I get:
[1] Variable Mean SD Min Max n_Missing Group
<0 rows> (or 0-length row.names)
Variable Mean SD Min Max n_Missing Group
1 Sepal.Length 5.936 0.5161711 4.9 7.0 0 versicolor
2 Sepal.Width 2.770 0.3137983 2.0 3.4 0 versicolor
3 Petal.Length 4.260 0.4699110 3.0 5.1 0 versicolor
4 Petal.Width 1.326 0.1977527 1.0 1.8 0 versicolor
Variable Mean SD Min Max n_Missing Group
1 Sepal.Length 5.936 0.5161711 4.9 7.0 0 versicolor
2 Sepal.Width 2.770 0.3137983 2.0 3.4 0 versicolor
3 Petal.Length 4.260 0.4699110 3.0 5.1 0 versicolor
4 Petal.Width 1.326 0.1977527 1.0 1.8 0 versicolor
5 Sepal.Length 6.588 0.6358796 4.9 7.9 0 virginica
6 Sepal.Width 2.974 0.3224966 2.2 3.8 0 virginica
7 Petal.Length 5.552 0.5518947 4.5 6.9 0 virginica
8 Petal.Width 2.026 0.2746501 1.4 2.5 0 virginica
This shows a zero-length table on the first iteration, so that group is lost from the output:
Group | Variable | n_Obs | Mean | SD | Median | MAD | Min | Max | Skewness | Kurtosis | n_Missing
---------------------------------------------------------------------------------------------------------------
versicolor | Sepal.Length | 50 | 5.94 | 0.52 | 5.90 | 0.52 | 4.90 | 7.00 | 0.11 | -0.53 | 0
versicolor | Sepal.Width | 50 | 2.77 | 0.31 | 2.80 | 0.30 | 2.00 | 3.40 | -0.36 | -0.37 | 0
versicolor | Petal.Length | 50 | 4.26 | 0.47 | 4.35 | 0.52 | 3.00 | 5.10 | -0.61 | 0.05 | 0
versicolor | Petal.Width | 50 | 1.33 | 0.20 | 1.30 | 0.22 | 1.00 | 1.80 | -0.03 | -0.41 | 0
virginica | Sepal.Length | 50 | 6.59 | 0.64 | 6.50 | 0.59 | 4.90 | 7.90 | 0.12 | 0.03 | 0
virginica | Sepal.Width | 50 | 2.97 | 0.32 | 3.00 | 0.30 | 2.20 | 3.80 | 0.37 | 0.71 | 0
virginica | Petal.Length | 50 | 5.55 | 0.55 | 5.55 | 0.67 | 4.50 | 6.90 | 0.55 | -0.15 | 0
virginica | Petal.Width | 50 | 2.03 | 0.27 | 2.00 | 0.30 | 1.40 | 2.50 | -0.13 | -0.60 | 0
If I instead run print(current_table), I find three tables as expected:
Variable | Mean | SD | Min | Max | n_Missing | Group
-------------------------------------------------------------
Sepal.Length | 5.01 | 0.35 | 4.30 | 5.80 | 0 | setosa
Sepal.Width | 3.43 | 0.38 | 2.30 | 4.40 | 0 | setosa
Petal.Length | 1.46 | 0.17 | 1.00 | 1.90 | 0 | setosa
Petal.Width | 0.25 | 0.11 | 0.10 | 0.60 | 0 | setosa
Variable | Mean | SD | Min | Max | n_Missing | Group
-----------------------------------------------------------------
Sepal.Length | 5.94 | 0.52 | 4.90 | 7.00 | 0 | versicolor
Sepal.Width | 2.77 | 0.31 | 2.00 | 3.40 | 0 | versicolor
Petal.Length | 4.26 | 0.47 | 3.00 | 5.10 | 0 | versicolor
Petal.Width | 1.33 | 0.20 | 1.00 | 1.80 | 0 | versicolor
Variable | Mean | SD | Min | Max | n_Missing | Group
----------------------------------------------------------------
Sepal.Length | 6.59 | 0.64 | 4.90 | 7.90 | 0 | virginica
Sepal.Width | 2.97 | 0.32 | 2.20 | 3.80 | 0 | virginica
Petal.Length | 5.55 | 0.55 | 4.50 | 6.90 | 0 | virginica
Petal.Width | 2.03 | 0.27 | 1.40 | 2.50 | 0 | virginica
but of course, the same output.
Similar problem for table_full
[1] Variable n_Obs Mean SD Median MAD Min Max Skewness Kurtosis
[11] n_Missing Group
<0 rows> (or 0-length row.names)
Variable n_Obs Mean SD Median MAD Min Max Skewness Kurtosis n_Missing Group
1 Sepal.Length 50 5.936 0.5161711 5.90 0.51891 4.9 7.0 0.1053776 -0.5330095 0 versicolor
2 Sepal.Width 50 2.770 0.3137983 2.80 0.29652 2.0 3.4 -0.3628448 -0.3662374 0 versicolor
3 Petal.Length 50 4.260 0.4699110 4.35 0.51891 3.0 5.1 -0.6065077 0.0479033 0 versicolor
4 Petal.Width 50 1.326 0.1977527 1.30 0.22239 1.0 1.8 -0.0311796 -0.4100592 0 versicolor
Variable n_Obs Mean SD Median MAD Min Max Skewness Kurtosis n_Missing Group
1 Sepal.Length 50 5.936 0.5161711 5.90 0.51891 4.9 7.0 0.1053776 -0.53300954 0 versicolor
2 Sepal.Width 50 2.770 0.3137983 2.80 0.29652 2.0 3.4 -0.3628448 -0.36623736 0 versicolor
3 Petal.Length 50 4.260 0.4699110 4.35 0.51891 3.0 5.1 -0.6065077 0.04790330 0 versicolor
4 Petal.Width 50 1.326 0.1977527 1.30 0.22239 1.0 1.8 -0.0311796 -0.41005924 0 versicolor
5 Sepal.Length 50 6.588 0.6358796 6.50 0.59304 4.9 7.9 0.1180151 0.03290442 0 virginica
6 Sepal.Width 50 2.974 0.3224966 3.00 0.29652 2.2 3.8 0.3659491 0.70607051 0 virginica
7 Petal.Length 50 5.552 0.5518947 5.55 0.66717 4.5 6.9 0.5494446 -0.15377856 0 virginica
8 Petal.Width 50 2.026 0.2746501 2.00 0.29652 1.4 2.5 -0.1294769 -0.60226448 0 virginica
Two very crude modifications get the expected result. I'm not sure if there is a better way to handle merge in this case.
I wonder if merge does not include new rows but adopts new columns when a non-empty data frame merges with an empty data frame?
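That suspicion can be checked in isolation with base R. This is a minimal sketch, assuming a small hand-made table in place of the real per-group output; the variable names table and current_table follow the ones used in this comment, and the values are only illustrative.

```r
# Hypothetical stand-in for the first group's table.
current_table <- data.frame(
  Variable = c("Sepal.Length", "Sepal.Width"),
  Mean = c(5.006, 3.428),
  Group = "setosa"
)

# The accumulator starts out as an empty data frame with no columns.
table <- data.frame()

# With no shared columns, merge() falls back to a cross join. A cross join
# with zero rows on one side has zero rows, and all = TRUE does not change that.
merged <- merge(table, current_table, all = TRUE, sort = FALSE)

names(merged)  # the columns of current_table are adopted
nrow(merged)   # but none of its rows survive
```

On the next iteration the accumulator shares all columns with the incoming table, so merge() behaves as a full outer join and rows are kept from then on. That would explain why only the first group goes missing.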
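# Only merge once the accumulator holds something; merging into an empty
# data.frame() appears to drop the rows of the first group.
if (length(table_full) > 0) {
  table_full <- merge(table_full, current_table_full, all = TRUE, sort = FALSE)
} else {
  table_full <- current_table_full
}
if (length(table) > 0) {
  table <- merge(table, current_table, all = TRUE, sort = FALSE)
} else {
  table <- current_table
}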
Produces table_full:
Variable | n_Obs | Mean | SD | Median | MAD | Min | Max | Skewness | Kurtosis | n_Missing | Group
-----------------------------------------------------------------------------------------------------------
Sepal.Length | 50 | 5.01 | 0.35 | 5.00 | 0.30 | 4.30 | 5.80 | 0.12 | -0.25 | 0 | setosa
Sepal.Width | 50 | 3.43 | 0.38 | 3.40 | 0.37 | 2.30 | 4.40 | 0.04 | 0.95 | 0 | setosa
Petal.Length | 50 | 1.46 | 0.17 | 1.50 | 0.15 | 1.00 | 1.90 | 0.11 | 1.02 | 0 | setosa
Petal.Width | 50 | 0.25 | 0.11 | 0.20 | 0.00 | 0.10 | 0.60 | 1.25 | 1.72 | 0 | setosa
Variable n_Obs Mean SD Median MAD Min Max Skewness Kurtosis n_Missing Group
1 Sepal.Length 50 5.006 0.3524897 5.00 0.29652 4.3 5.8 0.12008699 -0.2526888 0 setosa
2 Sepal.Width 50 3.428 0.3790644 3.40 0.37065 2.3 4.4 0.04116652 0.9547033 0 setosa
3 Petal.Length 50 1.462 0.1736640 1.50 0.14826 1.0 1.9 0.10639390 1.0215761 0 setosa
4 Petal.Width 50 0.246 0.1053856 0.20 0.00000 0.1 0.6 1.25386137 1.7191302 0 setosa
5 Sepal.Length 50 5.936 0.5161711 5.90 0.51891 4.9 7.0 0.10537762 -0.5330095 0 versicolor
6 Sepal.Width 50 2.770 0.3137983 2.80 0.29652 2.0 3.4 -0.36284484 -0.3662374 0 versicolor
7 Petal.Length 50 4.260 0.4699110 4.35 0.51891 3.0 5.1 -0.60650769 0.0479033 0 versicolor
8 Petal.Width 50 1.326 0.1977527 1.30 0.22239 1.0 1.8 -0.03117960 -0.4100592 0 versicolor
Variable n_Obs Mean SD Median MAD Min Max Skewness Kurtosis n_Missing Group
1 Sepal.Length 50 5.006 0.3524897 5.00 0.29652 4.3 5.8 0.12008699 -0.25268880 0 setosa
2 Sepal.Width 50 3.428 0.3790644 3.40 0.37065 2.3 4.4 0.04116652 0.95470326 0 setosa
3 Petal.Length 50 1.462 0.1736640 1.50 0.14826 1.0 1.9 0.10639390 1.02157611 0 setosa
4 Petal.Width 50 0.246 0.1053856 0.20 0.00000 0.1 0.6 1.25386137 1.71913025 0 setosa
5 Sepal.Length 50 5.936 0.5161711 5.90 0.51891 4.9 7.0 0.10537762 -0.53300954 0 versicolor
6 Sepal.Width 50 2.770 0.3137983 2.80 0.29652 2.0 3.4 -0.36284484 -0.36623736 0 versicolor
7 Petal.Length 50 4.260 0.4699110 4.35 0.51891 3.0 5.1 -0.60650769 0.04790330 0 versicolor
8 Petal.Width 50 1.326 0.1977527 1.30 0.22239 1.0 1.8 -0.03117960 -0.41005924 0 versicolor
9 Sepal.Length 50 6.588 0.6358796 6.50 0.59304 4.9 7.9 0.11801512 0.03290442 0 virginica
10 Sepal.Width 50 2.974 0.3224966 3.00 0.29652 2.2 3.8 0.36594907 0.70607051 0 virginica
11 Petal.Length 50 5.552 0.5518947 5.55 0.66717 4.5 6.9 0.54944459 -0.15377856 0 virginica
12 Petal.Width 50 2.026 0.2746501 2.00 0.29652 1.4 2.5 -0.12947693 -0.60226448 0 virginica
and
Variable | Mean | SD | Min | Max | n_Missing | Group
-------------------------------------------------------------
Sepal.Length | 5.01 | 0.35 | 4.30 | 5.80 | 0 | setosa
Sepal.Width | 3.43 | 0.38 | 2.30 | 4.40 | 0 | setosa
Petal.Length | 1.46 | 0.17 | 1.00 | 1.90 | 0 | setosa
Petal.Width | 0.25 | 0.11 | 0.10 | 0.60 | 0 | setosa
Variable Mean SD Min Max n_Missing Group
1 Sepal.Length 5.006 0.3524897 4.3 5.8 0 setosa
2 Sepal.Width 3.428 0.3790644 2.3 4.4 0 setosa
3 Petal.Length 1.462 0.1736640 1.0 1.9 0 setosa
4 Petal.Width 0.246 0.1053856 0.1 0.6 0 setosa
5 Sepal.Length 5.936 0.5161711 4.9 7.0 0 versicolor
6 Sepal.Width 2.770 0.3137983 2.0 3.4 0 versicolor
7 Petal.Length 4.260 0.4699110 3.0 5.1 0 versicolor
8 Petal.Width 1.326 0.1977527 1.0 1.8 0 versicolor
Variable Mean SD Min Max n_Missing Group
1 Sepal.Length 5.006 0.3524897 4.3 5.8 0 setosa
2 Sepal.Width 3.428 0.3790644 2.3 4.4 0 setosa
3 Petal.Length 1.462 0.1736640 1.0 1.9 0 setosa
4 Petal.Width 0.246 0.1053856 0.1 0.6 0 setosa
5 Sepal.Length 5.936 0.5161711 4.9 7.0 0 versicolor
6 Sepal.Width 2.770 0.3137983 2.0 3.4 0 versicolor
7 Petal.Length 4.260 0.4699110 3.0 5.1 0 versicolor
8 Petal.Width 1.326 0.1977527 1.0 1.8 0 versicolor
9 Sepal.Length 6.588 0.6358796 4.9 7.9 0 virginica
10 Sepal.Width 2.974 0.3224966 2.2 3.8 0 virginica
11 Petal.Length 5.552 0.5518947 4.5 6.9 0 virginica
12 Petal.Width 2.026 0.2746501 1.4 2.5 0 virginica
With resulting output as expected
Group | Variable | n_Obs | Mean | SD | Median | MAD | Min | Max | Skewness | Kurtosis | n_Missing
---------------------------------------------------------------------------------------------------------------
setosa | Sepal.Length | 50 | 5.01 | 0.35 | 5.00 | 0.30 | 4.30 | 5.80 | 0.12 | -0.25 | 0
setosa | Sepal.Width | 50 | 3.43 | 0.38 | 3.40 | 0.37 | 2.30 | 4.40 | 0.04 | 0.95 | 0
setosa | Petal.Length | 50 | 1.46 | 0.17 | 1.50 | 0.15 | 1.00 | 1.90 | 0.11 | 1.02 | 0
setosa | Petal.Width | 50 | 0.25 | 0.11 | 0.20 | 0.00 | 0.10 | 0.60 | 1.25 | 1.72 | 0
versicolor | Sepal.Length | 50 | 5.94 | 0.52 | 5.90 | 0.52 | 4.90 | 7.00 | 0.11 | -0.53 | 0
versicolor | Sepal.Width | 50 | 2.77 | 0.31 | 2.80 | 0.30 | 2.00 | 3.40 | -0.36 | -0.37 | 0
versicolor | Petal.Length | 50 | 4.26 | 0.47 | 4.35 | 0.52 | 3.00 | 5.10 | -0.61 | 0.05 | 0
versicolor | Petal.Width | 50 | 1.33 | 0.20 | 1.30 | 0.22 | 1.00 | 1.80 | -0.03 | -0.41 | 0
virginica | Sepal.Length | 50 | 6.59 | 0.64 | 6.50 | 0.59 | 4.90 | 7.90 | 0.12 | 0.03 | 0
virginica | Sepal.Width | 50 | 2.97 | 0.32 | 3.00 | 0.30 | 2.20 | 3.80 | 0.37 | 0.71 | 0
virginica | Petal.Length | 50 | 5.55 | 0.55 | 5.55 | 0.67 | 4.50 | 6.90 | 0.55 | -0.15 | 0
virginica | Petal.Width | 50 | 2.03 | 0.27 | 2.00 | 0.30 | 1.40 | 2.50 | -0.13 | -0.60 | 0
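One possible alternative to guarding each merge() call is to collect the per-group tables in a list and fold them together afterwards. The sketch below is not how report does it; the tables list and its contents are hypothetical stand-ins for what the loop would produce.

```r
# Hypothetical per-group tables, as they might be produced inside the loop.
tables <- list(
  data.frame(Variable = c("a", "b"), Mean = c(5.0, 3.4), Group = "setosa"),
  data.frame(Variable = c("a", "b"), Mean = c(5.9, 2.8), Group = "versicolor"),
  data.frame(Variable = c("a", "b"), Mean = c(6.6, 3.0), Group = "virginica")
)

# Reduce() starts from the first table, so merge() never sees an empty frame.
combined <- Reduce(
  function(x, y) merge(x, y, all = TRUE, sort = FALSE),
  tables
)

nrow(combined)
```

Because merge(all = TRUE) is kept, tables whose columns differ between groups would still be combined, with NA filling the gaps.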
Hello! Thanks for the great package for reporting everything! I have faced a minor issue recently and hope that this summary will help to fix it.
group_by() summary statistics fail with simple penguins data
To Reproduce:
results in:
The same happens with:
Expected behavior as in the examples:
Interestingly, similar calls work with report_sample and report_text
Specifications