dcomtois / summarytools

R Package to Quickly and Neatly Summarize Data

Getting incorrect results when using stby and descr with weights #196

Open CaraghS opened 3 months ago

CaraghS commented 3 months ago

I am using stby in summarytools to calculate weighted descriptive statistics by group. However, the results differ from those I get when I first filter on the grouping variable and then apply the descr function from summarytools. In the examples below, mydf is my unfiltered data frame and score is a 0-10 variable whose mean I want.

When I filter first and split my df:

filtered_male <- mydf %>% filter(gender == 1)
with(filtered_male, stby(score, gender, descr, weights = weight))

Weighted Descriptive Statistics
score by gender
Data Frame: filtered_male
Weights: weight
N: 838

                       1

       Mean         6.86
    Std.Dev         2.93
        Min         0.00
     Median         8.00
        Max        10.00
        MAD         2.97
         CV         0.43
    N.Valid   1509584.07
  Pct.Valid        99.70

When I don't split my df:

with(mydf, stby(score, gender, descr, weights = weight, simplify = TRUE))

Weighted Descriptive Statistics
score by gender
Data Frame: mydf
Weights: weight
N: 838

                       1            2

       Mean         7.01         6.79
    Std.Dev         2.81         3.02
        Min         0.00         0.00
     Median         8.00         8.00
        Max        10.00        10.00
        MAD         2.97         2.97
         CV         0.40         0.45
    N.Valid   1715494.12   1379339.65
  Pct.Valid        56.05        45.07


Any ideas on why this is happening, or how I can fix it to get the correct weighted mean? (I've checked the answers manually, and the mean where I filter first is correct.) Also, this doesn't seem to be an issue when I don't use weights.
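A manual check of this kind can be sketched in base R, without summarytools. The data frame `toy` and its values are hypothetical stand-ins for mydf. The sketch only shows that a per-group weighted mean, computed with each row's own weight, should match the weighted mean obtained after filtering first.

```r
# Hypothetical toy data standing in for mydf (values are made up)
toy <- data.frame(
  gender = c(1, 1, 1, 2, 2, 2),
  score  = c(2, 8, 10, 0, 6, 9),
  weight = c(1, 2, 1, 3, 1, 1)
)

# Weighted mean of score within each gender group,
# pairing every score with the weight from the same row
by_group <- sapply(
  split(toy, toy$gender),
  function(d) weighted.mean(d$score, d$weight, na.rm = TRUE)
)

# Same statistic computed after subsetting to gender == 1 first
male_only <- subset(toy, gender == 1)
filtered  <- weighted.mean(male_only$score, male_only$weight, na.rm = TRUE)

print(by_group)
print(filtered)
```

If the grouped and filtered values disagree for real data, one plausible cause (an assumption, not confirmed here) is that the weights are not being split along with the grouped variable.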

CaraghS commented 3 months ago

Can I also add: the weighted median reported appears to be incorrect. It differs from the value calculated using other R packages.
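For comparison, here is a minimal base R sketch of one common weighted-median definition: the smallest value at which the cumulative share of weight reaches 50%. The function name `weighted_median` is hypothetical. Other packages may use interpolation or a different tie rule, so small discrepancies between implementations do not necessarily indicate a bug.

```r
# Hypothetical helper: weighted median as the smallest x whose
# cumulative weight share is at least one half (no interpolation)
weighted_median <- function(x, w) {
  keep <- !is.na(x) & !is.na(w)   # drop pairs with a missing value or weight
  x <- x[keep]
  w <- w[keep]
  ord <- order(x)                 # sort values, carrying weights along
  x <- x[ord]
  w <- w[ord]
  cum_share <- cumsum(w) / sum(w) # cumulative share of total weight
  x[which(cum_share >= 0.5)[1]]
}

# The heavily weighted value 8 dominates
wm_example <- weighted_median(c(0, 5, 8, 10), c(1, 1, 5, 1))
print(wm_example)
```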