Closed by erondeau 4 years ago
This proved trickier than expected, and it is not due to a change in the simple_pop_stats repository. Rather, it seems to be due to a change in the default behavior of count() in dplyr when moving from v0.8.X to v1.0.0. Specifically:
library(dplyr)
?count
V0.8.3 manual:
"wt | (Optional) If omitted (and no variable named n exists in the data), will count the number of rows. If specified, will perform a "weighted" tally by summing the (non-missing) values of variable wt. A column named n (but not nn or nnn) will be used as weighting variable by default in tally(), but not in count(). This argument is automatically quoted and later evaluated in the context of the data frame. It supports unquoting. See vignette("programming") for an introduction to these concepts. "
V1.0.0 help:
"wt
If a variable, count() will compute sum(wt) for each unique combination.
If NULL, the default, the computation depends on whether a column of frequency counts n exists in the data frame. If it exists, the counts are computed with sum(n) for each unique combination. Otherwise, n() is used to compute the counts. Supply wt = n() to force this behaviour even if you have an n column in the data frame."
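To make the two computations in the v1.0.0 help text concrete, here is a minimal sketch that spells each one out with group_by() and summarise(), so the result should not depend on the installed dplyr version. The data frame, its column names other than n, and its values are made up for illustration and are not taken from simple_pop_stats.

```r
library(dplyr)

# Hypothetical data: one row per collection; `n` already holds each
# collection's sample size (names and values are illustrative only).
toy <- data.frame(
  repunit = c("A", "A", "B", "B", "B"),
  collection = c("a1", "a2", "b1", "b2", "b3"),
  n = c(10, 20, 5, 5, 5),
  stringsAsFactors = FALSE
)

per_repunit <- toy %>%
  group_by(repunit) %>%
  summarise(
    rows = n(),        # number of rows (collections) per repunit
    weighted = sum(n)  # sum of the existing `n` column per repunit
  )
```

Here rows is what a plain row tally gives (2 and 3), while weighted is what a tally weighted by the existing n column gives (30 and 15). The second is consistent with the sample-size values that appeared in the #collections column.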
Should be addressed by: https://github.com/bensutherland/simple_pop_stats/commit/6a1e37b00d5e0074dddc736f234d8a1a548584c7
In addition to fixing the #collections column, n_per_repunit is also displayed.
Note: I chose to add a line that removes column n before doing the count, using select(-n). It looks like I could have gotten the same result with wt = n() within the count, but I was unsure whether it would be backwards compatible with v0.8.X (and the first solution worked).
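A minimal sketch of the select(-n) approach, assuming a hypothetical data frame with one row per collection and a pre-existing n column of sample sizes. The object names and values are illustrative and are not the code from the commit; the wt = n() alternative is deliberately not exercised because of the same backwards-compatibility concern.

```r
library(dplyr)

# Hypothetical input: one row per collection, `n` = sample size.
collections <- data.frame(
  repunit = c("A", "A", "B", "B", "B"),
  collection = c("a1", "a2", "b1", "b2", "b3"),
  n = c(10, 20, 5, 5, 5),
  stringsAsFactors = FALSE
)

# Drop `n` first so count() can only tally rows, whatever its default
# handling of an existing `n` column is in the installed dplyr version.
n_collections <- collections %>%
  select(-n) %>%
  count(repunit, name = "n_collections")

# Sample size per repunit, requested explicitly by weighting on `n`.
n_per_repunit <- collections %>%
  count(repunit, wt = n, name = "n_per_repunit")
```

Naming the weight explicitly (wt = n) and setting name avoids relying on version-specific defaults for both the computation and the output column name.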
From @bensutherland
"I noticed in the chinook microsat simulations something funky is going on with the '#collections' column. It used to be that this would show the number of collections in each grouping, but now it appears to show the sample size for the repunit? Do you mind taking a look to see what's going on there? If we are going to show sample size for the repunit, that's all good, but let's change the colname to something such as n_per_repunit. I think the number of collections may have been more useful than the n_per_repunit though. "