bensutherland / simple_pop_stats

A short analysis of population statistics given specific inputs

full_sim.r no longer reporting #collections, rather only what was supposed to be n_per_repunit #8

Closed by erondeau 4 years ago

erondeau commented 4 years ago

From @bensutherland

"I noticed in the chinook microsat simulations something funky is going on with the '#collections' column. It used to be that this would show the number of collections in each grouping, but now it appears to show the sample size for the repunit? Do you mind taking a look to see what's going on there? If we are going to show sample size for the repunit, that's all good, but let's change the colname to something such as n_per_repunit. I think the number of collections may have been more useful than the n_per_repunit though. "

erondeau commented 4 years ago

This proved trickier than expected, and is not due to a change in the simple_pop_stats repository. Rather, it seems to be due to a change in the default behavior of count() in dplyr between v0.8.X and v1.0.0. Specifically:

library(dplyr)
?count

V0.8.3 manual:

"wt | (Optional) If omitted (and no variable named n exists in the data), will count the number of rows. If specified, will perform a "weighted" tally by summing the (non-missing) values of variable wt. A column named n (but not nn or nnn) will be used as weighting variable by default in tally(), but not in count(). This argument is automatically quoted and later evaluated in the context of the data frame. It supports unquoting. See vignette("programming") for an introduction to these concepts. "

V1.0.0 help:

"wt Frequency weights. Can be a variable (or combination of variables) or NULL. wt is computed once for each unique combination of the counted variables.

If a variable, count() will compute sum(wt) for each unique combination.

If NULL, the default, the computation depends on whether a column of frequency counts n exists in the data frame. If it exists, the counts are computed with sum(n) for each unique combination. Otherwise, n() is used to compute the counts. Supply wt = n() to force this behaviour even if you have an n column in the data frame."
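To make the two computations described in the help text concrete, here is a minimal sketch on toy data. The data frame and its column names (repunit, collection, n) are hypothetical and only mirror the situation in this issue; they are not taken from full_sim.r. Both results are requested explicitly, so the output should not depend on which default the installed dplyr version applies when wt is omitted.

```r
library(dplyr)

# Hypothetical toy data: one row per collection, with a pre-existing
# column `n` holding the sample size of that collection.
collections <- tibble(
  repunit    = c("A", "A", "B"),
  collection = c("a1", "a2", "b1"),
  n          = c(50, 30, 20)
)

# Weighted count: sum(n) for each repunit, i.e. the sample size per repunit.
n_per_repunit <- count(collections, repunit, wt = n, name = "n_per_repunit")

# Row count: number of collections per repunit, regardless of the `n` column.
n_collections <- collections %>%
  group_by(repunit) %>%
  summarise(n_collections = n())

print(n_per_repunit)
print(n_collections)
```

With an n column present, a bare count(collections, repunit) can return either of these depending on the dplyr version, which matches the symptom reported above.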

erondeau commented 4 years ago

Should be addressed by: https://github.com/bensutherland/simple_pop_stats/commit/6a1e37b00d5e0074dddc736f234d8a1a548584c7

In addition to fixing the #collections column, n_per_repunit is now also displayed.

erondeau commented 4 years ago

Note: I chose to add a line that removes column n with select(-n) before doing the count. It looks like I could have gotten the same result with wt = n() inside count(), but I was unsure whether that would be backwards compatible with v0.8.X (and the first solution worked).
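For reference, the select(-n) approach can be sketched as follows. This is an illustration under assumptions, not the code from the linked commit: the toy data frame and the output column name n_collections are hypothetical.

```r
library(dplyr)

# Hypothetical toy data with a pre-existing `n` column (sample size
# per collection), similar in shape to the situation described above.
collections <- tibble(
  repunit    = c("A", "A", "B"),
  collection = c("a1", "a2", "b1"),
  n          = c(50, 30, 20)
)

# Drop `n` first so count() has no frequency column to pick up,
# then count rows (collections) per repunit.
collection_counts <- collections %>%
  select(-n) %>%
  count(repunit, name = "n_collections")

print(collection_counts)
```

Because the n column is gone before count() runs, the result is a plain row count whichever default the installed dplyr uses.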