Closed by rpruim 8 years ago
On the other hand, compare the vignette LessVolume-MoreCreativity, section Numerical Summaries: Two Variables.
These all do the same thing:
mean(age ~ substance,data=HELPrct)
##  alcohol  cocaine   heroin
## 38.19774 34.49342 33.44355
mean( ~ age | substance, data=HELPrct)
##  alcohol  cocaine   heroin
## 38.19774 34.49342 33.44355
And this is somehow inconsistent (?) with
bargraph( ~ sex | substance, type="proportion", data=HELPrct)
I previously had assumed that
tally(sex ~ substance, data=HELPrct, format="proportion", margins = TRUE)
and
tally( ~ sex | substance, data=HELPrct, format="proportion", margins = TRUE)
yielded the same results.
I would have tried to generate the former behavior by:
tally(~ sex + substance, data=HELPrct, format="proportion")
Regarding bargraph(), it might be time to refactor this anyway. It doesn't even use tally(). Rather it uses xtabs() and passes things along to barchart(). I'll have to look and see whether refactoring here makes sense and how easy it is to get barchart() to do what we want.
What is our desired output of
bargraph( ~ sex | substance, type="proportion", data=HELPrct)
Moving the bargraph() part of this to its own issue (#607).
I would think three sets of two proportions (each set adding to 1 within its substance group).
Nicholas Horton Professor of Statistics Department of Mathematics and Statistics, Amherst College PO Box 5000, AC #2239 Amherst, MA 01002-5000 https://www.amherst.edu/people/facstaff/nhorton
I've done a bit of work on tally().
It now "promotes" formulas (like many of our other functions) so that a ~ b, ~ a | b, and ~ a, groups = b all end up being the same thing internally at the point where the table is created. This means that a ~ b is now considered as "tally a conditional on b" and proportions will sum to 1 for each level of b.
Proportions are computed so they sum to one for each level of the condition.
Margins are added (when requested) for each non-conditional dimension.
For formulas of the form ~ rhs, no conditioning is done. (This avoids silly tables where all the proportions are 1, and the marginal totals get sillier from there.) In other words, the right hand side is a "condition" only if (a) there is a left hand side, and (b) there is not another condition (coming from | or groups).
This seems reasonably consistent and comprehensible.
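The difference between joint proportions (formulas of the form ~ rhs) and conditional proportions (a ~ b or ~ a | b) can be sketched in base R, without mosaic. This is only an illustration of the arithmetic, not how tally() is implemented; the data frame `d` is a hypothetical toy data set, not HELPrct.

```r
# Hypothetical toy data (an assumption for illustration; not HELPrct).
d <- data.frame(
  sex       = c("female", "male", "male", "female", "male", "male"),
  substance = c("alcohol", "alcohol", "alcohol", "cocaine", "cocaine", "heroin")
)

# Cross table of counts: sex in rows, substance in columns.
counts <- xtabs(~ sex + substance, data = d)

# Joint proportions: no conditioning, the whole table sums to 1.
joint <- prop.table(counts)

# Conditional proportions of sex given substance: each column sums to 1.
cond <- prop.table(counts, margin = 2)

# Add a total row over the non-conditioning dimension (sex).
cond_with_total <- addmargins(cond, margin = 1)

print(joint)
print(cond_with_total)
```

Here `joint` corresponds to the "no conditioning" case and `cond` to the case where substance acts as the condition, so the added total row is 1 in every column.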
Examples:
tally( ~ sex + substance, data = HELPrct, margins = TRUE, format = "proportion")
##         substance
## sex      alcohol cocaine heroin  Total
##   female  0.0795  0.0905 0.0662 0.2362
##   male    0.3113  0.2450 0.2075 0.7638
##   Total   0.3907  0.3355 0.2737 1.0000
tally( sex ~ substance, data = HELPrct, margins = TRUE, format = "proportion")
##         substance
## sex      alcohol cocaine heroin
##   female   0.203   0.270  0.242
##   male     0.797   0.730  0.758
##   Total    1.000   1.000  1.000
tally( sex ~ substance | homeless, data = HELPrct, margins = TRUE, format = "proportion")
## , , homeless = homeless
##
##         substance
## sex      alcohol cocaine heroin  Total
##   female  0.0957  0.0670 0.0287 0.1914
##   male    0.3971  0.2153 0.1962 0.8086
##   Total   0.4928  0.2823 0.2249 1.0000
##
## , , homeless = housed
##
##         substance
## sex      alcohol cocaine heroin  Total
##   female  0.0656  0.1107 0.0984 0.2746
##   male    0.2377  0.2705 0.2172 0.7254
##   Total   0.3033  0.3811 0.3156 1.0000
tally( sex ~ substance + homeless, data = HELPrct, margins = TRUE, format = "proportion")
## , , homeless = homeless
##
##         substance
## sex      alcohol cocaine heroin
##   female   0.194   0.237  0.128
##   male     0.806   0.763  0.872
##   Total    1.000   1.000  1.000
##
## , , homeless = housed
##
##         substance
## sex      alcohol cocaine heroin
##   female   0.216   0.290  0.312
##   male     0.784   0.710  0.688
##   Total    1.000   1.000  1.000
The original examples above are now equivalent:
tally(sex ~ substance, data=HELPrct, format="proportion", margins = TRUE)
##         substance
## sex      alcohol cocaine heroin
##   female   0.203   0.270  0.242
##   male     0.797   0.730  0.758
##   Total    1.000   1.000  1.000
tally( ~ sex | substance, data=HELPrct, format="proportion", margins = TRUE)
##         substance
## sex      alcohol cocaine heroin
##   female   0.203   0.270  0.242
##   male     0.797   0.730  0.758
##   Total    1.000   1.000  1.000
Also, some documentation has been added. In particular:
Details
The dplyr package also exports a tally function. If x inherits from class "tbl" or "data frame", then dplyr's tally() is called. This makes it easier to have the two packages coexist.
Otherwise, tally() is designed as an alternative to table() and xtabs(). The primary use case is to describe a (possibly multi-dimensional) table using a formula. For a table of counts, each component of the formula becomes one of the dimensions of the cross table. For tables of proportions or percents, conditional proportions and percents are computed, conditioned on each level of all "secondary" (i.e., conditioning) variables, defined as everything other than the left hand side, if there is a left hand side to the formula; and everything except the right hand side if the left hand side of the formula is empty. Note that groups is folded into the formula prior to this determination and becomes part of the conditioning.
When marginal totals are added, they are added for all of the conditioning dimensions, and proportions should sum to 1 for each level of the conditioning variables. This can be useful to make it clear which conditional proportions are being computed.
Currently the following give different results:
But a user pointed out that this is a change from earlier versions and doesn't match the documentation.