elbersb / segregation

R package to calculate entropy-based segregation indices, with a focus on the Mutual Information Index (M) and Theil’s Information Index (H)
https://elbersb.com/segregation
Other
35 stars 3 forks source link

test_ipf.R: match groups before test equal #6

Closed mattdowle closed 3 years ago

mattdowle commented 3 years ago

Adjustment to the test to cope if aggregate() returns the groups in a different order. The added match() ensures that the result for each group is compared without an assumption that the new and old result has the groups in the same order.

With the current dev version of data.table 1.13.7 (to be released as 1.13.8), news item 2 is :

fintersect() now retains the order of the first argument as reasonably expected, rather than retaining the order of the second argument, #4716. Thanks to Michel Lang for reporting, and Ben Schwen for the PR.

which affects the fintersect() calls in create_common_data(). The first call affected being https://github.com/elbersb/segregation/blob/master/R/ipf.R#L93. Eventually the test receives new and old containing the correct group results as before, but in a slightly different order, as follows.

> library("segregation")
> schools00_r <- schools00[schools00$school %in% schools05$school, ]
> schools05_r <- schools05[schools05$school %in% schools00$school, ]
> 
> adj <- ipf(schools00_r, schools05_r, "race", "school", weight = "n", precision = 0.1)
>           
> new <- aggregate(adj$n, list(adj$race), sum)
> old <- aggregate(schools05_r$n, list(schools05_r$race), sum)
> 
> new
  Group.1      x
1   asian  17494
2   black 122787
3    hisp 116932
4   white 468978
5  native   5879
> old
  Group.1      x
1   asian  22508
2   black 117527
3    hisp 147052
4  native   6085    # last 2 groups switched position
5   white 441050

This was highlighted by revdep testing of data.table in dev, https://github.com/elbersb/segregation/issues/5. Linking to the revdep status tracking issue, https://github.com/Rdatatable/data.table/issues/4866. Linking to the change to data.table::fintersect(), https://github.com/Rdatatable/data.table/issues/4716.

If it looks ok to you, there's no rush at all but when you have a chance could you merge and publish to CRAN please. Otherwise data.table will break segregation's tests on CRAN on the next update. Thanks! I checked that with this PR, segregation passes R CMD check with both current release of data.table (1.13.6), and the dev version (so you don't need to wait for the new version to be released).

elbersb commented 3 years ago

Thanks for the PR! Bit sloppy, that test, on my part. Will merge right away, and hopefully do a CRAN release soon.

mattdowle commented 3 years ago

Thanks again. Just to give you a heads up that we might release v1.14.0 sooner than anticipated due to https://github.com/Rdatatable/data.table/pull/4894. In recent months we've noticed that CRAN have been giving just 2 weeks notice to packages in error status, otherwise the package is removed from CRAN. When I submit to CRAN I'll just say I've communicated with you, as per CRAN policy, and it won't hold up release. But then you'll probably only have 2 weeks to update otherwise your package will be removed from CRAN. Not my rules or policy, just so you were fully aware. Hope ok.

elbersb commented 3 years ago

0.5.0 is now published on CRAN

mattdowle commented 3 years ago

Perfect. Thank you!