According to the documentation, pool.scalar() assumes an infinite sample (n = Inf) by default. But that doesn’t match the actual behaviour: with the default, the returned degrees of freedom are NaN. Example:
```r
library(mice)
pool.scalar(13:17, 3:7)$df
#> [1] NaN
```
The expected result would be approximately the df one gets with a very large n, e.g.:
```r
pool.scalar(13:17, 3:7, n = 10^6)$df
#> [1] 28.44315
```
The bug is caused by the barnard.rubin() function (which pool.scalar() uses internally):
When dfcom = Inf, the factor (dfcom + 1) / (dfcom + 3) in the dfobs <- line evaluates to Inf/Inf, which is NaN rather than the limiting value 1, and the result is still NaN after multiplication by dfcom * (1 - lambda). dfobs should instead be Inf.
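The arithmetic can be checked in isolation, without mice. This is only a sketch of the floating-point behaviour described above: the value lambda = 0.375 is an assumption, computed by hand for the example data as (1 + 1/m) * b / t with m = 5, b = 2.5 and t = 8.

```r
# Assumed inputs: lambda for the example data, dfcom at the documented default
dfcom  <- Inf
lambda <- 0.375

# Inf / Inf is NaN under IEEE 754 arithmetic, not the limiting value 1
ratio <- (dfcom + 1) / (dfcom + 3)
is.nan(ratio)
#> [1] TRUE

# NaN propagates through the remaining multiplication
dfobs <- ratio * dfcom * (1 - lambda)
is.nan(dfobs)
#> [1] TRUE
```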
Since the factor dfobs / (dfold + dfobs) in the last line tends to 1 as dfobs tends to Inf, the correct behaviour would be to just output dfold whenever dfcom is Inf (and perhaps the default value dfcom = 999999 should be changed to dfcom = Inf). For the above example, the resulting value is (exactly) 28.44444…, which is in line with what you get with the large value n = 10^6 (28.44315).
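A minimal sketch of what the proposed behaviour could look like. The function name barnard_rubin_sketch is hypothetical, and the lambda and dfold lines are my assumption, based on the usual Barnard-Rubin formulas; only the dfobs line and the final expression are quoted from the description above, so the package's actual source may differ in detail. The sketch also assumes a scalar dfcom.

```r
# Hypothetical stand-in for barnard.rubin(); not the package's code.
# m: number of imputations, b: between-imputation variance,
# t: total variance, dfcom: complete-data degrees of freedom (scalar)
barnard_rubin_sketch <- function(m, b, t, dfcom = Inf) {
  lambda <- (1 + 1 / m) * b / t   # assumed definition
  dfold  <- (m - 1) / lambda^2    # assumed definition

  # Proposed special case: with infinite dfcom, return dfold directly
  # instead of evaluating Inf / Inf
  if (is.infinite(dfcom)) {
    return(dfold)
  }

  dfobs <- (dfcom + 1) / (dfcom + 3) * dfcom * (1 - lambda)
  dfold * dfobs / (dfold + dfobs)
}

# Example data from above: m = 5, b = var(13:17) = 2.5,
# t = mean(3:7) + (1 + 1/5) * 2.5 = 8
barnard_rubin_sketch(5, 2.5, 8)
#> [1] 28.44444
```

Calling the sketch with dfcom = 999999 instead gives a value only slightly smaller than dfold, consistent with the n = 10^6 result reported above.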
Summary:
Whenever dfcom = Inf, barnard.rubin() should output dfold instead of dfold * dfobs / (dfold + dfobs).
The arbitrary default value dfcom = 999999 should be changed to dfcom = Inf.