Open mkiang opened 8 years ago
hi Mathew, I'll look into it. Can you please join the output of sessionInfo() ? thanks Stefano
Sent from my StrawBerry®
On 05 nov 2015, at 21:48, Mathew Kiang notifications@github.com wrote:
When running the following sample code on simulated data:
n_trt = 1000 n_untrt = 1000 prob_trt_male = .75 prob_untrt_male = .15 n_trt_males = sum(rbinom(n_trt, 1, prob_trt_male)) n_trt_females = n_trt - n_trt_males n_untrt_males = sum(rbinom(n_trt, 1, prob_untrt_male)) n_untrt_females = n_untrt - n_untrt_males
fake <- data.frame(trt = c(rep(0, n_untrt), rep(1, n_trt)), sex = c(rep(0, n_untrt_females), rep(1, n_untrt_males), rep(0, n_trt_females), rep(1, n_trt_males))) fake$old <- NA for (i in seq_along(fake$trt)) { if (fake$trt[i] + fake$sex[i] == 2){ fake$old[i] <- rbinom(1, 1, .1) fake$bp[i] <- rnorm(1, 100, 20) } else if (fake$trt[i] + fake$sex[i] == 0) { fake$old[i] <- rbinom(1, 1, .9) fake$bp[i] <- rnorm(1, 280, 50) } else { fake$old[i] <- rbinom(1, 1, .5) fake$bp[i] <- rnorm(1, 175, 8) } }
imbalance(fake$trt, fake, drop = "trt") Returns this output:
Multivariate Imbalance Measure: L1=0.954 Percentage of local common support: LCS=13.0%
Univariate Imbalance Measures:
statistic type L1 min 25% 50% 75% max
sex -0.5850 (diff) 0.585 0.00000 0.000 -1.0000 -1.0000 0.0000 old 0.6410 (diff) 0.641 0.00000 1.000 1.0000 1.0000 0.0000 bp 143.4229 (diff) 0.000 85.02359 121.272 158.6741 153.2892 233.4842 The L1 of bp is 0.000, which seems impossible. I can replicate it with almost any continuous variable in R, but when running this in Stata, the L1 (both univariate and multivariate) seems appropriately calculated.
— Reply to this email directly or view it on GitHub.
Sure. Thanks for prompt response:
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.1 (El Capitan)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] tcltk stats graphics grDevices utils datasets methods base
other attached packages:
[1] cem_1.1.17 lattice_0.20-33
loaded via a namespace (and not attached):
[1] MASS_7.3-44 combinat_0.0-8 tools_3.2.2 nlme_3.1-122 grid_3.2.2
[6] randomForest_4.6-12 MatchIt_2.4-21
Perhaps this should be a completely separate issue, but it also appears in Stata 13 that imbalance
performs complete case analysis and thus results in a different L1 than R if there is any missing data.
Hi @siacus, just wondering if there was any movement in this area? I'm planning on taking a closer look over the next few weeks and possibly coding my own (slower, less generalized, less efficient) method — don't want to duplicate effort if you've spotted the problem.
Hi Mathew, I’m a bit late on this. I suggest you take a look at the come in CEM package and try to fix it. Then you send me the code and I’ll try to optimize and integrate in the new release of CEM with acknowledgment. What do you think? Stefano
Il giorno 26 feb 2016, alle ore 22:16, Mathew Kiang notifications@github.com ha scritto:
Hi @siacus https://github.com/siacus, just wondering if there was any movement in this area? I'm planning on taking a closer look over the next few weeks and possibly coding my own (slower, less generalized, less efficient) method — don't want to duplicate effort if you've spotted the problem.
— Reply to this email directly or view it on GitHub https://github.com/IQSS/cem/issues/1#issuecomment-189485967.
Sure. I can give it a go but it'll take a while -- not a lot of bandwidth for the next couple weeks.
Sent from my iPhone
On Fri, Feb 26, 2016 at 6:53 PM -0800, "Stefano Iacus" notifications@github.com wrote:
Hi Mathew,
I’m a bit late on this.
I suggest you take a look at the come in CEM package and try to fix it. Then you send me the code and I’ll try to optimize and integrate in the new release of CEM with acknowledgment.
What do you think?
Stefano
Il giorno 26 feb 2016, alle ore 22:16, Mathew Kiang notifications@github.com ha scritto:
Hi @siacus https://github.com/siacus, just wondering if there was any movement in this area? I'm planning on taking a closer look over the next few weeks and possibly coding my own (slower, less generalized, less efficient) method — don't want to duplicate effort if you've spotted the problem.
—
Reply to this email directly or view it on GitHub https://github.com/IQSS/cem/issues/1#issuecomment-189485967.
— Reply to this email directly or view it on GitHub.
After a few hours, I can't seem to find the issue here. My sense is it around the xtabs()
command and continuous variables, but I don't have the bandwidth to look closer. Just going to leave it open, but noting I won't be able to investigate further.
When running the following sample code on simulated data:
Returns this output:
The L1 of
bp
is 0.000, which seems impossible. I can replicate it with almost any continuous variable in R, but when running this in Stata, the L1 (both univariate and multivariate) seems appropriately calculated.