amices / mice

Multivariate Imputation by Chained Equations
https://amices.org/mice/
GNU General Public License v2.0
444 stars 107 forks source link

adapt prop, patterns and weights matrices for pattern with only 1's, and adding warnings when patterns cannot be generated (#449, #317) #451

Closed RianneSchouten closed 2 years ago

RianneSchouten commented 2 years ago

Dear Stef, dear Gerko,

I have made adjustments in ampute() regarding the calculation of prop, patterns and weights in the situation that a user specifies a pattern with only 1s (issue #449). I have furthermore added unittests and evaluated that all other tests work.

Can you accept my pull request?

Thank you, Rianne

lederb commented 2 years ago

sorry for chipping in again, just wanted to mention an issue imho that is slightly related to the check.patterns

https://github.com/amices/mice/blob/2cd32a71e6f9dbf42c15bc63dc1bc67177e20e75/R/ampute.R#L258-L270

is it possible that the if (!bycases) {} block should be processed after patterns are cleaned of rows with only ones? because if I understand recalculate.prop correctly

https://github.com/amices/mice/blob/2cd32a71e6f9dbf42c15bc63dc1bc67177e20e75/R/ampute.R#L505-L521

cases <- vapply( seq_len(nrow(patterns)), function(i) (miss * freq[i]) / length(patterns[i, ][patterns[i, ] == 0]), numeric(1) )

leads to division by 0 for rows of ones thus Inf in the cases vector and the evaluation of if (sum(cases) > n)

is always true?

but maybe I'm wrong.

sorry for bothering again

RianneSchouten commented 2 years ago

@lederb I think you are right, I moved the if (!bycases) {} block until after the patterns.new, freq and prop have been recalculated in commit 6b2be70

lederb commented 2 years ago

@RianneSchouten sorry it's me again 😄

I now pretty much finished porting the ampute functionality to a different ML system and one additional question came up for me:

It would be why the proportion adjustment for cellwise amputation (bycases = FALSE) in recalculate.prop is using n^2 (n = ncol(data) when being called)?

https://github.com/amices/mice/blob/6b2be705a224b547957724c1bd0dac79b4e721e6/R/ampute.R#L514-L530

shouldnt it be miss = prop * nrow(data) * ncol(data) and if (sum(cases) > nrow(data)) and also prop = sum(cases) / nrow(data)

cheers Bernhard

RianneSchouten commented 2 years ago

@lederb

Again, you are right. Thank you very much for noticing these mistakes. I appreciate it.

Can I ask, to which ML system are you porting the function?

Bests, Rianne

lederb commented 2 years ago

@RianneSchouten

sure. for a university project I'm porting it to https://github.com/apache/systemds which is ran under the open source Apache License 2.0 (http://www.apache.org/licenses/LICENSE-2.0) and has its own R-like script language.

I hope that is okay in terms of legality?

RianneSchouten commented 2 years ago

@stefvanbuuren @gerkovink is it possible to accept this PR?