Closed RianneSchouten closed 2 years ago
sorry for chipping in again, just wanted to mention an issue imho that is slightly related to the check.patterns
https://github.com/amices/mice/blob/2cd32a71e6f9dbf42c15bc63dc1bc67177e20e75/R/ampute.R#L258-L270
is it possible that the if (!bycases) {}
block should be processed after patterns are cleaned of rows with only ones?
because if I understand recalculate.prop correctly
https://github.com/amices/mice/blob/2cd32a71e6f9dbf42c15bc63dc1bc67177e20e75/R/ampute.R#L505-L521
cases <- vapply( seq_len(nrow(patterns)), function(i) (miss * freq[i]) / length(patterns[i, ][patterns[i, ] == 0]), numeric(1) )
leads to division by 0 for rows of ones thus Inf
in the cases vector and the evaluation of if (sum(cases) > n)
is always true?
but maybe I'm wrong.
sorry for bothering again
@lederb I think you are right, I moved the if (!bycases) {}
block until after the patterns.new, freq and prop have been recalculated in commit 6b2be70
@RianneSchouten sorry it's me again 😄
I now pretty much finished porting the ampute functionality to a different ML system and one additional question came up for me:
It would be why the proportion adjustment for cellwise amputation (bycases = FALSE) in recalculate.prop is using n^2 (n = ncol(data) when being called)?
https://github.com/amices/mice/blob/6b2be705a224b547957724c1bd0dac79b4e721e6/R/ampute.R#L514-L530
shouldnt it be miss = prop * nrow(data) * ncol(data)
and if (sum(cases) > nrow(data))
and also prop = sum(cases) / nrow(data)
cheers Bernhard
@lederb
Again, you are right. Thank you very much for noticing these mistakes. I appreciate it.
Can I ask, to which ML system are you porting the function?
Bests, Rianne
@RianneSchouten
sure. for a university project I'm porting it to https://github.com/apache/systemds which is ran under the open source Apache License 2.0 (http://www.apache.org/licenses/LICENSE-2.0) and has its own R-like script language.
I hope that is okay in terms of legality?
@stefvanbuuren @gerkovink is it possible to accept this PR?
Dear Stef, dear Gerko,
I have made adjustments in ampute() regarding the calculation of prop, patterns and weights in the situation that a user specifies a pattern with only 1s (issue #449). I have furthermore added unittests and evaluated that all other tests work.
Can you accept my pull request?
Thank you, Rianne