DeclareDesign / fabricatr

fabricatr: Imagine Your Data Before You Collect It
https://declaredesign.org/r/fabricatr

draw_normal_icc() doesn't allow for full range of ICC [0,1] #149

Open clarabicalho opened 6 years ago

clarabicalho commented 6 years ago

Designers in the DesignLibrary relying on draw_normal_icc() should be able to return values when ICC = 0 or 1 without requiring the designer to condition the distribution function on the ICC value, right? It is also odd that it doesn't allow ICC <= 0.001 or ICC >= 0.999. Could the default behavior be a warning instead?

clusters = rep(1:5, 10)
draw_normal_icc(clusters = clusters, ICC = 0.001)

Error in draw_normal_icc(clusters = clusters, ICC = 0.001) : An ICC of 0 with a finite within-cluster variance implies zero between-cluster variance. You can generate data with zero ICC using R's standard rnorm command to generate normal data independent of the cluster variable.

draw_normal_icc(clusters = clusters, ICC = 1)

Error in draw_normal_icc(clusters = clusters, ICC = 0.999) : An ICC of 1 with a finite within-cluster variance requires division by zero to infer between-cluster variance. Try a lower ICC or specify between- and within-cluster variance (sd_between and sd) to infer ICC.

macartan commented 6 years ago

For the first case, just falling back to the normal distribution would make sense (e.g., so that users can compare over a range that includes 0 without modifying code). For the second, I think the error should be "An ICC of 1 is not possible with positive within-cluster variance." It ought to be possible with within-cluster variance of 0?

(Replying by email to Clara Bicalho's comment of Mon, Aug 20, 2018, 4:30 PM.)

graemeblair commented 6 years ago

Hi all -- @aaronrudkin weighed in on this:

The way we set up draw_normal_icc was designed to have the user specify ICC along with one of sd_between or sd_within. Mathematically, sd_between^2 = (ICC * sd_within^2) / (1 - ICC). So when ICC = 1, this is a divide by zero. Where ICC is very near 1, sd_between grows without bound. And similar problems happen with ICC = 0. I believe we preferred errors to producing near-nonsense data or allowing R to return NaN and causing less interpretable errors later.
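To make the formula above concrete, here is a minimal base R sketch that evaluates it over a grid of ICC values. The helper name sd_between_from_icc and the choice sd_within = 1 are assumptions made for illustration; this is not fabricatr's internal code.

```r
# Hypothetical helper (not part of fabricatr): solve the formula quoted above
# for sd_between, given ICC and the within-cluster standard deviation.
sd_between_from_icc <- function(ICC, sd_within = 1) {
  sqrt(ICC * sd_within^2 / (1 - ICC))
}

# Evaluate over a grid that includes both boundaries.
icc_grid <- c(0, 0.001, 0.5, 0.9, 0.999, 1)
implied <- data.frame(
  ICC = icc_grid,
  sd_between = sd_between_from_icc(icc_grid)
)
print(implied)
```

With sd_within fixed at 1, the implied sd_between is 0 at ICC = 0, about 31.6 at ICC = 0.999, and Inf at ICC = 1, because R evaluates a positive number divided by zero as Inf rather than raising an error.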

I think we should do special casing in the designer, but that this is the right behavior in fabricatr. Feel free to reopen if you disagree.
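One way the designer-side special casing could look is sketched below. The wrapper draw_icc_or_fallback is hypothetical and is not taken from DesignLibrary or fabricatr. It assumes a unit standard deviation in both boundary branches, which is an arbitrary choice for illustration, and it hands every interior ICC value to fabricatr::draw_normal_icc() unchanged.

```r
# Hypothetical designer-side wrapper; not part of fabricatr or DesignLibrary.
draw_icc_or_fallback <- function(clusters, ICC) {
  if (ICC < 0 || ICC > 1) {
    stop("ICC must lie in [0, 1].")
  }
  if (ICC == 0) {
    # Zero between-cluster variance: draws are independent of the cluster.
    # sd = 1 is an assumption made for this sketch.
    return(rnorm(length(clusters), mean = 0, sd = 1))
  }
  if (ICC == 1) {
    # Zero within-cluster variance: one draw per cluster, shared by all
    # of its members. sd = 1 is again an assumption.
    cluster_ids <- unique(clusters)
    cluster_draws <- rnorm(length(cluster_ids), mean = 0, sd = 1)
    return(cluster_draws[match(clusters, cluster_ids)])
  }
  # Interior values are passed through to fabricatr as-is.
  fabricatr::draw_normal_icc(clusters = clusters, ICC = ICC)
}

clusters <- rep(1:5, 10)
y_zero <- draw_icc_or_fallback(clusters, ICC = 0)
y_one <- draw_icc_or_fallback(clusters, ICC = 1)
```

Only the exact boundaries are handled here. Going by the report at the top of this issue, values within 0.001 of either boundary would still reach fabricatr's error, so a designer that sweeps over ICC values would need to decide separately how to treat them.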

macartan commented 6 years ago

Understand why this is happening but not sure this is the best handling of these cases.

graemeblair commented 6 years ago

ok sounds good. think this should be done with the https://github.com/DeclareDesign/fabricatr/issues/133 change and also draw_binary_icc should have the same behaviors so a bit of work. given time constraints, will leave out for this version.