bnowok / synthpop

Generating Synthetic Versions of Sensitive Microdata for Statistical Disclosure Control

summary.fit.synds miscalculating variance estimators for simple synthesis when n is not equal to k #9

Closed flynngo closed 6 years ago

flynngo commented 6 years ago

I believe that the variance estimate, T_f, for synthetic data is currently miscalculated when population.inf = TRUE, incomplete = FALSE, and k is not equal to n.

The line of code in question from the function summary.fit.synds is

## simple synthesis   
    } else {
      if (object$proper == FALSE) Tf <- vars*(1 + n/k/m) else Tf <- vars*(1 + (n/k + 1)/m)

and I believe that it should read

## simple synthesis   
    } else {
      if (object$proper == FALSE) Tf <- vars*(k/n + 1/m) else Tf <- vars*(k/n + (k/n + 1)/m)

so that it is consistent with the variance estimators that are defined in section 2.2 of "Practical data synthesis for large samples" (Raab et al, 2016).

gillian-raab commented 6 years ago

Thank you so much for looking at our code, we do appreciate it. But I think in this instance you may be mistaken. The quantity vars in our code is not the average variance as estimated from the m synthetic data sets (\bar{v_M} in the paper). In our code, \bar{v_M} is the component mvaravg of the syn object. The quantity vars is calculated on line 11 of the function by multiplying object$mvaravg by k/n. So vars represents the estimate of the variance of the parameters were they estimated from the original data.
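To spell out the algebra behind this: assuming only what is described above, namely that vars equals mvaravg multiplied by k/n, the existing code can be rewritten in terms of \bar{v}_M (the same quantity as \bar{v_M} above). The superscript labels below are just shorthand for the two branches of the `object$proper` test and are not notation from the paper. Note that R evaluates `n/k/m` left to right, so it equals n/(km).

```latex
\begin{align*}
\text{vars} &= \frac{k}{n}\,\bar{v}_M \\[4pt]
T_f^{\text{(proper = FALSE)}}
  &= \text{vars}\left(1 + \frac{n}{k\,m}\right)
   = \bar{v}_M\left(\frac{k}{n} + \frac{1}{m}\right) \\[4pt]
T_f^{\text{(proper = TRUE)}}
  &= \text{vars}\left(1 + \frac{n/k + 1}{m}\right)
   = \bar{v}_M\left(\frac{k}{n} + \frac{1 + k/n}{m}\right)
\end{align*}
```

The right-hand sides have the same form as the expressions proposed in the issue, but with \bar{v}_M rather than vars as the multiplier, so under this assumption the existing code already yields the proposed form, and applying the proposed change to vars would introduce the k/n factor a second time.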

Let me know if you agree. Thanks for taking an interest in our work. We are always pleased to hear how someone may be using synthpop, so feel free to email us to let us know.

Gillian Raab gillian.raab@ed.ac.uk

flynngo commented 6 years ago

You are correct and I was mistaken. Thank you very much for your response.

Flynn