ajdamico / convey

variance of distribution measures estimation of survey data
GNU General Public License v3.0

consider adding an attribute to keep linearized variable and/or replicates #242

Closed guilhermejacob closed 2 years ago

guilhermejacob commented 7 years ago

The lin attribute can be very useful in composed measures, like the svygini is used inside svysen and svysst. Maybe we should add a parameter that controls whether it is returned in the result, as it might be too big for large surveys.

ajdamico commented 7 years ago

hi, if you want to add this, consider copying deff=TRUE or keep.var=TRUE. the survey package will skip the entire variance calculation if you tell it to, sometimes by default. here's an example of returned results-

library(survey)
data(api)
dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)

svymean(~api00, dclus1)
deff( svymean(~api00, dclus1) )

svymean(~api00, dclus1, deff=TRUE)
deff( svymean(~api00, dclus1, deff=TRUE) )

DjalmaPessoa commented 7 years ago

the lin attribute is used in all the functions that use the contrastinf function: svyafcdec, svybmi, svygeidec, svygini, svyjdiv, svyjdivdec, svyqsr, svyrmir, svyrmpg, svysen, svysst.

guilhermejacob commented 7 years ago

Could we add something like lin=TRUE in the function call?

DjalmaPessoa commented 7 years ago

I think so, but we would have to change all functions that already have the attribute. Anthony knows how to do it.

ajdamico commented 7 years ago

i won't get to this soon, but you can assign to me

DjalmaPessoa commented 7 years ago

Anthony, I just saw how deff=TRUE is used in the svymean function of the survey package. For the function svyarpr, would it be something like this?

svyarpr.survey.design <- function(formula, design, quantiles = 0.5, percent = 0.6, na.rm=FALSE, lin=FALSE, ...){
......
if (is.character(lin) || lin) attr(rval, "lin") <- arprlin
.....
}

Is it just that?
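
A minimal base-R sketch of the pattern being asked about, assuming a toy weighted mean in place of svyarpr. The name `toy_wmean` is hypothetical and not part of convey; the point is only that the attribute is attached on request, so the default result stays small.

```r
# Hypothetical toy estimator (not a convey function): a weighted mean that
# attaches its linearized variable only when the caller asks for it.
toy_wmean <- function(y, w, lin = FALSE) {
  est <- sum(w * y) / sum(w)
  rval <- est
  if (isTRUE(lin)) {
    # linearized variable of a weighted mean: (y_k - est) / sum of weights
    attr(rval, "lin") <- (y - est) / sum(w)
  }
  rval
}

y <- c(10, 20, 30, 40)
w <- c(1, 2, 2, 1)

small <- toy_wmean(y, w)              # default: no attribute attached
full  <- toy_wmean(y, w, lin = TRUE)  # attribute attached on request

is.null(attr(small, "lin"))
length(attr(full, "lin")) == length(y)
```

Defaulting to lin = FALSE mirrors how deff= behaves in the earlier svymean example: the extra output only appears when requested.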

ajdamico commented 7 years ago

i think it is that straightforward, yes :) if you are anxious to add this, double-check that dr. lumley structured keep.var= the same way.. but i will do this eventually if nobody else does

DjalmaPessoa commented 7 years ago

ok, I'm not anxious at all! I'd rather wait and be sure that you will do it correctly:) Thanks

guilhermejacob commented 7 years ago

maybe there should be a similar option to keep replicates on replication-based procedures

ajdamico commented 7 years ago

are any attributes needed but discarded for replication-based svysen and svysst?

guilhermejacob commented 7 years ago

@ajdamico , sorry. What do you mean?

ajdamico commented 7 years ago

you say "The lin attribute can be very useful in composed measures, like the svygini is used inside svysen and svysst." what is the comparable attribute for replication designs?

guilhermejacob commented 7 years ago

Oh, ok. The replicates (usually, the qq object in the functions) are of similar importance. They can also be used to calculate another form of confidence intervals.
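
A rough base-R illustration of why kept replicates are useful, assuming a naive bootstrap on toy data that ignores any survey design. Here `qq` is just a vector of replicate estimates, named after the object mentioned above; none of this is convey code.

```r
# Toy data: a skewed "income" variable with equal weights (assumption)
set.seed(42)
y <- rexp(200, rate = 1 / 50)
w <- rep(5, length(y))

wmean <- function(y, w) sum(w * y) / sum(w)
est <- wmean(y, w)

# qq: one estimate per bootstrap replicate
qq <- replicate(500, {
  idx <- sample(seq_along(y), replace = TRUE)
  wmean(y[idx], w[idx])
})

# Interval 1: normal approximation using the replicate-based standard error
se_boot <- sd(qq)
ci_normal <- est + c(-1, 1) * qnorm(0.975) * se_boot

# Interval 2: percentile interval, only possible if the replicates are kept
ci_percentile <- unname(quantile(qq, c(0.025, 0.975)))

ci_normal
ci_percentile
```

If only the variance were returned, the second interval could not be computed after the fact.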

ajdamico commented 7 years ago

so keep.lin= and keep.qq= the way svyquantile() has keep.var= then. thanks

guilhermejacob commented 7 years ago

cool, but I would suggest using a name that works across both kinds of designs. Something like keep.linrep.

guilhermejacob commented 7 years ago

also, it would be nice to add a test to make sure that length(lin) equals the number of observations in the design.

DjalmaPessoa commented 7 years ago

length(lin) differs from the total sample size when we work with a subset of the design, for example when we use the function svyby from the survey package to get domain estimates.

The vardpoor package always keeps length(lin) at the full sample length and uses domain indicators. This could be done in convey, but it would mean not using svyby from survey.
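
To make the two conventions concrete, here is a small base-R sketch on toy data. It assumes a weighted domain mean as the estimator; the variable names are illustrative and not taken from convey or vardpoor.

```r
# Toy sample: 6 units, the domain of interest holds the first 3
y <- c(10, 20, 30, 40, 50, 60)
w <- rep(2, 6)
in_domain <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

# Subset approach: lin only covers the rows inside the domain
y_d <- y[in_domain]
w_d <- w[in_domain]
mean_d <- sum(w_d * y_d) / sum(w_d)
lin_subset <- (y_d - mean_d) / sum(w_d)

# Domain-indicator approach: same values, zero outside the domain,
# so the vector keeps the full sample length
lin_full <- ifelse(in_domain, (y - mean_d) / sum(w_d), 0)

length(lin_subset)  # 3
length(lin_full)    # 6
```

A check such as length(lin) == nrow(data) would therefore pass only under the second convention.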

We need to use lin with the full length when using the threshold estimated based on the whole sample, like the arpt. For this case the function convey_prep is needed.

I think this is too technical for users, and they are probably not going to use it.

As for the name linrep, we should avoid mixing up completely different concepts. Linearization has to do with numerically approximating the parameter to be estimated and has nothing to do with the replication of estimates (resampling).

guilhermejacob commented 7 years ago

@DjalmaPessoa, agreed. Also, keeping the linearized variable at full length, beyond the subset in use, would be a problem in large datasets.

guilhermejacob commented 3 years ago

As of 2021, some functions in the survey package include an influence attribute for linearisation-based variance estimation and a return.replicates argument for replicate-based variance estimation. The influence in survey is not exactly the influence function/linearized variable in convey.
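
A minimal sketch of the two survey options mentioned above, reusing the api example from earlier in the thread. It assumes an installed survey release whose svymean() supports influence=TRUE; an older release may ignore the argument, in which case the attribute is simply absent. The whole block is guarded so that it does nothing if survey is not installed.

```r
if (requireNamespace("survey", quietly = TRUE)) {
  library(survey)
  data(api)
  dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

  # linearization design: ask svymean() to keep the influence values
  m_lin <- svymean(~api00, dclus1, influence = TRUE)
  infl <- attr(m_lin, "influence")  # NULL if this release ignores the argument
  print(NROW(infl))

  # replicate-weights design: ask svymean() to keep the replicate estimates
  rclus1 <- as.svrepdesign(dclus1)
  m_rep <- svymean(~api00, rclus1, return.replicates = TRUE)
  print(length(m_rep$replicates))
}
```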

What would people use it for? It would help us solve issue #148 on repeated samples, or account for covariance across domains, all with svyby.