I've now looked through most of the assay data in the megastudy to check that my plan for accommodating differing sample sizes (weighted averages, etc.) will work as intended. For the most part I think it will, with the exception of Pathogen presence/absence. For a pooled sample, the most this variable can tell us, if positive, is that at least one specimen in the pool was positive. The values are consequently not comparable between individual and pooled samples.
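To make the problem concrete, here is a minimal sketch, written in Python purely for illustration (it is not code from the R package). The function names, the 5% prevalence, and the assumptions of independent specimens and a perfect assay are all hypothetical.

```python
def pool_positive_probability(prevalence, pool_size):
    """Chance that a pool tests positive, i.e. at least one specimen is positive.

    Assumes (hypothetically) independent specimens and a perfect assay.
    """
    return 1.0 - (1.0 - prevalence) ** pool_size


def weighted_mean(values, sample_sizes):
    """Sample-size-weighted average, the approach planned for other assay variables."""
    total = sum(sample_sizes)
    return sum(v * n for v, n in zip(values, sample_sizes)) / total


# The same underlying prevalence gives very different positive rates
# depending on how many specimens are in each sample.
prevalence = 0.05  # hypothetical value
for pool_size in (1, 10, 50):
    rate = pool_positive_probability(prevalence, pool_size)
    print(f"pool size {pool_size}: P(positive) = {rate:.3f}")

# One negative individual plus one positive pool of 10 specimens.
# Weighting the 0/1 values by sample size treats all 10 pooled specimens as
# positive, although the data only say that at least 1 of them was.
naive = weighted_mean([0, 1], [1, 10])
print(f"naive weighted mean: {naive:.3f}")
```

The weighted mean here comes out at 10/11, whereas the data only support "at least 1 of 11 specimens was positive", so a sample-size weight does not rescue this variable the way it does for the others.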
Do we want to exclude this term from visualizations but allow other pathogen assay variables? If the majority of pathogen data is of this type, that would be less than ideal. Alternatively, we could build some dedicated tools to handle this data in the future using this R package.