AngusMcLure / PoolPoweR

Power and sample size calculations for surveys using pool testing (AKA group testing)
GNU General Public License v3.0

Function documentation #11

Closed: fredjaya closed this issue 7 months ago

fredjaya commented 8 months ago

optimise_prevalence.R

Title: Optimising the pool size and number for estimating prevalence.

Description: These functions determine cost-effective pooling strategies for estimating the prevalence of a marker in a population. Both functions attempt to choose survey designs that maximise the Fisher information for a given cost or effort. optimise_s_prevalence() calculates the optimal single pool size, balancing cost against accuracy for a given marker prevalence, test sensitivity and test specificity; it works for both simple random surveys and cluster surveys. optimise_sN_prevalence() additionally attempts to identify the optimal number of pools per cluster (cluster surveys only).

Examples

optimise_s_prevalence(prevalence = 0.01, cost_unit = 5, cost_pool = 10)

optimise_sN_prevalence(prevalence = 0.01, cost_unit = 5, cost_pool = 10, cost_cluster = 100, correlation = 0.05)
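For a simple random survey, the trade-off can be illustrated with a brute-force search. This is a minimal sketch, not the package's implementation. It assumes each pool of s units costs `cost_unit * s + cost_pool`, so a fixed budget buys roughly budget / (cost_unit * s + cost_pool) pools and total information is proportional to the Fisher information per pool divided by the cost per pool. It also assumes a perfect test unless sensitivity/specificity are given. The helpers `fi_one_pool()` and `best_pool_size()` are hypothetical names used only for illustration.

```r
# Fisher information about prevalence p from ONE pool of size s
# (standard result for pooled testing with an imperfect test).
fi_one_pool <- function(s, p, sens = 1, spec = 1) {
  q <- 1 - p
  theta <- sens - (sens + spec - 1) * q^s        # P(pool tests positive)
  dtheta <- (sens + spec - 1) * s * q^(s - 1)    # d theta / d p
  dtheta^2 / (theta * (1 - theta))
}

# Hypothetical helper: try pool sizes 1..max_s and return the one that
# maximises Fisher information per unit of cost (assumed linear cost model).
best_pool_size <- function(p, cost_unit, cost_pool, sens = 1, spec = 1,
                           max_s = 200) {
  s <- seq_len(max_s)
  info_per_cost <- fi_one_pool(s, p, sens, spec) / (cost_unit * s + cost_pool)
  s[which.max(info_per_cost)]
}

best_pool_size(p = 0.01, cost_unit = 5, cost_pool = 10)
```

With these inputs the search lands on a pool size well above 1, i.e. pooling is cost-effective at 1% prevalence under this cost model.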

fisher_information.R

Title: Calculate the Fisher Information of a pooled-survey design for estimating population prevalence.

Description: fi_pool() and fi_pool_cluster() calculate the Fisher information of pool-testing strategies for a given number and size of pools, where the sensitivity and specificity of the test are known. fi_pool() calculates the Fisher information for the prevalence in simple random surveys. fi_pool_cluster() calculates the two-by-two Fisher information matrix for prevalence and within-cluster correlation in cluster survey designs.

Examples

fi_pool(pool_size = 10, prevalence = 0.01, sensitivity = 0.95, specificity = 0.99)

fi_pool_cluster(pool_size = 10, pool_number = 5, prevalence = 0.01, correlation = 0.05, sensitivity = 0.95, specificity = 0.99)
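For the simple random survey case, the Fisher information has a closed form. A pool of size s tests positive with probability theta = Se - (Se + Sp - 1)(1 - p)^s, and one binomial pool result carries (dtheta/dp)^2 / (theta(1 - theta)) units of information about p; information from N independent pools is N times this. The sketch below is an illustration of that standard formula; `fi_pool_sketch()` is a hypothetical name and is not claimed to match fi_pool()'s exact interface.

```r
# Sketch: Fisher information about prevalence from one pool of `pool_size`
# units in a simple random survey (no clustering).
fi_pool_sketch <- function(pool_size, prevalence,
                           sensitivity = 1, specificity = 1) {
  q <- 1 - prevalence
  # Probability that a pool tests positive
  theta <- sensitivity - (sensitivity + specificity - 1) * q^pool_size
  # Derivative of theta with respect to prevalence
  dtheta <- (sensitivity + specificity - 1) * pool_size * q^(pool_size - 1)
  # Binomial information for one pool result
  dtheta^2 / (theta * (1 - theta))
}

fi_pool_sketch(pool_size = 10, prevalence = 0.01,
               sensitivity = 0.95, specificity = 0.99)
```

With pool_size = 1 and a perfect test this reduces to 1 / (p(1 - p)), the information from a single individual test.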

design_effect()

Title: Calculate the design effect for pooled testing.

Description: This function calculates the design effect (D) of survey designs using pool testing, relative to a simple random survey that tests the same number of units individually. This allows the Fisher information per unit sampled to be compared across different pooling and sampling strategies. A design effect D > 1 (D < 1) indicates that the pooling/sampling strategy reduces (increases) the Fisher information per unit; the total sample size would have to be multiplied by a factor of D to achieve the same precision in estimating prevalence as a simple random survey with individual tests. It supports both cluster and simple random sampling, with perfect or imperfect tests.

Examples

design_effect(pool_size = 5, pool_number = 10, prevalence = 0.01, sensitivity = 0.99, specificity = 0.95)
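As a rough illustration of the definition, the design effect for a simple random survey is the reference information per unit divided by the design's information per unit. This sketch assumes the reference design tests each unit individually with a perfect test (information per unit 1 / (p(1 - p))); whether design_effect() uses exactly this reference is an assumption. It ignores clustering, so it has no pool_number argument. `design_effect_sketch()` and `fi_one_pool()` are hypothetical names.

```r
# Fisher information about prevalence p from one pool of size s
fi_one_pool <- function(s, p, sens = 1, spec = 1) {
  q <- 1 - p
  theta <- sens - (sens + spec - 1) * q^s        # P(pool tests positive)
  dtheta <- (sens + spec - 1) * s * q^(s - 1)    # d theta / d p
  dtheta^2 / (theta * (1 - theta))
}

# Hypothetical design effect: reference information per unit (perfect
# individual tests, assumed) over pooled-design information per unit.
design_effect_sketch <- function(pool_size, prevalence,
                                 sensitivity = 1, specificity = 1) {
  info_per_unit_ref <- 1 / (prevalence * (1 - prevalence))
  info_per_unit_design <- fi_one_pool(pool_size, prevalence,
                                      sensitivity, specificity) / pool_size
  info_per_unit_ref / info_per_unit_design
}

design_effect_sketch(pool_size = 5, prevalence = 0.01)
```

The number of units needed is then about D times the number needed by the reference survey for the same precision.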
fredjaya commented 8 months ago

Angus, could you please update/add to the above for the function documentation? Hopefully I'm not too far off here...

fredjaya commented 8 months ago

Adding a reminder to avoid using the roxygen @import/@importFrom tags.
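A minimal sketch of the usual alternative: call functions from other packages with an explicit `pkg::` prefix instead of importing them via roxygen tags. The package still needs to be listed under Imports in DESCRIPTION. The objective function and the `find_minimum()` wrapper here are toy, hypothetical examples.

```r
# No @import/@importFrom tag: the namespace is spelled out at the call site,
# so it is obvious where optimize() comes from.
find_minimum <- function() {
  res <- stats::optimize(function(x) (x - 2)^2, interval = c(0, 5))
  res$minimum
}

find_minimum()
```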

AngusMcLure commented 7 months ago

Hi Fred, I'll do most of the editing on the first post, but I'm just pointing out some things here.

This kind of pool-testing setup is usually used when prevalence is low. Though there's no hard limit, I would keep the prevalence below 10% in all the examples, and a good example value might be 1%.
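A quick illustration of why low prevalence matters, assuming a perfect test and pools of 10 units: as prevalence rises, nearly every pool tests positive, so each pool result says less and less about prevalence. `prob_pool_positive()` is a hypothetical helper.

```r
# Probability that a pool of size s tests positive (perfect test assumed)
prob_pool_positive <- function(s, p) 1 - (1 - p)^s

round(prob_pool_positive(10, c(0.01, 0.10, 0.20)), 3)
# [1] 0.096 0.651 0.893
```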

Similarly, cluster surveys only make sense if the clustering is fairly low. Again, 0.2 is probably a high correlation and 0.75 is very, very high. A nice value for the examples might be 0.05. Even this is perhaps high, but I need to do some more theoretical/empirical work to figure out anything like a reasonable range for MX surveys. Though I don't think we should have a default value for now, users might need some guidance on what value to use if they have no idea, and that's on my long-term to-do list. It won't be easy, as it will probably require looking at lots of different datasets from different countries.
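One way to get a feel for these correlation values is a sketch under an assumed beta-binomial model: each cluster's prevalence is drawn from a Beta(alpha, beta) distribution with mean p, and the within-cluster correlation is rho = 1 / (alpha + beta + 1). Whether this matches PoolPoweR's internal model is an assumption here. Under it, the SD of cluster-level prevalence is sqrt(rho p (1 - p)), so at 1% mean prevalence even rho = 0.05 implies cluster prevalences that typically vary by about 2 percentage points, more than twice the mean. The helper names are hypothetical.

```r
# Assumed model: cluster prevalence ~ Beta(alpha, beta) with mean p and
# within-cluster correlation rho = 1 / (alpha + beta + 1).
cluster_prevalence_sd <- function(p, rho) sqrt(rho * p * (1 - p))

# Beta parameters implied by a mean prevalence p and correlation rho
beta_params <- function(p, rho) {
  total <- (1 - rho) / rho   # alpha + beta
  c(alpha = p * total, beta = (1 - p) * total)
}

round(cluster_prevalence_sd(0.01, c(0.05, 0.2, 0.75)), 4)
# [1] 0.0222 0.0445 0.0862
```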

optimise_s_prevalence(prevalence = 0.01, cost_unit = 5, cost_pool = 10)

optimise_sN_prevalence(prevalence = 0.01, cost_unit = 5, cost_pool = 10, cost_cluster = 100, correlation = 0.05)
AngusMcLure commented 7 months ago

Also, make sure that if a function supports both cluster and non-cluster surveys, you give an example for each. I think this can usually be achieved by providing or omitting an input for correlation.

AngusMcLure commented 7 months ago

I've edited the main post with some rewording/expansion here and there. It didn't take long, so they were pretty close!

fredjaya commented 7 months ago

Awesome, thanks!

Based on this, I'm thinking of moving design_effect() to its own .R file. Any objections?

AngusMcLure commented 7 months ago

Sounds fine to me. I take it the idea is to keep the docs nice and clean.

fredjaya commented 7 months ago

The upcoming push updates the unit tests based on these examples. I've created a separate issue for further examples.