AngusMcLure / PoolPoweR

Power and sample size calculations for surveys using pool testing (AKA group testing)
GNU General Public License v3.0
0 stars 1 forks source link

Big refactor #40

Open fredjaya opened 1 week ago

fredjaya commented 1 week ago

More concrete thoughts after our chat, and how to work on tackling it iteratively. Thoughts and comments please @AngusMcLure!

Goals of refactoring

1. Encapsulating sample design parameters

Current way to run two power functions with shared parameters:

power_pool(pool_size = 10, pool_number = 2, cluster_number = 50,
            prevalence_null = 0.01, prevalence_alt = 0.02, 
            sensitivity = 1, specificity = 1)

sample_size_pool(pool_size = 10, pool_number = 2,
                 prevalence_null = 0.01, prevalence_alt = 0.02,
                 correlation = 0.01, sensitivity = 1, specificity = 1)

Proposed way is to first store common parameters in an S3 class sample_design and pass that to the functions

sample_design <- function(...) { 
    ... 
    return(s3_class_with_params)
}

# Construct a class with sample design parameters
design_params <- sample_design(pool_size = 10, pool_number = 2, sensitivity = 1, specificity = 1)

# Pass shared params to functions
power_pool(design_params, cluster_number = 50, prevalence_null = 0.01, prevalence_alt = 0.02)

sample_size_pool(design_params, prevalence_null = 0.01, prevalence_alt = 0.02, correlation = 0.01)

The fixed sampling implementation (above) should have the following attributes:

The random/catch sampling implementation attrs:

Also consider the names for each class e.g. fixed_design and random_design or catch_design, but both will both be of class sample_design.

Cons:

To-dos:

2. Add sample_design methods

The use of S3 classes and methods should also improve naming. e.g. each fixed_design and random_design. Currently:

power_pool(...)
sample_size_pool(...)
power_pool_random(...)
sample_size_pool_random(...)

Proposed, with method dispatch:

# tentative names
power(fixed_design, ...)
sample_size(fixed_design, ...)
power(random_design, ...)
sample_size(random_design, ...)

The method dispatch will also make the package more extensible for new variations without having really long function names e.g. power e.g. two-sample, one-sample etc.

I think user-facing functions should be given priority for converting functions to sampledesign methods i.e. power/optimise funcs. Non-user facing functions (e.g. fi\, cost_fi...) not so sure yet as it will introduce a lot of dependencies between functions and complicate the code prematurely.

Cons:

design_params <- random_design(
    catch_dist = nb_catch(20, 25), # s3 class
    pool_strat = pool_target_number(2), # s3 class
)

power_pool(
    design_params,
    ...
)

Solve with really good/clear documentation and examples, or provide default values for classes.

To-dos:

3. Refactor power.R functions

Currently all 4 power.R functions have a lot of duplicated code such as the:

To-dos

Hopefully refactoring these will make converting to methods easier.

fredjaya commented 1 week ago

Documented the dependencies for fisher_information.R functions. Key things to note, and impact whether fi_ funcs should be converted to a method are:

# power funcs use only fi_pool_cluster() and fi_pool_cluster_random()
fi(fixed_design) # fi_pool_cluster()
fi(random_design) # fi_pool_cluster_random() 

fi_pool

Function Usage
fi_pool_cluster fi_pool(pool_size, prevalence, sensitivity, specificity)
fi_pool_random fi_pool(pooling$pool_size, prevalence, sensitivity, specificity)
design_effect fi_pool(pool_size = 1, prevalence, sensitivity, specificity)
design_effect_random fi_pool(pool_size = 1, prevalence, sensitivity, specificity)
cost_fi fi_pool(pool_size, prevalence, sensitivity, specificity)

fi_cluster

Function Usage
design_effect fi_pool_cluster(pool_size, pool_number, prevalence, correlation, sensitivity, specificity, form)
fi_cluster_random fi_pool_cluster(pooling$pool_size, pooling$pool_number, prevalence, correlation, sensitivity, specificity, form, real_scale)
cost_fi_random fi_pool_cluster(s, N, prevalence, correlation, sensitivity, specificity, form)
power_pool fi_pool_cluster(pool_size, pool_number, prevalence, correlation, sensitivity, specificity, form)
sample_size_pool fi_pool_cluster(pool_size, pool_number, prevalence, correlation, sensitivity, specificity, form)

fi_pool_random

Function Usage
cost_fi_random fi_pool_random(catch_dist,pool_strat,prevalence,sensitivity, specificity)

fi_cluster_random

Function Usage
design_effect_random fi_pool_cluster_random(catch_dist, pool_strat, prevalence, correlation, sensitivity, specificity, form)
cost_fi_cluster_random fi_pool_cluster_random(catch_dist, pool_strat, prevalence, correlation, sensitivity, specificity, form)
power_pool_random fi_pool_cluster_random(catch_dist, pool_strat, prevalence, correlation, sensitivity, specificity, form)
sample_size_pool_random fi_pool_cluster_random(catch_dist, pool_strat, prevalence, correlation, sensitivity, specificity, form)
classDiagram

fi_pool <-- fi_cluster
fi_pool <-- fi_pool_random
fi_pool <-- design_effect
fi_pool <-- design_effect_random
fi_pool <-- cost_fi

fi_cluster <-- design_effect
fi_cluster <-- fi_cluster_random
fi_cluster <-- cost_fi_random
fi_cluster <-- power
fi_cluster <-- sample_size

fi_pool_random <-- cost_fi_random

fi_cluster_random <-- design_effect_random
fi_cluster_random <-- cost_fi_cluster_random
fi_cluster_random <-- power_random
fi_cluster_random <-- sample_size_random

namespace fisher_information {
    class fi_pool {
    }
    class fi_cluster {
    }
    class fi_pool_random {
    }
    class fi_cluster_random {
    }
}

namespace cost {
    class cost_fi {
    }
    class cost_fi_random {
    }
    class cost_fi_cluster_random {
    }
}

namespace design {
    class design_effect {
    }
    class design_effect_random {
    }
}
namespace power_ns {
    class power {
    }
    class sample_size {
    }
    class power_random {
    }
    class sample_size_random {
    }
}
fredjaya commented 1 week ago

Notes from our meeting today: