Big refactor - Githubissues

fredjaya commented 1 week ago

More concrete thoughts after our chat, and how to work on tackling it iteratively. Thoughts and comments please @AngusMcLure!

Goals of refactoring

reduce the shiny-specific code required to get PoolPoweR working
make the package more extensible for variations in existing functions (i.e. two-sample power)
reduce repetition of shared parameters and function names

1. Encapsulating sample design parameters

Current way to run two power functions with shared parameters:

power_pool(pool_size = 10, pool_number = 2, cluster_number = 50,
            prevalence_null = 0.01, prevalence_alt = 0.02, 
            sensitivity = 1, specificity = 1)

sample_size_pool(pool_size = 10, pool_number = 2,
                 prevalence_null = 0.01, prevalence_alt = 0.02,
                 correlation = 0.01, sensitivity = 1, specificity = 1)

Proposed way is to first store common parameters in an S3 class sample_design and pass that to the functions

sample_design <- function(...) { 
    ... 
    return(s3_class_with_params)
}

# Construct a class with sample design parameters
design_params <- sample_design(pool_size = 10, pool_number = 2, sensitivity = 1, specificity = 1)

# Pass shared params to functions
power_pool(design_params, cluster_number = 50, prevalence_null = 0.01, prevalence_alt = 0.02)

sample_size_pool(design_params, prevalence_null = 0.01, prevalence_alt = 0.02, correlation = 0.01)

The fixed sampling implementation (above) should have the following attributes:

pool_size
pool_number (for now)
sensitivity
specificity

The random/catch sampling implementation attrs:

catch_dist
pool_strat
sensitivity
specificity

Also consider the names for each class e.g. fixed_design and random_design or catch_design, but both will both be of class sample_design.

Cons:

Cumbersome when running functions with different sample_design() params
Nice to separate the data with the calculation but will need to unpack sample_design() every time making the class and functions tightly coupled

To-dos:

[x] Write constructor
[x] Add constructor tests
[x] Move input checks
[ ] Replace usage across all functions
[ ] Update existing tests
[ ] Repeat above for random/catch

2. Add `sample_design` methods

The use of S3 classes and methods should also improve naming. e.g. each fixed_design and random_design. Currently:

power_pool(...)
sample_size_pool(...)
power_pool_random(...)
sample_size_pool_random(...)

Proposed, with method dispatch:

# tentative names
power(fixed_design, ...)
sample_size(fixed_design, ...)
power(random_design, ...)
sample_size(random_design, ...)

The method dispatch will also make the package more extensible for new variations without having really long function names e.g. power e.g. two-sample, one-sample etc.

I think user-facing functions should be given priority for converting functions to sampledesign methods i.e. power/optimise funcs. Non-user facing functions (e.g. fi\, cost_fi...) not so sure yet as it will introduce a lot of dependencies between functions and complicate the code prematurely.

Cons:

Usability not as straightforward with nested classes e.g.

design_params <- random_design(
    catch_dist = nb_catch(20, 25), # s3 class
    pool_strat = pool_target_number(2), # s3 class
)

power_pool(
    design_params,
    ...
)

Solve with really good/clear documentation and examples, or provide default values for classes.

To-dos:

[ ] Come back to this after addressing 1 and 3
[ ] Identify possible names for power variations, then rename the current power_pool (replace pool with exact thing)

3. Refactor power.R functions

Currently all 4 power.R functions have a lot of duplicated code such as the:

power function
fia and fi0 (lots of repetition here too, only difference is theta/theta0)
the links

To-dos

[ ] Move power into helper function
[ ] Refactor fia and fi0 and move into helper function

Hopefully refactoring these will make converting to methods easier.

fredjaya commented 1 week ago

Documented the dependencies for fisher_information.R functions. Key things to note, and impact whether fi_ funcs should be converted to a method are:

design_effect funcs have a fixed fi_pool(pool_size = 1)
minor naming differences in fi_cluster, don't think these impact the behaviour
based on the second point in the previous post, fi_pool_random and fi_cluster_random are good candidates for converting to S3 methods as they are directly used by user-facing functions e.g.

# power funcs use only fi_pool_cluster() and fi_pool_cluster_random()
fi(fixed_design) # fi_pool_cluster()
fi(random_design) # fi_pool_cluster_random()

`fi_pool`

Function	Usage
`fi_pool_cluster`	`fi_pool(pool_size, prevalence, sensitivity, specificity)`
`fi_pool_random`	`fi_pool(pooling$pool_size, prevalence, sensitivity, specificity)`
`design_effect`	`fi_pool(pool_size = 1, prevalence, sensitivity, specificity)`
`design_effect_random`	`fi_pool(pool_size = 1, prevalence, sensitivity, specificity)`
`cost_fi`	`fi_pool(pool_size, prevalence, sensitivity, specificity)`

`fi_cluster`

Function	Usage
`design_effect`	`fi_pool_cluster(pool_size, pool_number, prevalence, correlation, sensitivity, specificity, form)`
`fi_cluster_random`	`fi_pool_cluster(pooling$pool_size, pooling$pool_number, prevalence, correlation, sensitivity, specificity, form, real_scale)`
`cost_fi_random`	`fi_pool_cluster(s, N, prevalence, correlation, sensitivity, specificity, form)`
`power_pool`	`fi_pool_cluster(pool_size, pool_number, prevalence, correlation, sensitivity, specificity, form)`
`sample_size_pool`	`fi_pool_cluster(pool_size, pool_number, prevalence, correlation, sensitivity, specificity, form)`

`fi_pool_random`

Function	Usage
`cost_fi_random`	`fi_pool_random(catch_dist,pool_strat,prevalence,sensitivity, specificity)`

`fi_cluster_random`

Function	Usage
`design_effect_random`	`fi_pool_cluster_random(catch_dist, pool_strat, prevalence, correlation, sensitivity, specificity, form)`
`cost_fi_cluster_random`	`fi_pool_cluster_random(catch_dist, pool_strat, prevalence, correlation, sensitivity, specificity, form)`
`power_pool_random`	`fi_pool_cluster_random(catch_dist, pool_strat, prevalence, correlation, sensitivity, specificity, form)`
`sample_size_pool_random`	`fi_pool_cluster_random(catch_dist, pool_strat, prevalence, correlation, sensitivity, specificity, form)`

classDiagram

fi_pool <-- fi_cluster
fi_pool <-- fi_pool_random
fi_pool <-- design_effect
fi_pool <-- design_effect_random
fi_pool <-- cost_fi

fi_cluster <-- design_effect
fi_cluster <-- fi_cluster_random
fi_cluster <-- cost_fi_random
fi_cluster <-- power
fi_cluster <-- sample_size

fi_pool_random <-- cost_fi_random

fi_cluster_random <-- design_effect_random
fi_cluster_random <-- cost_fi_cluster_random
fi_cluster_random <-- power_random
fi_cluster_random <-- sample_size_random

namespace fisher_information {
    class fi_pool {
    }
    class fi_cluster {
    }
    class fi_pool_random {
    }
    class fi_cluster_random {
    }
}

namespace cost {
    class cost_fi {
    }
    class cost_fi_random {
    }
    class cost_fi_cluster_random {
    }
}

namespace design {
    class design_effect {
    }
    class design_effect_random {
    }
}
namespace power_ns {
    class power {
    }
    class sample_size {
    }
    class power_random {
    }
    class sample_size_random {
    }
}

fredjaya commented 1 week ago

Notes from our meeting today:

Include cluster_number insample_design classes
[x] sample_design class names to be fixed_design and variable_design. Leave PoolTools dropdowns as fixed sample size and fixed sampling period, respectively~~
[x] leave sample_design params NULL by default for future optimise() implementations
order of func to method conversion:
- [x] design effect
- [ ] optimise
- [ ] power
[ ] add pretty print methods for sample_design that include e.g. total X

AngusMcLure / PoolPoweR

Big refactor #40

1. Encapsulating sample design parameters

2. Add `sample_design` methods

3. Refactor power.R functions

`fi_pool`

`fi_cluster`

`fi_pool_random`

`fi_cluster_random`

AngusMcLure / PoolPoweR

Big refactor #40

1. Encapsulating sample design parameters

2. Add sample_design methods

3. Refactor power.R functions

fi_pool

fi_cluster

fi_pool_random

fi_cluster_random

2. Add `sample_design` methods

`fi_pool`

`fi_cluster`

`fi_pool_random`

`fi_cluster_random`