Closed joseph-palmer closed 3 years ago
👋 Thanks for opening this pull request! Can you please run through the following checklist before requesting review (ticking as complete or if not relevant).
devtools::load_all(); devtools::test()
) first setting options(testDownload=TRUE, testSource=class-name)
and report your findings. devtools::load_all(); devtools::test()
). Thank you again for the contribution. If making large scale changes consider using our pre-commit
hooks (see the contributing guide) to more easily comply with our guidelines.
So I was thinking here we would have a core set of options we always run and then an optional list you can alter?
So in my mind DataClass
would have a field like
process_fns = c(replace_zero_with_na)
which can get changed at initialisation or in a certain class.
Then similar to how you have things implemented we would have:
process_internal <- function(process_fns).{
processed <- default_fns(clean)
processed <- optional_fns(processed, process_fns = process_fns)
}
Where optional fns is:
optional_fns <- fuction(processed, process_fns) {
for (i in seq_along(process_fns)) {
processed <- do.call(process_fns[I], processed)
}
return(processed)
}
I have changed the way this works and the tests. It now has a field in DataClass
which storees a lit of functions passed in at initalization, with set_negative_values_to_zero
as a default item in the list. Users can pass any function to work on the data and they will be applied after the default functions have worked.
Need to think about where they go with grouping the data, at them moment they are applied to the grouped data but this can be modified. I will look over the process internal function and see if I can reformat it to work faster and be more verbose.
Their were some merge conflicts not showing as merge conflicts so just jumped it to fix.
In doing so saw maybe a few edge cases that were a little strange (I don't think you could empty the default of process_fns
?) + expanded testing a little. @joseph-palmer can you check and see if you are happy?
This PR makes data processing more modular by allowing users to 'switch off' running individual functions used during processing. The best case for doing this would be if the user wanted to keep negative values, rather than replace them with zeros. Currently this PR supports turning off other functions (
calculate_columns_from_existing_data
for example), but doing this may cause errors.This works by the user passing a list of processing options (
process_options
) at initalisation, this must be a list of functions, e.g.process_options = list("set_negative_values_to_zero" = FALSE)
. At initalisationformat_process_options
takes these arguments, does checks on them and alters a list of known functions to remove those which are false. In processing only the list of function names are filtered to those that areTRUE
and called, thus removeing those marked FALSE.Switching off procesing functions at the minute only really works for 'set_negative_values_to_zero', other cause errors as what they add gets used further down the line. Nevertheless, using this method is scalable (and we can remove the others from the options to change) should we add any other procesing steps in which a user may want to play with.
TODO
format_process_options
within the processing step as at the moment a user could pass arguments after initalisation by replacingself$process_options
closes #317