United4Surveillance / signal-detection-tool

A tool for detection of signals in infectious disease surveillance data.

[BUG] age_groups() special case for provided age_groups #262

Closed tinneuro closed 6 months ago

tinneuro commented 7 months ago

Describe the Bug

When the age-group data look like this: data_frame <- data.frame(age = c(0, 2, 5, 15, 16, 73, 82), age_group = c("<1", "01-05", "6-10", "11-15", "16-20", "71-80", "81-")), the character "-" is used in two ways: as the separator between the lower and upper bound of each group, and as the marker for the open-ended final group "81-". In this case age_groups(data_frame) throws an error because of a problem with the result of age_format_check(data_frame).
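To illustrate the ambiguity, the following R sketch splits the labels on "-" the way a separator-based parser might. It is a hypothetical illustration only, not the implementation of age_format_check(). It shows that "81-" contains the separator but, like "<1", yields a single part, because strsplit() drops the empty string after a trailing separator.

```r
# Hypothetical illustration only: NOT the package's age_format_check().
# Shows why "81-" is ambiguous for a parser that splits on "-".
labels <- c("<1", "01-05", "6-10", "11-15", "16-20", "71-80", "81-")

# Split each label on the bound separator "-"
parts <- strsplit(labels, "-", fixed = TRUE)
n_parts <- lengths(parts)
print(n_parts)
# [1] 1 2 2 2 2 2 1   ("81-" gives just "81"; the upper bound is missing)

# Labels that contain the separator but do not yield two bounds
has_sep <- grepl("-", labels, fixed = TRUE)
print(labels[has_sep & n_parts != 2])
# [1] "81-"
```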

Should we explicitly restrict the input format and tell users to write "81+" instead of "81-", or should we make the code robust to this case? If we restrict it, a data check for the age_group format should probably be added so that such input is rejected.

To Reproduce

Steps to reproduce the behavior:

  1. data_frame <- data.frame(age = c(0, 2, 5, 15, 16, 73, 82), age_group = c("<1", "01-05", "6-10", "11-15", "16-20", "71-80", "81-"))
  2. age_groups(data_frame)
  3. Look at the output of age_format_check(data_frame)

Expected Behavior

A decision on how to handle this case, and no further errors for it.

tinneuro commented 7 months ago

@jaemol and I discussed this and came to the following conclusion: in this case we want to be more restrictive about the age_group input format rather than making age_groups() robust enough to cope with it.

ToDos:

  1. Update the SOP on the data input format for age_group to state that the final age group can either follow the same pattern as the preceding ones (e.g. 100-105) or be open-ended as 100+.
  2. Update the data input check so that this format is enforced and no other "special" character is used for the last age group. The existing function age_format_check(data_frame) could potentially help here.
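A minimal sketch of the data check described in the second ToDo, assuming the accepted label shapes are "N-M" for closed ranges, "N+" for the open-ended final group and "<N" for a youngest group such as "<1". The function name check_age_group_format() is hypothetical and not part of the package. It only checks the shape of each label and that at most one "N+" group exists; it does not check ordering or gaps between groups.

```r
# Hypothetical sketch: check_age_group_format() is NOT part of the package.
# Assumed accepted shapes: "N-M", "N+" (open-ended final group), "<N".
check_age_group_format <- function(age_group) {
  groups <- unique(as.character(age_group))
  valid <- grepl("^[0-9]+-[0-9]+$", groups) |  # closed range, e.g. "11-15"
    grepl("^[0-9]+\\+$", groups) |             # open-ended, e.g. "81+"
    grepl("^<[0-9]+$", groups)                 # youngest group, e.g. "<1"
  if (!all(valid)) {
    stop("Invalid age_group format: ",
         paste(shQuote(groups[!valid]), collapse = ", "),
         ". Use 'N-M' for ranges and 'N+' for the final open-ended group.",
         call. = FALSE)
  }
  # Only one open-ended "N+" group makes sense
  if (sum(grepl("\\+$", groups)) > 1) {
    stop("Only one open-ended 'N+' age group is allowed.", call. = FALSE)
  }
  invisible(TRUE)
}

bad <- data.frame(age = c(0, 2, 5, 15, 16, 73, 82),
                  age_group = c("<1", "01-05", "6-10", "11-15",
                                "16-20", "71-80", "81-"),
                  stringsAsFactors = FALSE)
good <- bad
good$age_group[good$age_group == "81-"] <- "81+"

check_age_group_format(good$age_group)       # passes silently
try(check_age_group_format(bad$age_group))   # error naming "81-"
```

Rejecting the label with a message that suggests "N+" matches the decision above to restrict the input format instead of teaching age_groups() to interpret "81-".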