epiverse-trace / linelist

R package for handling linelist data
https://epiverse-trace.github.io/linelist/

print an informative output even if there is no validation issue #143

Open avallecam opened 1 month ago

avallecam commented 1 month ago

Is your feature request related to a problem? Please describe.

When validation passes, the output of validate_linelist() is identical to the output of make_linelist(). It would be informative for users to get a message that validation was successful.

library(outbreaks)
library(tidyverse)
library(linelist)

# cleaned data
ebola <- outbreaks::ebola_sim_clean %>% 
  pluck("linelist") %>% 
  as_tibble()

# messy data
ebola_messy <- ebola %>% 
  mutate(date_of_onset = as.character(date_of_onset))

# ebola %>% glimpse()
# linelist::tags_names()

# validate messy data - rejected
ebola_messy %>% 
  make_linelist(id = "case_id", date_onset = "date_of_onset") %>% 
  validate_linelist()
#> Error: Some tags have the wrong class:
#>   - date_onset: Must inherit from class 'integer'/'numeric'/'Date'/'POSIXct'/'POSIXlt', but has class 'character'

# validate cleaned data - passed
ebola %>%  
  make_linelist(id = "case_id", date_onset = "date_of_onset") %>% 
  validate_linelist()
#> 
#> // linelist object
#> # A tibble: 5,829 × 11
#>    case_id generation date_of_infection date_of_onset date_of_hospitalisation
#>    <chr>        <int> <date>            <date>        <date>                 
#>  1 d1fafd           0 NA                2014-04-07    2014-04-17             
#>  2 53371b           1 2014-04-09        2014-04-15    2014-04-20             
#>  3 f5c3d8           1 2014-04-18        2014-04-21    2014-04-25             
#>  4 6c286a           2 NA                2014-04-27    2014-04-27             
#>  5 0f58c4           2 2014-04-22        2014-04-26    2014-04-29             
#>  6 49731d           0 2014-03-19        2014-04-25    2014-05-02             
#>  7 f9149b           3 NA                2014-05-03    2014-05-04             
#>  8 881bd4           3 2014-04-26        2014-05-01    2014-05-05             
#>  9 e66fa4           2 NA                2014-04-21    2014-05-06             
#> 10 20b688           3 NA                2014-05-05    2014-05-06             
#> # ℹ 5,819 more rows
#> # ℹ 6 more variables: date_of_outcome <date>, outcome <fct>, gender <fct>,
#> #   hospital <fct>, lon <dbl>, lat <dbl>
#> 
#> // tags: id:case_id, date_onset:date_of_onset

Created on 2024-07-30 with reprex v2.1.0

Describe the solution you'd like

A printed message like "All tagged variables are valid." or similar.

Additional context

A similar request was suggested for the cleanepi package at https://github.com/epiverse-trace/cleanepi/issues/150

Bisaloo commented 1 month ago

What about the case where this is used in a pipeline and the user just wants the pipeline to continue if everything is valid?

Should this behaviour be controlled by an extra argument?

avallecam commented 1 month ago

> What about the case where this is used in a pipeline and the user just wants the pipeline to continue if everything is valid.
>
> Should this behaviour be controlled by an extra argument?

If everything is valid and we get a message similar to a "warning" (or just a neutral printed message), I think this will allow the pipeline to continue, right?

I imagine something similar to the following message:

#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

as in this reprex:

library(tidyverse)
starwars %>% 
  ggplot(aes(x = height)) +
  geom_histogram()
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> Warning: Removed 6 rows containing non-finite outside the scale range
#> (`stat_bin()`).

Created on 2024-08-08 with reprex v2.1.0
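
To illustrate the idea that a message does not interrupt a pipeline, here is a minimal sketch. It assumes a hypothetical wrapper, `validate_with_message()`, which is not part of linelist, and a small made-up data set. The wrapper calls `validate_linelist()` as usual and then reports success with `message()`, so the validated object is still returned to the next step.

```r
library(linelist)

# Hypothetical wrapper (not a linelist function): run the usual validation,
# then report success with message() and pass the object along unchanged.
validate_with_message <- function(x, ...) {
  out <- validate_linelist(x, ...)  # stops with an error if a tag is invalid
  message("All tagged variables are valid.")
  out
}

# Small made-up data set, for illustration only
cases <- data.frame(
  case_id = c("a1", "b2", "c3"),
  date_of_onset = as.Date(c("2014-04-07", "2014-04-15", "2014-04-21"))
)

tagged <- make_linelist(cases, id = "case_id", date_onset = "date_of_onset")
checked <- validate_with_message(tagged)

# The message goes to stderr; the returned object is still a linelist,
# so later steps in a pipeline receive it as usual.
inherits(checked, "linelist")
```

Because the text is emitted with `message()` rather than `print()`, it does not become part of the returned value.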

Bisaloo commented 1 month ago

> if everything is valid, and we get a message similar to a "warning" (or just a neutral print message) I think this will allow the pipeline to continue, right?

Yes, you're correct, the pipeline can continue. One problem with long pipelines, however, is that you get flooded with messages and they are no longer helpful.
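
As a rough sketch of how message overload could be handled, the base R example below uses a stand-in function, `check_step()`, which is hypothetical and not a linelist function. It shows two options: the caller wraps the step in `suppressMessages()`, or the step itself offers an opt-out argument (here called `quiet`, also an assumption).

```r
# Stand-in for a validation step that reports success via message().
# `check_step()` and its `quiet` argument are hypothetical.
check_step <- function(x, quiet = FALSE) {
  stopifnot(is.data.frame(x))
  if (!quiet) {
    message("All tagged variables are valid.")
  }
  x
}

# Option 1: the caller silences the step without changing it
res1 <- suppressMessages(check_step(cars))

# Option 2: an opt-out argument on the step itself
res2 <- check_step(cars, quiet = TRUE)

# Both routes return the same, unmodified data
identical(res1, res2)
#> [1] TRUE
```

The first option only works if the step uses `message()`; output produced with `print()` or `cat()` is not caught by `suppressMessages()`.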

@chartgerink, any opinions on this, as it is relevant for datatagr as well?

chartgerink commented 1 month ago

I don't see any problems with that 👍 Thanks for the suggestion @avallecam 😄