ccodwg / Covid19CanadaETL

A Pipeline for Canadian COVID-19 Data
https://opencovid.ca/

dl_datasets() and e_t_datasets() should be more error-tolerant #1

Closed (jeanpaulrsoucy closed this issue 1 year ago)

jeanpaulrsoucy commented 3 years ago

Errors should not stop or reset the entire update process.
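
A minimal sketch of the idea in R (`download_one` and `dl_datasets_tolerant` are hypothetical names, not the package's actual functions): each dataset download is wrapped in `tryCatch` so a failure is reported and recorded as `NA` instead of aborting the whole run.

```r
# Hypothetical stand-in for whatever function fetches a single dataset;
# here, IDs containing "bad" simulate a failed download.
download_one <- function(id) {
  if (grepl("bad", id)) stop("download failed for ", id)
  paste("data for", id)
}

# Download every dataset, logging failures instead of stopping the run.
dl_datasets_tolerant <- function(ids) {
  lapply(setNames(ids, ids), function(id) {
    tryCatch(
      download_one(id),
      error = function(e) {
        message("FAILED: ", id, " (", conditionMessage(e), ")")
        NA # mark this dataset as unavailable and keep going
      }
    )
  })
}

# Usage: the failing dataset is reported but the others still download.
ds <- dl_datasets_tolerant(c("uuid-1", "bad-uuid", "uuid-3"))
```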

jeanpaulrsoucy commented 2 years ago

This has been accomplished, for the most part.

jeanpaulrsoucy commented 2 years ago

A function-stopping error occurs when retrying multiple datasets at once (retrying a single dataset seems to work):

```
WAITING 10 SECONDS BEFORE RETRY...
RETRY SUCCESSFUL:  2e7a5549-92ae-473d-a97a-7b8e0c1ddbbc e00e2148-b0ea-458b-9f00-3533e0c5ae8e
Error in ds[[ds_failed]] <- tryCatch({ : 
  more elements supplied than there are to replace
```

It seems to be trying to do both datasets at once.
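
A minimal sketch of the likely fix (hypothetical names; not the package's actual code): `[[<-` addresses a single element, so assigning the results of several retried datasets in one step can fail with "more elements supplied than there are to replace". Assigning one failed dataset at a time avoids the problem.

```r
# Hypothetical stand-in for the retry logic for a single dataset.
retry_download <- function(id) paste("retried data for", id)

ds <- list("uuid-1" = NA, "uuid-2" = NA) # downloads that failed earlier
ds_failed <- c("uuid-1", "uuid-2")

# One assignment per failed dataset, each protected by tryCatch.
for (id in ds_failed) {
  ds[[id]] <- tryCatch(
    retry_download(id),
    error = function(e) {
      message("RETRY FAILED: ", id, " (", conditionMessage(e), ")")
      NA # leave the dataset marked as unavailable
    }
  )
}
```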

jeanpaulrsoucy commented 2 years ago

If reading from Google Sheets fails, the script ends. Instead, the section should be skipped (and the error reported).
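
A minimal sketch of the suggested behaviour, assuming reads go through googlesheets4 (`read_sheet_tolerant` is a hypothetical wrapper, not the package's actual code): the error is reported and `NULL` is returned so the caller can skip that section instead of the script ending.

```r
library(googlesheets4)

# Read a sheet, but report failures and return NULL rather than stopping.
read_sheet_tolerant <- function(ss, sheet = NULL) {
  tryCatch(
    googlesheets4::read_sheet(ss, sheet = sheet),
    error = function(e) {
      message("Google Sheets read failed for '", sheet, "': ",
              conditionMessage(e), " - skipping this section")
      NULL # caller checks for NULL and skips the section
    }
  )
}
```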

jeanpaulrsoucy commented 2 years ago

Additional processing in e_t_datasets should also be more error-tolerant: if the main function returns NA (e.g., because the dataset was not available), further processing (e.g., process_hr_names, dplyr::summarize, etc.) should either be skipped or wrapped in error catching (the better option, since those functions could also fail for other reasons).

Update: edde832a81128ee041ef8c29313c5827d0ab2fbd fixed the above issue for process_funs but not for one-off use of dplyr functions.
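
A minimal sketch of the suggested pattern (hypothetical names and columns; not the package's code): skip further processing when the dataset came back as NA, and wrap the one-off dplyr steps in `tryCatch` so they also fail gracefully.

```r
library(dplyr)

process_tolerant <- function(dat) {
  # upstream returns NA (not a data frame) when the dataset was unavailable,
  # so propagate NA instead of attempting further processing
  if (!is.data.frame(dat)) return(NA)
  tryCatch(
    dat %>%
      group_by(region) %>%                         # hypothetical one-off dplyr step
      summarize(value = sum(value), .groups = "drop"),
    error = function(e) {
      message("Processing failed: ", conditionMessage(e))
      NA # report and continue rather than stopping the run
    }
  )
}
```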

jeanpaulrsoucy commented 2 years ago

Values calculated as the maximum of two datasets (e.g., provincial and federal vaccine distribution numbers) cause a script error when one of the values is unavailable. This should be fixed by writing an explicit, error-tolerant function.
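
A minimal sketch of such an explicit function (`max_tolerant` is a hypothetical name): take the maximum of the provincial and federal values while tolerating a missing source.

```r
# Maximum of two values, ignoring whichever source is unavailable.
max_tolerant <- function(prov, fed) {
  vals <- c(prov, fed)
  vals <- vals[!is.na(vals)]
  if (length(vals) == 0) return(NA) # neither source available
  max(vals)
}

max_tolerant(100, 120) # 120
max_tolerant(NA, 120)  # 120: federal value used when provincial is missing
max_tolerant(NA, NA)   # NA, with no error or warning
```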

jeanpaulrsoucy commented 2 years ago

The sheets_merge function and other functions that pull from or push to Google Sheets should have error tolerance added, since these are a common point of failure.
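
A minimal sketch of one way to add error tolerance around Sheets I/O (`with_sheets_retry` is a hypothetical wrapper; `sheets_merge` itself is not reproduced here): retry the operation a few times, and report rather than stop on final failure.

```r
# Run a Sheets operation with retries; return NULL if every attempt fails
# so the caller can skip this step instead of the whole run stopping.
with_sheets_retry <- function(expr, tries = 3, wait = 10) {
  for (i in seq_len(tries)) {
    result <- tryCatch(expr(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("Sheets operation failed (attempt ", i, "/", tries, "): ",
            conditionMessage(result))
    if (i < tries) Sys.sleep(wait)
  }
  NULL
}

# Usage (assumes googlesheets4; hypothetical objects my_data and sheet_id):
# with_sheets_retry(function() {
#   googlesheets4::sheet_write(my_data, ss = sheet_id, sheet = "merged")
# })
```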