epiforecasts / covidregionaldata

An interface to subnational and national level COVID-19 data. For all countries supported, this includes a daily time-series of cases. Wherever available we also provide data on deaths, hospitalisations, and tests. National level data is also supported using a range of data sources as well as linelist data and links to intervention data sets.
https://epiforecasts.io/covidregionaldata/

Calling `get_available_datasets` initialises a copy of every class object - this may cause unexpected problems #372

Closed seabbs closed 3 years ago

seabbs commented 3 years ago

When you call get_regional_data for any country, a side effect of finding out which datasets are available is that it tries to instantiate a class for every country. This fails for classes that rely on the data loaded as part of the package. We may need to lighten the instantiation that the check of available datasets performs, or write a more specific function that only checks whether one particular dataset is available, and then figure out what to do about the dependency on the data if the full library hasn't been loaded.

Originally posted by @RichardMN in https://github.com/epiforecasts/covidregionaldata/issues/369#issuecomment-852418246

joseph-palmer commented 3 years ago

The method used to check whether a country is supported (initialise_dataclass) filters and checks the data returned by get_available_datasets(), so it calls get_available_datasets(), which initialises all classes. Two methods initially come to mind to avoid this:

1) Use a try catch inside initialise_dataclass() to try to initialise the given class, and fail with the current warnings if that doesn't work.
2) Store the data returned by get_available_datasets() as package data, like the region codes for some countries, and have it update on load. Or perhaps the GitHub Action that automatically generates the YAML workflows could run get_available_datasets()?
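Option 1 could look roughly like the sketch below. It is base R only and does not use the package: `constructors`, `initialise_dataclass_sketch`, and the two entries are hypothetical stand-ins, with one constructor deliberately failing to mimic a class that depends on package data that has not been loaded. The point is that only the requested class is instantiated, so one broken class cannot affect a request for another.

```r
# Hypothetical stand-ins for dataset classes. Each constructor is a plain
# function so the sketch runs without the package being installed.
constructors <- list(
  Italy = function() list(country = "Italy"),
  # Mimics a class that needs package data which has not been loaded.
  Mexico = function() stop("package data not loaded")
)

# Sketch of option 1: instantiate only the requested class inside tryCatch,
# rather than instantiating every class to build the full dataset table.
initialise_dataclass_sketch <- function(class_name, registry = constructors) {
  constructor <- registry[[class_name]]
  if (is.null(constructor)) {
    stop("No dataset class found for '", class_name, "'", call. = FALSE)
  }
  tryCatch(
    constructor(),
    error = function(e) {
      # Re-raise with context so the caller sees which class failed and why.
      stop(
        "Could not initialise '", class_name, "': ", conditionMessage(e),
        call. = FALSE
      )
    }
  )
}

# Succeeds even though the Mexico stand-in would fail if instantiated.
italy <- initialise_dataclass_sketch("Italy")
```

In the real function the error handler would presumably emit the existing warnings about unsupported countries rather than this illustrative message.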

Option 2 would allow us to replace expensive calls to this function in other areas with a cheaper lookup in a saved dataset.
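A minimal sketch of option 2, assuming the table is saved as package data at build time. The object `available_datasets_snapshot`, its columns, its rows, and the helper `is_dataset_available` are all illustrative and not the package's actual data or API. The check becomes a filter on a stored data frame, so no class is instantiated.

```r
# Hypothetical snapshot of what get_available_datasets() returns, assumed to
# be saved as package data. Rows and columns here are illustrative only.
available_datasets_snapshot <- data.frame(
  class = c("Italy", "UK", "WHO"),
  type = c("regional", "regional", "national"),
  stringsAsFactors = FALSE
)

# Check support by filtering the saved table. No class is instantiated.
is_dataset_available <- function(class_name, type = NULL,
                                 datasets = available_datasets_snapshot) {
  # Case-insensitive match on the class name.
  matches <- datasets[
    tolower(datasets$class) == tolower(class_name), ,
    drop = FALSE
  ]
  # Optionally restrict to a dataset type such as "regional".
  if (!is.null(type)) {
    matches <- matches[matches$type %in% type, , drop = FALSE]
  }
  nrow(matches) > 0
}

is_dataset_available("italy")
is_dataset_available("WHO", type = "regional")
```

The trade-off is that the saved table can go stale when a class is added or removed, which is why regenerating it automatically (on load or in a GitHub Action) matters for this option.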

Happy to try to implement either, or some other method.