ccodwg / FAIRCovid19DataProject

A repository to organize the FAIR COVID-19 Data for 🇨🇦 project. Led by the COVID-19 Canada Open Data Working Group and supported by CANMOD.
https://whathappened.coronavirus.icu/

SUSTAINABILITY: Automated data collection for the Canadian COVID-19 Data Archive #2

Open jeanpaulrsoucy opened 2 years ago

jeanpaulrsoucy commented 2 years ago

Presently, data collection for the Canadian COVID-19 Data Archive (Covid19CanadaArchive) is managed through a combination of Python scripts, the self-developed Python package archivist, and a series of GitHub Actions workflows run by Covid19CanadaBot. A basic flowchart of the current process (taken from the aforementioned repository) is shown below.
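To make the "download and preserve" step concrete, here is a minimal sketch of what a single archival step might look like. It is not archivist's actual API: the names `fetch_bytes` and `archive_dataset`, and the `name_YYYY-MM-DD_HH-MM.ext` filename scheme, are hypothetical. The idea it illustrates is that each run writes a new UTC-timestamped snapshot instead of overwriting the previous one, and the download function is injectable so the step can be tested without network access.

```python
import datetime
import pathlib
import urllib.request
from typing import Callable, Optional


def fetch_bytes(url: str, timeout: float = 60.0) -> bytes:
    """Download the raw bytes at `url` using only the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def archive_dataset(
    name: str,
    url: str,
    out_dir: pathlib.Path,
    ext: str = "csv",
    fetch: Callable[[str], bytes] = fetch_bytes,
    now: Optional[datetime.datetime] = None,
) -> pathlib.Path:
    """Hypothetical archival step: fetch one dataset and save it under a
    timestamped filename so earlier snapshots are never overwritten."""
    # Default to the current UTC time; callers may pass a fixed time for tests
    now = now or datetime.datetime.now(datetime.timezone.utc)
    stamp = now.strftime("%Y-%m-%d_%H-%M")
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{name}_{stamp}.{ext}"
    path.write_bytes(fetch(url))
    return path
```

A scheduled workflow could then loop over a list of dataset definitions and call a function like this for each one; keeping `fetch` swappable also leaves room for a headless-browser fetcher for pages that need JavaScript.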

Flowchart illustrating the update process for Covid19CanadaArchive

At present, manual intervention is occasionally required to ensure data preservation, for example when a dataset and/or website fails to load correctly (this is particularly common for websites that rely heavily on JavaScript).
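One way to reduce silent failures is to check each download with simple heuristics and flag suspicious files for manual review. The sketch below assumes three illustrative checks: a minimum file size, an expected piece of text, and a `<noscript>` "enable JavaScript" notice that suggests the page was never rendered. The function name `validation_problems` and its thresholds are hypothetical and are not part of archivist.

```python
from typing import List, Optional


def validation_problems(
    content: bytes,
    min_bytes: int = 1000,
    expected_text: Optional[str] = None,
) -> List[str]:
    """Return reasons why a downloaded file looks like a failed capture.
    An empty list means no problem was detected, not that the file is correct."""
    problems = []
    # Very small files are often error pages or empty HTML shells
    if len(content) < min_bytes:
        problems.append(f"file is only {len(content)} bytes (< {min_bytes})")
    text = content.decode("utf-8", errors="replace")
    # A known string (e.g. a table header) should appear in a good capture
    if expected_text is not None and expected_text not in text:
        problems.append(f"expected text {expected_text!r} not found")
    # A page that only shows a <noscript> notice was probably never rendered
    lowered = text.lower()
    if "<noscript>" in lowered and "enable javascript" in lowered:
        problems.append("page appears to require JavaScript to render")
    return problems
```

Returning a list of reasons rather than a single boolean makes it easy for an automated run to write an informative log entry or open an issue, so that a person only steps in when a capture actually looks broken.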

Main areas of improvement

These are the main areas where I see room to improve the sustainability of the Archive's data collection process:

Development of the archivist package

A few specific ideas for the development of the archivist package: