globaldothealth / list

Repository for Global.health: a data science initiative to enable rapid sharing of trusted and open public health data to advance the response to infectious diseases.

Implement reusable data service #2714

Open iamleeg opened 2 years ago

iamleeg commented 2 years ago

See Moritz's day-zero schema at https://github.com/globaldothealth/monkeypox/issues/21#issuecomment-1143321387

iamleeg commented 2 years ago

Moritz and the curation team are reviewing this schema. Based on yesterday's discussion with @abhidg and the leads/curators, I think we want to treat the COVID-19 collection and schema as its own thing. New instances would start from scratch with the day-zero schema (maybe just renaming the database from covid19 to $newdisease) and then be configured to support additional fields as the pandemic develops.

iamleeg commented 2 years ago

Here's the plan @abhidg and I discussed yesterday.

[Image: #2714 architecture]

The curator API keeps its configurable connection to a data service (it already has one). Rather than running a strict OpenAPI validator on cases, it accepts any object as a case and lets the data service validate the input and report errors. The curator API remains responsible for user management, data source/ingestion management, and access control.
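A minimal sketch of that split, to make the responsibilities concrete. Everything named here (`fake_data_service`, `make_create_case`, the `curator` role check, the status codes and the `confirmation_date` field) is hypothetical and only stands in for whatever the real services end up using: the curator API does access control and then forwards the case untouched, relaying the data service's verdict.

```python
from typing import Any, Callable, Dict, List, Tuple

# (HTTP-like status code, JSON-like body); an assumed shape for this sketch.
Response = Tuple[int, Dict[str, Any]]


def fake_data_service(case: Any) -> Response:
    """Stand-in for the data service: it alone decides whether a case is valid."""
    # Hypothetical rule, just so the sketch has something to reject.
    if not isinstance(case, dict) or "confirmation_date" not in case:
        return 422, {"errors": ["confirmation_date: missing"]}
    return 201, {"case": case}


def make_create_case(
    post_to_data_service: Callable[[Any], Response],
) -> Callable[[List[str], Any], Response]:
    """Build the curator API handler around a configurable data service."""

    def create_case(user_roles: List[str], case: Any) -> Response:
        # Access control stays in the curator API.
        if "curator" not in user_roles:
            return 403, {"errors": ["curator role required"]}
        # No schema validation here: forward any object and relay
        # whatever the data service reports.
        return post_to_data_service(case)

    return create_case


create_case = make_create_case(fake_data_service)
```

Swapping `fake_data_service` for a function that calls the COVID-19 data service or the new one is then a per-deployment configuration choice, and the curator API never needs to know the schema.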

The existing UI is closely tied to its case schema, but we can re-use the automated source configuration, user profiles/management, and overall structure. So we add a new case list and forms for the new flexible schema and choose which to use per deployment.

Geocoding API remains unchanged.

The data API is completely replaced: the existing one has multiple constraints coupling it to the COVID-19 schema, so we'll have a new service for new diseases which encodes the day-zero schema plus custom fields. The COVID-19 instances of G.h keep using the existing data service, and hMPXV/emerging outbreak instances use the new one instead. The new data service can be written in Python, and should not use a schema-defining library like Mongoose, so that validation can be done in code and take into account custom fields defined by curators. It should use standard representations of all fields where possible, e.g. ISO country codes, ISO date formats, GeoJSON for locations. OpenAPI documentation should be derived from the code and not be a separate, redundant statement of the data schema. Consider whether it is possible/desirable to switch the database to PostgreSQL, or appropriate to continue using MongoDB.
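To illustrate "validation in code" with curator-defined fields, here is a rough sketch using only the standard library. The base field names (`confirmation_date`, `country`, `location`) are made up for the example and are not the actual day-zero schema (see the linked monkeypox issue for that). It also assumes every field is required, checks only the two-letter shape of a country code rather than membership of ISO 3166-1, and accepts only 2D GeoJSON points.

```python
from datetime import date
from typing import Any, Callable, Dict, List, Optional

Check = Callable[[Any], bool]


def is_iso_date(value: Any) -> bool:
    """True for ISO 8601 calendar dates such as '2022-06-01'."""
    if not isinstance(value, str):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def is_iso_country(value: Any) -> bool:
    """Shape check only: two uppercase ASCII letters, e.g. 'GB'."""
    return (isinstance(value, str) and len(value) == 2
            and value.isascii() and value.isalpha() and value.isupper())


def is_geojson_point(value: Any) -> bool:
    """True for a 2D GeoJSON Point with [longitude, latitude] in range."""
    if not isinstance(value, dict) or value.get("type") != "Point":
        return False
    coords = value.get("coordinates")
    if not isinstance(coords, (list, tuple)) or len(coords) != 2:
        return False
    if not all(isinstance(c, (int, float)) and not isinstance(c, bool)
               for c in coords):
        return False
    return -180 <= coords[0] <= 180 and -90 <= coords[1] <= 90


# Hypothetical base fields standing in for the day-zero schema.
BASE_FIELDS: Dict[str, Check] = {
    "confirmation_date": is_iso_date,
    "country": is_iso_country,
    "location": is_geojson_point,
}


def validate_case(case: Any,
                  custom_fields: Optional[Dict[str, Check]] = None) -> List[str]:
    """Return a list of error strings; an empty list means the case is valid."""
    if not isinstance(case, dict):
        return ["case must be a JSON object"]
    # Curator-defined fields extend the base checks per deployment.
    checks = {**BASE_FIELDS, **(custom_fields or {})}
    errors = []
    for name, check in checks.items():
        if name not in case:
            errors.append(f"{name}: missing")
        elif not check(case[name]):
            errors.append(f"{name}: invalid value {case[name]!r}")
    for name in sorted(case.keys() - checks.keys()):
        errors.append(f"{name}: unknown field")
    return errors
```

For example, a deployment that tracks vaccination status could pass `custom_fields={"vaccinated": lambda v: isinstance(v, bool)}`; without that, a case carrying `vaccinated` is rejected as having an unknown field. The same table of checks is also the kind of thing OpenAPI documentation could be generated from, instead of a separately maintained schema.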

jim-sheldon commented 2 years ago

Work on the curator UI captured in this issue: https://github.com/globaldothealth/list/issues/2902