Closed Varnita-Metastring closed 4 years ago
Blocked by #3
At the moment, I think the following kind of MongoDB schema can hold the data collection.
{
dataset: <oid of dataset in dataset collection>,
entity: <oid of entity in entity collection>,
indicator: <oid of indicator in indicator collection>,
value: <value of the indicator for this entity in this dataset>
}
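To make the shape of one data document concrete, here is a minimal Python sketch. It uses plain dicts and string ids as stand-ins for MongoDB documents and ObjectIds; the helper names (`make_data_point`, `is_valid_data_point`) and the sample ids are hypothetical, not part of any agreed design.

```python
from typing import Any

# The four keys proposed for a document in the data collection.
REQUIRED_KEYS = {"dataset", "entity", "indicator", "value"}


def make_data_point(dataset_id: str, entity_id: str, indicator_id: str, value: Any) -> dict:
    """Build one data document; the ids are string stand-ins for ObjectIds."""
    return {
        "dataset": dataset_id,
        "entity": entity_id,
        "indicator": indicator_id,
        "value": value,
    }


def is_valid_data_point(doc: dict) -> bool:
    """Check that a document has exactly the four proposed keys."""
    return set(doc) == REQUIRED_KEYS


# Hypothetical sample ids and value, for illustration only.
point = make_data_point("ds1", "ent1", "ind1", 27.4)
```

Each document then carries a single value, so a spreadsheet with R rows and C indicator columns would become R x C documents under this design.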
The three collections that this will depend on are dataset, entity and indicator, in that order:
{
"name": <Name of the dataset>,
"year": <year in which it was released>,
"sourceUrl": <most accurate url to this dataset online>,
"metadataUrl": <URL that talks about this dataset>,
"parent": <oid of parent dataset (if any)>
}
A more complicated metadata schema is used by the World Bank Microdata repository, and we can learn from it as we go.
See sample: https://microdata.worldbank.org/index.php/metadata/export/2949/json
{
name: "Canonical name of entity",
alternative_names: <array of alternative names> (optional),
belongs_to: <array of oid of entities this entity "belongs" to in some way (geographical hierarchy)>,
contains: <array of oid of entities that "belongs" to this entity>,
shape: <geojson of this entity>
}
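The `belongs_to` links form a hierarchy that can be walked upwards. The following is a rough sketch of that walk, assuming entities are held in a dict keyed by id; the `ancestors` function and the three sample entities are hypothetical and only illustrate the idea.

```python
def ancestors(entity_id: str, entities: dict) -> list:
    """Return ids of every entity reachable by following belongs_to, nearest first."""
    seen = []
    queue = list(entities[entity_id].get("belongs_to", []))
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue  # an entity may be reachable through more than one parent
        seen.append(current)
        queue.extend(entities[current].get("belongs_to", []))
    return seen


# Hypothetical sample hierarchy: district -> state -> country.
entities = {
    "country": {"name": "Country A", "belongs_to": [], "contains": ["state"]},
    "state": {"name": "State B", "belongs_to": ["country"], "contains": ["district"]},
    "district": {"name": "District C", "belongs_to": ["state"], "contains": []},
}

print(ancestors("district", entities))
```

Storing both `belongs_to` and `contains` duplicates each link, so whatever writes entities would need to keep the two arrays in sync.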
The indicator collection holds information about each indicator:
{
name: <Name of the indicator>,
(can have further fields to be able to group indicators together)
}
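Reading a value back under this design means following three references. Here is a small sketch of that lookup, again with in-memory dicts standing in for the collections; the `resolve` function and all sample records are hypothetical.

```python
def resolve(point: dict, datasets: dict, entities: dict, indicators: dict) -> dict:
    """Replace the three id references in a data document with readable names."""
    dataset = datasets[point["dataset"]]
    return {
        "dataset": dataset["name"],
        "year": dataset["year"],
        "entity": entities[point["entity"]]["name"],
        "indicator": indicators[point["indicator"]]["name"],
        "value": point["value"],
    }


# Hypothetical sample records, keyed by string ids standing in for ObjectIds.
datasets = {"ds1": {"name": "Example Health Survey", "year": 2019}}
entities = {"ent1": {"name": "District C"}}
indicators = {"ind1": {"name": "Example indicator"}}

point = {"dataset": "ds1", "entity": "ent1", "indicator": "ind1", "value": 27.4}
row = resolve(point, datasets, entities, indicators)
print(row)
```

In MongoDB itself the same join would presumably be done with an aggregation or several queries per data point, which is part of the cost of normalising every value into its own document.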
Following internal discussions, this idea has been dropped in favour of storing datasets as tables that resemble the original spreadsheet files. The additional querying complexity that this introduces will (presumably) be handled by using ElasticSearch.
I shall document that design as it evolves into a mature schema.
Closing this in favour of #13
Design and build database schema to upload health data with a flexible column annotations for framework and metadata