COVID-19-Data / Strapi


Metadata Schema #1

Open awm33 opened 4 years ago

awm33 commented 4 years ago

Proposed schema for tracking datasets / metadata. Top level are tables/objects, second level are fields, third level are notes or enum values. If something has a ?, it means uncertainty and looking for feedback.

d1gits commented 4 years ago

Proposed schema for tracking datasets / metadata. Top level are tables/objects, second level are fields, third level are notes or enum values. If something has a ?, it means uncertainty and looking for feedback.

nikovanmeurs commented 4 years ago

Regarding the record date: are we treating datasets as if they are immutable? If not, it would be desirable to allow a recordDates property that accepts a list of dates instead.

Regarding the Enum field type: when recording enums like this, a data structure that defines the possible enum values seems desirable. Otherwise an enum value does not differ from a plain text value.

@d1gits Postal Code is a type of Geographic Region. I.e., if a dataset is tied to a postal code, you'd add a Geographic Region of type Postal Code, with a parent that can be of type City, Administrative Region, State or Country.

I'd like to suggest adding a Geographic Region of type Global or World as well, which can be useful for tying global reports to.
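To make the parent/child idea concrete, here is a minimal Python sketch of a Geographic Region with a type and an optional parent. The class name, the field names, the exact list of region types and the example regions are all assumptions for illustration, not part of the proposed schema.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed set of region types, taken from the types mentioned in this thread.
REGION_TYPES = {
    "World", "Country", "State", "Administrative Region", "City", "Postal Code",
}


@dataclass(frozen=True)
class GeographicRegion:
    name: str
    type: str
    parent: Optional["GeographicRegion"] = None  # None for the top-level region

    def __post_init__(self) -> None:
        # Reject region types that are not in the predefined set.
        if self.type not in REGION_TYPES:
            raise ValueError(f"Unknown region type: {self.type}")

    def ancestry(self) -> List[str]:
        """Return this region followed by each of its parents, innermost first."""
        chain = []
        node: Optional[GeographicRegion] = self
        while node is not None:
            chain.append(f"{node.type}: {node.name}")
            node = node.parent
        return chain


# Hypothetical example data.
world = GeographicRegion("World", "World")
country = GeographicRegion("Netherlands", "Country", parent=world)
city = GeographicRegion("Amsterdam", "City", parent=country)
postal_code = GeographicRegion("1012", "Postal Code", parent=city)
```

With this shape, a global report links to the World region directly, while a postal-code dataset can still be rolled up to its city or country by walking `ancestry()`.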

awm33 commented 4 years ago

@nikovanmeurs No, hence the Update Frequency.

An enum would be stored as a string, but it is distinct from free text because it only allows a fixed set of values. Even if we just use a string type, we should list the possible values in the description.

nikovanmeurs commented 4 years ago

@awm33 Clear on both points. Let's add the enum type in there to emphasize that its value should be one of a predefined set, and keep the Record Date as is.

awm33 commented 4 years ago

@nikovanmeurs Added "Possible Values - Array of Text (only for Enum type)" so that list can be structured.
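As a rough illustration of how "Possible Values" could be tied to the Enum type only, here is a small Python sketch. The `FieldDef` class, its attribute names and the example values are hypothetical and only meant to show the validation rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FieldDef:
    name: str
    type: str  # e.g. "Text" or "Enum" (assumed type names)
    possible_values: Optional[List[str]] = None  # only meaningful for Enum

    def __post_init__(self) -> None:
        # An Enum field must declare its allowed values.
        if self.type == "Enum" and not self.possible_values:
            raise ValueError(f"Enum field '{self.name}' needs possible_values")
        # Any other field type must not declare them.
        if self.type != "Enum" and self.possible_values is not None:
            raise ValueError(f"possible_values is only allowed for Enum fields")

    def accepts(self, value: str) -> bool:
        """Return True if the value is valid for this field."""
        if self.type == "Enum":
            return value in self.possible_values
        return isinstance(value, str)


# Hypothetical field with made-up values.
update_frequency = FieldDef(
    name="Update Frequency",
    type="Enum",
    possible_values=["daily", "weekly", "monthly"],
)
```

Here `update_frequency.accepts("daily")` passes while an unlisted value is rejected, which is the difference from a plain text field.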

I think I need clarity on what "Record Date" means?

nikovanmeurs commented 4 years ago

@awm33 I was referring to this field in @d1gits's proposal:

I get the impression that the two fields you proposed on Slack (Publication Date and Scrape Date) are a better fit here though.

nikovanmeurs commented 4 years ago

It might be desirable to store relations between datasets as well. Say we produce a dataset of our own that is based on 3 other datasets: the organisation would be COVID-19-Data, but for other people to judge the reliability of this dataset we should be able to point them to the source datasets.

awm33 commented 4 years ago

@nikovanmeurs That's already covered in Tables

Input Source(s) - Links to input Sources - Required
Input Table(s) - Links to Tables (if data is derived from other tables / non-raw sources)

awm33 commented 4 years ago

Could even build a lineage graph using those fields
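A minimal Python sketch of that lineage idea, assuming each table's "Input Table(s)" links are available as a mapping from table name to the names of its input tables. The function name and the table names are made up for illustration.

```python
from typing import Dict, List, Set


def upstream_tables(table: str, input_tables: Dict[str, List[str]]) -> Set[str]:
    """Collect every table that the given table is directly or indirectly derived from."""
    seen: Set[str] = set()
    stack = list(input_tables.get(table, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against accidental cycles
        seen.add(current)
        stack.extend(input_tables.get(current, []))
    return seen


# Hypothetical "Input Table(s)" links: derived table -> tables it was built from.
links = {
    "combined_cases": ["cleaned_cases_a", "cases_b"],
    "cleaned_cases_a": ["raw_cases_a"],
}

lineage = upstream_tables("combined_cases", links)
```

For the sample links, `lineage` holds the three upstream tables of `combined_cases`. Each of those could then be followed to its Input Source(s) to show where the data originally came from.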

nikovanmeurs commented 4 years ago

Check. Missed that one 👍