Open awm33 opened 4 years ago
Proposed schema for tracking datasets / metadata. Top level are tables/objects, second level are fields, third level are notes or enum values. If something has a ?, it means uncertainty and looking for feedback.
Regarding the record date: are we treating datasets as if they are immutable? If not it'd be desirable to allow for a recordDates
property which accepts a list of dates instead.
Regarding The Enum typed Field type: When recording enums like this, a data structure to define the possible enum values seems to be desirable. Otherwise an enum value does not differ from a plain text value.
@d1gits Postal Code is a type of Geographic Region. Ie. if a data set is tied to a postal code, you'd add a Geographic Region of type Postal Code, with a parent which can be of type City, Administrative Region, State or Country.
I'd like to suggest adding a Geographic Regioin of type Global or World as well, which can be useful to tie global reports to.
@nikovanmeurs No, hence the Update Frequency.
An enum would be a string, but to distinguish it from free text, as in there are a set number of values. Even if we just use a string type, we should list the possible values in the description.
@awm33 clear on both points. Let's add the enum type in there to emphasize the fact that it's value should be one of a predefined set and keep the Record Date as is.
@nikovanmeurs Added "Possible Values - Array of Text (only for Enum type)" so that list can be structured.
I think I need clarity on what "Record Date" means?
@awm33 I was referring to this field in @d1gits's proposal:
I get the impression that the two fields you proposed on Slack (Publication Date and Scrape Date) are a better fit here though.
It might be desirable to store relations between datasets as well. Say we produce a dataset of our own which is based on 3 other datasets, the organisation would be COVID-19-Data, however for other people to determine the reliability of this dataset we should be able to point them to the source data sets.
@nikovanmeurs That's already covered in Tables
Input Source(s) - Links to input Sources - Required
Input Table(s) - Links to Tables (if data is derived from other tables / non-raw sources)
Could even build a lineage graph using those fields
Check. Missed that one 👍
Proposed schema for tracking datasets / metadata. Top level are tables/objects, second level are fields, third level are notes or enum values. If something has a ?, it means uncertainty and looking for feedback.