mikepizzo opened 8 years ago
I've toyed with this idea for some time; here's what I've come up with so far.
This is based on the W3C Data Catalog Vocabulary (DCAT), including a subset of the additional properties defined by DCAT Application Profile for data portals in Europe.
Construction principles:
Mapping a dcat:Dataset to an OData service is consistent with the use of "data set" to mean a "collection of closely related tables" (see https://en.wikipedia.org/wiki/Data_set), and with the use of DataSet in the .NET Framework, where "A DataSet represents a complete set of data including the tables that contain, order, and constrain the data, as well as the relationships between the tables." (see https://msdn.microsoft.com/en-us/library/ss7fbaez(v=vs.110).aspx).
<Term Name="Catalog" Type="Catalog.CatalogType" AppliesTo="EntitySet">
<Annotation Term="Core.Description" String="A data catalog is a curated collection of metadata about datasets." />
</Term>
<ComplexType Name="CatalogType">
<Property Name="title" Type="Edm.String">
<Annotation Term="Core.Description" String="A name given to the catalog." />
</Property>
<Property Name="description" Type="Edm.String">
<Annotation Term="Core.Description" String="A free-text account of the catalog." />
</Property>
<Property Name="modified" Type="Edm.Date">
<Annotation Term="Core.Description" String="Most recent date on which the catalog was changed, updated or modified." />
</Property>
<Property Name="homepage" Type="Edm.String">
<Annotation Term="Core.Description" String="The homepage of the catalog." />
</Property>
<Property Name="dataset" Type="Collection(Catalog.DatasetType)">
<Annotation Term="Core.Description" String="A dataset that is part of the catalog." />
</Property>
</ComplexType>
<Term Name="Dataset" Type="Catalog.DatasetType" AppliesTo="EntitySet">
<Annotation Term="Core.Description"
String="A collection of data, published or curated by a single agent, and available for access or download in one or more formats." />
</Term>
<ComplexType Name="DatasetType">
<Property Name="identifier" Type="Edm.String">
<Annotation Term="Core.Description" String="A unique identifier of the dataset." />
</Property>
<Property Name="title" Type="Edm.String">
<Annotation Term="Core.Description" String="A name given to the dataset." />
</Property>
<Property Name="description" Type="Edm.String">
<Annotation Term="Core.Description" String="A free-text account of the dataset." />
</Property>
<Property Name="modified" Type="Edm.Date">
<Annotation Term="Core.Description" String="Most recent date on which the dataset was changed, updated or modified." />
</Property>
<Property Name="publisher" Type="Edm.String">
<Annotation Term="Core.Description" String="An entity responsible for making the dataset available." />
</Property>
<Property Name="keyword" Type="Collection(Edm.String)">
<Annotation Term="Core.Description" String="A keyword or tag describing the dataset." />
</Property>
<Property Name="distribution" Type="Collection(Catalog.DistributionType)">
<Annotation Term="Core.Description" String="Connects a dataset to its available distributions." />
</Property>
<!-- properties from DCAT-AP -->
<Property Name="conformsTo" Type="Edm.String">
<Annotation Term="Core.Description" String="An implementing rule or other specification." />
</Property>
<Property Name="accessRights" Type="Edm.String">
<Annotation Term="Core.Description" String="Indicates whether the Dataset is open data, has access restrictions or is not public." />
<!-- TODO: annotation allowedValues 'public', 'restricted', and 'non-public' -->
</Property>
<Property Name="versionInfo" Type="Edm.PrimitiveType">
<Annotation Term="Core.Description" String="A version number or other version designation of the Dataset." />
</Property>
<!-- candidates from DCAT-AP: hasVersion, isVersionOf -->
</ComplexType>
<Term Name="Distribution" Type="Catalog.DistributionType" AppliesTo="EntitySet">
<Annotation Term="Core.Description">
<String>Represents a specific available form of a dataset.
Each dataset might be available in different forms, these forms might represent different formats of the dataset or different endpoints.
Examples of distributions include a downloadable CSV file, an API or an RSS feed.</String>
</Annotation>
</Term>
<ComplexType Name="DistributionType">
<Property Name="title" Type="Edm.String">
<Annotation Term="Core.Description" String="A name given to the distribution." />
</Property>
<Property Name="description" Type="Edm.String">
<Annotation Term="Core.Description" String="A free-text account of the distribution." />
</Property>
<Property Name="modified" Type="Edm.Date">
<Annotation Term="Core.Description" String="Most recent date on which the distribution was changed, updated or modified." />
</Property>
<Property Name="accessURL" Type="Edm.String">
<Annotation Term="Core.Description"
String="A landing page, feed, SPARQL endpoint or other type of resource that gives access to the distribution of the dataset." />
</Property>
</ComplexType>
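To make the intended usage concrete, here is a rough sketch of how a service might apply the `Dataset` term to one of its entity sets. It assumes the vocabulary is referenced under the alias `Catalog`; the target `Example.Container/Orders` and all property values are hypothetical and only there for illustration.

```xml
<!-- Sketch only: the target entity set and every value below are made up -->
<Annotations Target="Example.Container/Orders">
  <Annotation Term="Catalog.Dataset">
    <Record>
      <PropertyValue Property="identifier" String="example-orders" />
      <PropertyValue Property="title" String="Example orders" />
      <PropertyValue Property="description" String="Orders exposed by a hypothetical example service." />
      <PropertyValue Property="modified" Date="2017-01-31" />
      <PropertyValue Property="publisher" String="Example Org" />
      <!-- keyword is a Collection(Edm.String), so its values go in a Collection expression -->
      <PropertyValue Property="keyword">
        <Collection>
          <String>orders</String>
          <String>example</String>
        </Collection>
      </PropertyValue>
      <!-- One of the values named in the TODO on accessRights -->
      <PropertyValue Property="accessRights" String="public" />
    </Record>
  </Annotation>
</Annotations>
```

All properties of `DatasetType` are nullable by default, so a service could presumably supply only the subset it actually knows.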
Open Data datasets are described using common terms, such as license, publisher, creation date, and update frequency.
In the open data community, DCAT (http://www.w3.org/TR/vocab-dcat/) defines common terms for this cataloging information, also drawing on Dublin Core (http://dublincore.org/documents/dcmi-terms/).
We should define OData vocabularies to allow marking up an OData service with the same terms for general dataset cataloging.