BioSchemas / specifications

Issue tracker, technical wiki, and example markup
https://bioschemas.org

Dataset: change cardinality of distribution from one to many #574

Closed: cmungall closed this issue 2 years ago

cmungall commented 2 years ago

On https://bioschemas.org/profiles/Dataset/0.5-DRAFT

I see

Property | Expected Type | Description | CD | Controlled Vocabulary | Example
distribution | DataDownload | Schema: A downloadable form of this dataset, at a specific location, in a specific format. | ONE | |

But cardinality of 1 is not consistent with DCAT3, e.g.

https://www.w3.org/TR/vocab-dcat-3/#ex-when-using-distribution

AlasdairGray commented 2 years ago

I agree: I would expect the cardinality to be MANY, for consistency with DCAT.

The example embedded in the page itself shows more than one distribution :/

{
  "@type": "Dataset",
  "distribution": [
    {
      "@type": "DataDownload",
      "name": "UniParc XML",
      "fileFormat": "xml",
      "contentURL": "ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/uniparc/uniparc_all.xml.gz"
    },
    {
      "@type": "DataDownload",
      "name": "UniParc FASTA",
      "fileFormat": "fasta",
      "contentURL": "ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/uniparc/uniparc_active.fasta.gz"
    }
  ]
}

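
For consumers of such markup, a cardinality of MANY means `distribution` may hold either a single object or an array. The following is a minimal Python sketch of one way to handle both shapes; the helper name `distributions` and the trimmed sample records are assumptions made for illustration, not part of the Bioschemas profile.

```python
from typing import Any


def distributions(dataset: dict) -> list:
    """Return the dataset's distributions as a list, whatever the input shape."""
    value: Any = dataset.get("distribution")
    if value is None:
        return []          # property absent
    if isinstance(value, list):
        return value       # cardinality MANY: already an array
    return [value]         # cardinality ONE: wrap the single object


# Hypothetical, trimmed-down records based on the example above.
one = {"@type": "Dataset",
       "distribution": {"@type": "DataDownload", "name": "UniParc XML"}}
many = {"@type": "Dataset",
        "distribution": [
            {"@type": "DataDownload", "name": "UniParc XML"},
            {"@type": "DataDownload", "name": "UniParc FASTA"},
        ]}

print([d["name"] for d in distributions(one)])
print([d["name"] for d in distributions(many)])
```

Normalising to a list up front keeps downstream code independent of whether a given page declares one distribution or several.
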
egonw commented 2 years ago

Isn't the schema.org Dataset type (which has a property encodingFormat) equivalent to the DCAT Distribution?

AlasdairGray commented 2 years ago

I believe that the mapping is

dcat:Dataset schema:sameAs schema:Dataset
dcat:Distribution schema:sameAs schema:DataDownload

encodingFormat is inherited from schema:CreativeWork where it is used to give the file format.
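
As a rough illustration, the two class-level correspondences listed above could be held in a small lookup table. This is only a sketch: the names `DCAT_TO_SCHEMA`, `SCHEMA_TO_DCAT` and `to_dcat_type` are hypothetical, and the table covers nothing beyond the two pairs mentioned in this comment.

```python
# Class-level correspondences as stated in the comment above.
DCAT_TO_SCHEMA = {
    "dcat:Dataset": "schema:Dataset",
    "dcat:Distribution": "schema:DataDownload",
}

# Reverse direction, derived so the two tables cannot drift apart.
SCHEMA_TO_DCAT = {schema: dcat for dcat, schema in DCAT_TO_SCHEMA.items()}


def to_dcat_type(schema_type: str) -> str:
    """Map a schema.org type name to its DCAT counterpart, if one is listed.

    Accepts either a bare name ("Dataset") or a prefixed one ("schema:Dataset").
    Types without a listed counterpart are returned with the schema: prefix.
    """
    key = schema_type if schema_type.startswith("schema:") else f"schema:{schema_type}"
    return SCHEMA_TO_DCAT.get(key, key)


print(to_dcat_type("Dataset"))
print(to_dcat_type("DataDownload"))
```
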

cmungall commented 2 years ago

It would be great to make these mappings explicit, either as SSSOM files or as SKOS annotations directly in the Bioschemas source RDF.
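
To give a sense of what an SSSOM file for these mappings might contain, here is a minimal Python sketch that writes the two pairs discussed in this thread as tab-separated rows. The column names are standard SSSOM slots; the choice of `skos:exactMatch` as predicate and `semapv:ManualMappingCuration` as justification is an assumption for illustration, and the commented YAML metadata header (prefix map, licence and so on) that a complete SSSOM file carries is omitted.

```python
import csv
import io

# The two pairs discussed in this thread; the predicate is an assumed choice.
MAPPINGS = [
    ("schema:Dataset", "skos:exactMatch", "dcat:Dataset"),
    ("schema:DataDownload", "skos:exactMatch", "dcat:Distribution"),
]


def to_sssom_tsv(mappings) -> str:
    """Serialise (subject, predicate, object) triples as SSSOM-style TSV rows."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["subject_id", "predicate_id", "object_id", "mapping_justification"])
    for subject, predicate, obj in mappings:
        # Justification value is assumed: these pairs were curated by hand.
        writer.writerow([subject, predicate, obj, "semapv:ManualMappingCuration"])
    return buf.getvalue()


print(to_sssom_tsv(MAPPINGS))
```
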

Has anyone engaged the Frictionless people? Frictionless is used in a lot of projects, and it would be good to have a canonical mapping of DataPackage and DataResource; for now I am using dcat:Dataset and dcat:Distribution for these.

AlasdairGray commented 2 years ago

Schema.org do make it explicit in their human-readable page; the following is viewable when clicking on the 'more' link: [Screenshot 2022-06-01 at 11 49 20]

And they also have this statement at the bottom of the page

Acknowledgements: This class is based upon W3C DCAT work, and benefits from collaboration around the DCAT, ADMS and VoID vocabularies. See http://www.w3.org/wiki/WebSchemas/Datasets for full details and mappings.

It is also given in the RDF versions of the schema, e.g. line 1190 of the Turtle representation of version 14. I appreciate that you need to know where to look and go digging for it.

AlasdairGray commented 2 years ago

Exposing equivalence mappings is something that we as a community need to get better at. Our types and properties are often related to existing ones.