SEMICeu / DCAT-AP

This is the issue tracker for the maintenance of DCAT-AP
https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe

HVD C4. Bulk download #254

Closed bertvannuffelen closed 9 months ago

bertvannuffelen commented 1 year ago

The HVD implementing regulation requires that bulk downloads be provided for datasets. Bulk download is functionality that lets a user obtain a copy of the whole dataset on their local computer.

proposal

As specified in the DCAT-AP guidelines, a downloadable file is to be represented as a Distribution.

sirex commented 1 year ago

As I understand it, the bulk download requirement mostly applies to Data Services, because a Data Service might provide row-by-row access to the data but no bulk download.

If we have:

ex:dataset1 a dcat:Dataset ;
  dcat:distribution ex:dist1 .

ex:service1 a dcat:DataService ;
  dcat:servesDataset ex:dataset1 .

ex:dist1 a dcat:Distribution ;
  dcat:accessService ex:service1 .

and ex:service1 does not support bulk download, then how can we tell that ex:dist1 has a single downloadable file?

jakubklimek commented 1 year ago

@sirex I view the requirement basically as having both a data service and a downloadable file as distributions of an HVD dataset:

ex:dataset1 a dcat:Dataset ;
  dcat:distribution ex:dist1 , ex:dist2 .

ex:dist1 a dcat:Distribution ;
  dcat:accessService ex:service1 .

ex:service1 a dcat:DataService ;
  dcat:servesDataset ex:dataset1 .

ex:dist2 a dcat:Distribution ;
  dcat:accessURL <file> ;
  dcat:downloadURL <file> .

sirex commented 1 year ago

So basically, for HVD, dcat:downloadURL becomes «mandatory»? Currently it is «optional».

jakubklimek commented 1 year ago

Well, it is optional in order to support distributions which are not directly downloadable, e.g., those under the Data Governance Act (DGA). But for directly downloadable files, it should be used. Therefore, for an HVD dataset's distributions that take the form of a bulk download, it should be mandatory.

bertvannuffelen commented 12 months ago

The feedback from @jakubklimek is correct.

If the dataset is subject to the HVD IR, and the HVD IR mandates the presence of a bulk download (almost always the case), then dcat:downloadURL is mandatory.
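
For illustration, the conclusion of this thread could be sketched as follows. This is a minimal sketch, not a normative example. The ex: namespace, the endpoint URL and the file URL are hypothetical. The use of dcatap:applicableLegislation with the ELI of the HVD implementing regulation to flag the dataset as subject to the HVD IR is an assumption taken from the DCAT-AP HVD usage guidelines, not something stated in this issue.

```turtle
@prefix dcat:   <http://www.w3.org/ns/dcat#> .
@prefix dcatap: <http://data.europa.eu/r5r/> .
@prefix ex:     <http://example.org/> .

# Hypothetical HVD dataset with two distributions:
# one served through an API, one offered as a bulk download.
ex:dataset1 a dcat:Dataset ;
  # Assumption: flags the dataset as subject to the HVD IR.
  dcatap:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj> ;
  dcat:distribution ex:dist1 , ex:dist2 .

# Distribution reached through a data service (no bulk download by itself).
ex:dist1 a dcat:Distribution ;
  dcat:accessURL <http://example.org/api> ;
  dcat:accessService ex:service1 .

ex:service1 a dcat:DataService ;
  dcat:endpointURL <http://example.org/api> ;
  dcat:servesDataset ex:dataset1 .

# Bulk download: a directly downloadable file, so dcat:downloadURL is given.
ex:dist2 a dcat:Distribution ;
  dcat:accessURL <http://example.org/files/dataset1.zip> ;
  dcat:downloadURL <http://example.org/files/dataset1.zip> .
```

Here only ex:dist2 carries dcat:downloadURL, since it is the distribution that fulfils the bulk download requirement. ex:dist1 remains valid without it because it is not a directly downloadable file.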