Geonovum / DCAT-AP-NL30

dcat3-ap-nl

Substantive: adding new data file formats to #114

Closed: dkapitan closed this issue 1 month ago

dkapitan commented 2 months ago

Concerns the following sections / classes

Description

The places above refer to the EU list of file formats. I note that the most modern, efficient file formats are missing from it, namely:

Priority

Must-have

Proposed change

Add the formats mentioned above to the value list.

idevisser commented 2 months ago

Request submitted to the EU Publications Office.

idevisser commented 2 months ago

@dkapitan I have submitted the request. Could you provide the information they asked for? See below:

Thank you for your proposal to add new file type concepts. We would be happy to add them in our next release if you can provide some more information.

You suggest adding Apache Parquet, Apache AVRO and Apache ORC.

In the File type authority table, each concept must have an IANA media type and preferably also a file extension. A concept may have several media types.
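The authority-table rule above (at least one IANA media type, a file extension preferred but optional, several media types allowed) can be sketched as a small validated record. This is a minimal illustration under stated assumptions, not the Publications Office's actual data model: the class `FileTypeConcept`, its field names, and the simplified RFC 6838 `type/subtype` check are all hypothetical.

```python
import re
from dataclasses import dataclass, field

# Simplified "type/subtype" check in the spirit of RFC 6838 restricted-name
# syntax; it does not cover parameters or the full registration rules.
_MEDIA_TYPE = re.compile(
    r"^[a-z0-9][a-z0-9!#$&^_.+-]*/[a-z0-9][a-z0-9!#$&^_.+-]*$", re.IGNORECASE
)


@dataclass
class FileTypeConcept:
    """Hypothetical model of one entry in the File type authority table."""

    label: str
    media_types: list[str]  # mandatory: one or more IANA media types
    extensions: list[str] = field(default_factory=list)  # preferred, may be empty

    def __post_init__(self) -> None:
        # Enforce the mandatory property: no concept without a media type.
        if not self.media_types:
            raise ValueError(f"{self.label}: at least one IANA media type is required")
        # Reject values that are not even shaped like "type/subtype".
        for media_type in self.media_types:
            if not _MEDIA_TYPE.match(media_type):
                raise ValueError(f"{self.label}: malformed media type {media_type!r}")


# Parquet has everything the table needs: a registered media type and an extension.
parquet = FileTypeConcept("Apache Parquet", ["application/vnd.apache.parquet"], [".parquet"])
```

Under this sketch, a concept such as Avro could not be created with an empty `media_types` list, which mirrors why the office asked for at least one media type.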

The IANA registry currently lists these six media types for Apache:

- application/vnd.apache.parquet, file extension: .parquet
- application/vnd.apache.thrift.binary, file extension: n/a
- application/vnd.apache.thrift.compact, file extension: n/a
- application/vnd.apache.thrift.json, file extension: n/a
- application/vnd.apache.arrow.file, file extension: .arrow
- application/vnd.apache.arrow.stream, file extension: .arrows
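To make the list above concrete, here is a hedged sketch that stores the six media types as a lookup table and finds the media type for a file by its extension. The names `APACHE_MEDIA_TYPES` and `media_type_for` are illustrative assumptions; the data is only what the IANA list above states.

```python
from pathlib import PurePath
from typing import Optional

# The six Apache media types listed above; None means "file extension: n/a".
APACHE_MEDIA_TYPES: dict[str, Optional[str]] = {
    "application/vnd.apache.parquet": ".parquet",
    "application/vnd.apache.thrift.binary": None,
    "application/vnd.apache.thrift.compact": None,
    "application/vnd.apache.thrift.json": None,
    "application/vnd.apache.arrow.file": ".arrow",
    "application/vnd.apache.arrow.stream": ".arrows",
}


def media_type_for(filename: str) -> Optional[str]:
    """Return the listed media type whose extension matches filename, or None."""
    ext = PurePath(filename).suffix.lower()
    for media_type, extension in APACHE_MEDIA_TYPES.items():
        if extension is not None and extension == ext:
            return media_type
    return None  # e.g. .avro or .orc: nothing registered in this list


print(media_type_for("trips.parquet"))  # application/vnd.apache.parquet
print(media_type_for("events.avro"))    # None
```

The Thrift entries can never be found this way because they have no extension, which is exactly why the authority table treats the extension as preferred rather than mandatory.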

So for Apache Parquet we have the necessary information, but not for the other two. We looked at their specifications, but this did not become clear from them.

Do you know which format AVRO and ORC files are based on: text, binary, JSON or XML? We can use one or more generic media types if there is no specific one, but at least one media type is required. Perhaps there is a generic media type for columnar database formats?

Looking forward to your feedback!
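The fallback the Publications Office describes (use a specific media type where one is registered, otherwise a generic one, but never none) could look roughly like this. The mapping from underlying encoding to generic IANA types (`text/plain`, `application/octet-stream`, `application/json`, `application/xml`) and the function name `choose_media_types` are assumptions for illustration, not the office's actual procedure.

```python
from typing import Optional

# Assumed generic IANA media types per underlying encoding (illustrative only).
GENERIC_MEDIA_TYPES = {
    "text": "text/plain",
    "binary": "application/octet-stream",
    "json": "application/json",
    "xml": "application/xml",
}


def choose_media_types(specific: Optional[str], encoding: str) -> list[str]:
    """Hypothetical rule: prefer a registered specific type, else a generic one."""
    if specific:
        return [specific]
    generic = GENERIC_MEDIA_TYPES.get(encoding.lower())
    if generic is None:
        # A media type is mandatory, so refuse rather than return an empty list.
        raise ValueError(f"no media type available for encoding {encoding!r}")
    return [generic]


print(choose_media_types("application/vnd.apache.parquet", "binary"))  # ['application/vnd.apache.parquet']
print(choose_media_types(None, "binary"))  # ['application/octet-stream']
```

A generic fallback satisfies the "at least one media type" rule but says little about the actual format, which is one reason to wait for a specific registration instead.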

dkapitan commented 2 months ago

@idevisser

Thanks for following up. I have done some research, and here are my findings.

Avro

Unfortunately, a media type for Avro still hasn't been officially approved. See https://issues.apache.org/jira/plugins/servlet/mobile#issue/AVRO-488

So for now I propose that we set Avro aside until this has been resolved.

ORC

For ORC the status is even less clear: I can't find any reference to a media type at all. So I suggest leaving this one out as well.

Final comments

I suggest we include all the Apache media types you found. As far as I am concerned, that resolves this issue.

idevisser commented 2 months ago

I have passed your feedback on to the EU Publications Office.

idevisser commented 1 month ago

No further changes are needed in this profile.