International-Data-Spaces-Association / InformationModel

The Information Model of the International Data Spaces implements the IDS reference architecture as an extensible, machine readable and technology independent data model.
Apache License 2.0

Extend supported MediaTypes #224

Closed JohannesLipp closed 4 years ago

JohannesLipp commented 4 years ago

In MediaType.ttl we currently list media types such as TEXT_PLAIN, TEXT_XML, APPLICATION_MSWORD and so on.

Please extend the supported media types by Excel etc., according to a standard or guideline, thanks!

HaydarAk commented 4 years ago

Did some research.

What we currently have

We have the ids:MediaType class as a subclass of dcterms:MediaType, with ids:IANAMediaType and ids:CustomMediaType as subclasses of ids:MediaType. Both subclasses have instances: IANA media types, or custom types that are not registered with IANA.

The list of instances is short and incomplete, as Johannes hinted here.

Findings

DCAT 2 also recommends using IANA media types; see distribution_media_type and distribution_format.

The IANA maintains a list of media types (see here), grouped into 10 categories, with 1500+ media types in total. About 1400 of them belong to the application category. The list includes many well-known media types (JSON, Turtle, XML, MP4, ...) as well as lesser-known ones.

Suggestion

The IANA list is updated regularly. Therefore we would have to update our media type instances regularly too.

In my opinion, that's not a good idea. Things like languages (ISO 639) do not get many updates, but IANA types surely do. I'd suggest a top-down approach here: instead of specifying all media types ourselves and maintaining a list of 1500+ instances, we should leave it to the services to fill in the media types. We can still keep the handful of IANA media type instances we currently have as examples.

JohannesLipp commented 4 years ago

Thank you for the investigation, especially on IANA media types. I fully agree that we should not maintain a list of all IANA media types, given how frequently it is updated.

The following example you provided is a convenient way to directly use IANA media types (following the IANA media types list) in an IDS context:

{
  "@context": "https://w3id.org/idsa/contexts/2.1.0/context.jsonld",
  "@type": "Representation",
  "mediaType": {
    "@type": "ids:IANAMediaType",
    "@id": "ftp://this_is_some_code"
  },
  "@id": "https://connector.fit.fraunhofer.de"
}

JohannesLipp commented 4 years ago

@HaydarAk reopening this for a further question. Is it good practice to use IANA media types that are not in the list of ids:IANAMediaType instances, like so?

my_namespace:APPLICATION_EXCEL
  a ids:IANAMediaType ;
  rdfs:label "application/vnd.ms-excel" ;
  rdfs:comment "This Media Type/OID is used to identify Microsoft Excel generically"@en ;
  rdfs:isDefinedBy <https://www.iana.org/assignments/media-types/application/vnd.ms-excel> ;
  ids:filenameExtension "xlsx" ;
.
HaydarAk commented 4 years ago

From a syntactic point of view, it should work.

One thing I noticed with this is the difference between IANA types and what we write in RDF/TTL or JSON-LD. We can discuss this in a larger group. Using your example above, the problem occurs when we take a look at

my_namespace:APPLICATION_EXCEL
  a ids:IANAMediaType ;
  rdfs:label "application/vnd.ms-excel" ;

The rdfs:label is equal to what IANA lists for Excel documents. But the actual RDF subject, my_namespace:APPLICATION_EXCEL in the first line, is not equal to the IANA identifier. Therefore it is difficult to match the actual media types.

We should either switch to a simpler approach in our modeling or think about how we ensure that IANA types are equal. We do not do any concrete type checking, which I don't see as part of the Information Model. But we should (at least) define what the preferred way of modeling this is.

My suggestion: a) use IRIs

<https://www.iana.org/assignments/media-types/application/xml>
  a ids:IANAMediaType .

b) use a datatype property with range xsd:string, where we can write things like "application/xml"

I prefer a)
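
To make the difference between the two options concrete, here is a minimal Python sketch; it is not part of the Information Model. It assumes that the IRI of a registered type is the IANA assignments URL followed by the type name, as in the example under a). The function names are hypothetical.

```python
# Base IRI taken from the example under option a); treating it as a fixed
# prefix for every registered type is an assumption of this sketch.
IANA_BASE = "https://www.iana.org/assignments/media-types/"


def media_type_to_iri(media_type: str) -> str:
    """Option a): turn a string such as "application/xml" into an IRI."""
    top_level, _, subtype = media_type.strip().lower().partition("/")
    if not top_level or not subtype:
        raise ValueError(f"not a type/subtype pair: {media_type!r}")
    return f"{IANA_BASE}{top_level}/{subtype}"


def iri_to_media_type(iri: str) -> str:
    """Recover the option b) string from an option a) IRI."""
    if not iri.startswith(IANA_BASE):
        raise ValueError(f"not an IANA media type IRI: {iri!r}")
    return iri[len(IANA_BASE):]


# Two parties that start from the same IANA name derive the same identifier,
# which a locally minted name such as my_namespace:APPLICATION_EXCEL
# cannot guarantee.
iri = media_type_to_iri("application/vnd.ms-excel")
print(iri)
print(iri_to_media_type(iri))
```

With a), matching media types becomes a plain IRI comparison, and the string form from b) can still be derived from the IRI when a service needs it.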

JohannesLipp commented 4 years ago

Option a) seems fair to me. This would also improve my example above, because the Excel IANA media type is no longer defined in each custom namespace.

What steps are to be executed to solve this?

JohannesLipp commented 4 years ago

Our agreed solution is to follow these DCAT-2 examples, which use dcat:mediaType <http://www.iana.org/assignments/media-types/text/csv> ;, and to reference IANA media type IRIs in the same way via ids:mediaType.
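
As a hedged illustration of that solution, the following Python sketch builds a JSON-LD description whose mediaType points directly at an IANA IRI. The context URL, the "Representation" and "mediaType" terms, and the @id value are copied from the JSON-LD example earlier in this thread. The helper name is hypothetical, and whether the context resolves these terms exactly this way is an assumption.

```python
import json

# Context URL copied from the JSON-LD example earlier in the thread.
IDS_CONTEXT = "https://w3id.org/idsa/contexts/2.1.0/context.jsonld"


def representation_with_media_type(representation_id: str, media_type_iri: str) -> dict:
    """Hypothetical helper: reference the IANA media type by its IRI instead
    of declaring a media type instance in a custom namespace."""
    return {
        "@context": IDS_CONTEXT,
        "@type": "Representation",
        "@id": representation_id,
        "mediaType": {"@id": media_type_iri},
    }


doc = representation_with_media_type(
    "https://connector.fit.fraunhofer.de",
    "http://www.iana.org/assignments/media-types/text/csv",
)
print(json.dumps(doc, indent=2))
```

Because the media type is only referenced, no instance has to be added to the Information Model or to a custom namespace when IANA registers a new type.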