contentTypes - Githubissues

legouee commented 3 years ago

JSON-LD files.

lzehl commented 3 years ago

@legouee thanks a lot for this PR on contentTypes. Once we clarify the comments I had, this will be a great first selection for the KG :)

apdavison commented 3 years ago

I don't see any value in the category field; I suggest we remove it.

At the same time, we could perhaps add a "documentation" or "specification" field containing the IRI of a document that specifies the format (where available) or provides some additional context/documentation (where there is no public spec).

lzehl commented 3 years ago

@apdavison the only value I could see is that we actually define possible categories in the controlledTerms, because with each category a certain set of assumptions can already be made about the usage of a file with that contentType. Nonetheless, since we predefine the contentTypes here anyway, I think this might be over the top. So removing it is completely fine with me.

I agree on the "documentation" or "specification" field. Maybe two fields (both optional):

"description" (string, free text) for providing short additional information if there is no public spec
"specification" (string, format: iri) with link to public spec

Would that work?

apdavison commented 3 years ago

sounds good to me.

lzehl commented 3 years ago

@apdavison great! I'm going to adapt the contentType schema, following the discussion above, to:

{
  "_type": "https://openminds.ebrains.eu/core/ContentType",
  "required": [
    "associatedFileExtension",
    "name"  
  ],
  "properties": {
    "associatedFileExtension": {
      "type": "array",
      "minItems": 1,
      "uniqueItems": true,
      "_instruction": "Enter one or several file extensions associated with this content type.",
      "items": {
        "type": "string"
      }
    },
    "description": {
      "type": "string",
      "_instruction": "Enter a description of the content type specification if no public specification file can be linked. Leave blank and use 'specification' if a public specification can be linked."
    },
    "relatedMediaType":{
      "type": "string",
      "format": "iri",
      "_instruction": "Enter the internationalized resource identifier (IRI) to a registered media type (e.g. on IANA.org) matching this content type."
    },
    "name": {
      "type": "string",
      "_instruction": "Enter the name (iana-inspired convention) of this content type."
    },
    "specification":{
      "type": "string",
      "format": "iri",
      "_instruction": "Enter the internationalized resource identifier (IRI) to the official specification of this content type. Leave blank and use 'description' to provide some specification if an official specification is not available."
    },
    "synonym":{
      "type": "array",
      "minItems": 1,
      "uniqueItems": true,
      "_instruction": "Enter one or several synonyms of this content type.",
      "items": {
        "type": "string"
      }
    }
  }
}

@apdavison & @legouee let me know if you would change the instructions. @legouee Could you adapt the JSON-LDs of this PR accordingly?
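
To make the constraints in the schema above concrete, here is a minimal sketch of how an instance could be checked against them. It is not the openMINDS tooling: the function name `check_content_type`, the trimmed-down `SCHEMA` dict, the example values, and the use of bare property names (real JSON-LD files may use namespaced keys) are all assumptions for illustration. The "iri" format is not verified here.

```python
# Trimmed copy of the rules from the schema above (instructions omitted).
SCHEMA = {
    "required": ["associatedFileExtension", "name"],
    "properties": {
        "associatedFileExtension": {"type": "array", "minItems": 1, "uniqueItems": True},
        "description": {"type": "string"},
        "relatedMediaType": {"type": "string", "format": "iri"},
        "name": {"type": "string"},
        "specification": {"type": "string", "format": "iri"},
        "synonym": {"type": "array", "minItems": 1, "uniqueItems": True},
    },
}


def check_content_type(instance):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    for key in SCHEMA["required"]:
        if key not in instance:
            problems.append(f"missing required property: {key}")
    for key, value in instance.items():
        rule = SCHEMA["properties"].get(key)
        if rule is None:
            continue  # keys such as "@id" or "@type" are not checked in this sketch
        if rule["type"] == "string":
            if not isinstance(value, str):
                problems.append(f"{key}: expected a string")
        elif rule["type"] == "array":
            if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
                problems.append(f"{key}: expected a list of strings")
                continue
            if len(value) < rule.get("minItems", 0):
                problems.append(f"{key}: needs at least {rule['minItems']} item(s)")
            if rule.get("uniqueItems") and len(set(value)) != len(value):
                problems.append(f"{key}: items must be unique")
    return problems


# Hypothetical instance; only the required properties are given.
example = {
    "name": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "associatedFileExtension": [".docx"],
}
print(check_content_type(example))
```

Because "description", "specification", "relatedMediaType" and "synonym" are not listed under "required", an instance that omits them passes these checks.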

legouee commented 3 years ago

Yes, I will modify the JSON-LDs according to your suggestions.

legouee commented 3 years ago

I removed the category field and the empty strings.
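
A cleanup of this kind could be scripted roughly as follows. This is only a sketch, not the script actually used for the PR: the function name `clean_instance` is hypothetical, and it assumes the field is stored under the literal key "category" and that "empty strings" means properties whose value is "" (or lists containing only "").

```python
def clean_instance(instance, dropped_keys=("category",)):
    """Return a copy of one instance dict without dropped keys or empty-string values."""
    cleaned = {}
    for key, value in instance.items():
        if key in dropped_keys:
            continue  # remove the field entirely
        if isinstance(value, list):
            value = [v for v in value if v != ""]
            if not value:
                continue  # nothing left in the list, so drop the property
        elif value == "":
            continue  # drop properties that only hold an empty string
        cleaned[key] = value
    return cleaned


# Hypothetical instance before cleanup.
before = {
    "name": "text/plain",
    "category": "text",
    "description": "",
    "associatedFileExtension": [".txt"],
}
print(clean_instance(before))
```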

apdavison commented 3 years ago

I don't think we should take the time to add "specification/description" fields in this PR, but this could be done in separate PRs. I also don't think we necessarily need specification links or descriptions for all content-types, just for the non-standard ones, and especially for formats developed within EBRAINS.

@olinux - could you comment on @lzehl's questions about allowed characters in the @id field?

lzehl commented 3 years ago

@apdavison I agree that we do not need to have specifications and descriptions within this PR. The corresponding schema is updated, so that the JSON-LDs could have them, but they are optional fields so they do not have to be provided now (or ever) if there is no need to.

olinux commented 3 years ago

@apdavison and @lzehl :

The ids just have to be valid IRIs, so all of the above suggestions are allowed; from a technical point of view, all that matters is that they are unique. Given the format of the content types, I would also recommend keeping them in the structure of

"@id": "https://openminds.ebrains.eu/instances/contentTypes/application/vnd.openxmlformats-officedocument.wordprocessingml.document", meaning that after "contentTypes" there is one level of grouping (as in the media types), then a slash, then the actual format. This way we're sure everyone does it the same way and it is more predictable (which also helps, e.g., when preparing the JSON-LD payloads linking to the instance).

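The recommended "@id" structure can be sketched in a few lines. The helper names `content_type_id` and `follows_convention` are hypothetical, and the check only covers the "one group, one slash, one format" layout described above; it does not validate the string as an IRI in general.

```python
BASE = "https://openminds.ebrains.eu/instances/contentTypes/"


def content_type_id(name):
    """Build an @id from a media-type-like name of the form 'group/format'."""
    group, sep, fmt = name.partition("/")
    if not sep or not group or not fmt or "/" in fmt:
        raise ValueError(f"expected exactly one 'group/format' pair, got: {name!r}")
    return BASE + name


def follows_convention(iri):
    """Check that an @id has exactly one grouping level after 'contentTypes/'."""
    if not iri.startswith(BASE):
        return False
    parts = iri[len(BASE):].split("/")
    return len(parts) == 2 and all(parts)


print(content_type_id(
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
))
```

Generating the ids from the name in one place, rather than typing them by hand, is one way to get the predictability mentioned above when other JSON-LD payloads need to link to an instance.
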
lzehl commented 3 years ago

@legouee & @apdavison sorry, I lost track of this PR a bit. I think my open questions are also resolved by the comment from @olinux, so I will merge it now. If I missed a correction, we can make it later (for example, adding descriptions and specifications where we see fit).

@legouee thanks again a lot for preparing this!

HumanBrainProject / openMINDS_instances

contentTypes #2