TheELNConsortium / TheELNFileFormat

Specification for the ELN File Format
MIT License

Description formats #74

Open nicobrandt opened 3 weeks ago

nicobrandt commented 3 weeks ago

Have we discussed yet how to deal with different description formats? Specifically, I'm talking about the text property we use within datasets, files, comments, etc. In the current examples we have, one can find at least three different formats: HTML, Markdown and plain text. Generally, ELNs could either leave the text as-is and hope for the best, or attempt to convert it into a suitable format. However, such conversions can be error-prone, especially when the ELN first needs to detect the source format based on the contents.

A related question, at least for Markdown, HTML, etc., is also how to deal with linked, ELN-internal images. Even if these would be included in the Crate, the corresponding URL would need to be updated somehow in the text.

NicolasCARPi commented 3 weeks ago

In eLab, text will be HTML for most cases (default setting). But it can also be Markdown. I suggest using encodingFormat on the Dataset (text/html, text/markdown).
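As a rough sketch of how an importing ELN might act on this suggestion, assuming a Dataset node carries both text and encodingFormat as proposed: the helper name `text_format`, the set of accepted formats, and the fallback to text/plain are illustrative assumptions, not something the specification defines.

```python
# Hypothetical helper. The accepted formats and the text/plain fallback
# are assumptions made for this sketch, not part of the specification.
KNOWN_FORMATS = {"text/html", "text/markdown", "text/plain"}


def text_format(node: dict) -> str:
    """Return the declared MIME type of a node's text, or text/plain."""
    declared = node.get("encodingFormat")
    if isinstance(declared, str):
        # Drop MIME parameters such as "; charset=utf-8" before comparing.
        mime = declared.split(";", 1)[0].strip().lower()
        if mime in KNOWN_FORMATS:
            return mime
    return "text/plain"


# Made-up Dataset node for illustration.
dataset = {
    "@id": "./experiment-1/",
    "@type": "Dataset",
    "text": "<p>Synthesis went <strong>well</strong>.</p>",
    "encodingFormat": "text/html",
}
print(text_format(dataset))
```

With an explicit format on the node, the importer no longer has to guess the format from the contents.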

[Screenshot: 2024-06-12-150556_667x62_scrot]

NicolasCARPi commented 3 weeks ago

Even if these would be included in the Crate, the corresponding URL would need to be updated somehow in the text.

In my case, any attached file has its old name as alternateName. It corresponds to the name of the file on the storage backend, which allows one to look for this value (some hash) in the main text and replace it with the new one, that of the freshly uploaded file. Example .eln files should ideally include such a use case, with an image embedded in the main text.
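The replacement step described here could be sketched roughly as follows. The function name `relink_attachments`, the `new_names` mapping (file @id to the storage name assigned after upload), and the sample names and URL are all made up for illustration; they do not reflect how eLab or any other ELN actually stores files.

```python
def relink_attachments(text: str, files: list[dict], new_names: dict[str, str]) -> str:
    """Replace each file's old storage name (alternateName) in text with its new name."""
    for node in files:
        old = node.get("alternateName")
        new = new_names.get(node.get("@id"))
        # Only rewrite when both the old and the new storage name are known.
        if old and new:
            text = text.replace(old, new)
    return text


# Made-up File node and storage names for illustration.
files = [{"@id": "./exp/plot.png", "@type": "File", "alternateName": "a3f9c2e1.png"}]
new_names = {"./exp/plot.png": "77b0d4aa.png"}
text = '<p>Result:</p><img src="uploads/a3f9c2e1.png">'

relinked = relink_attachments(text, files, new_names)
print(relinked)
```

Since this is a plain string replacement, it works the same way for HTML and Markdown text, but it relies on the old storage name being distinctive enough (a hash, as mentioned) not to appear elsewhere in the text by accident.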

nicobrandt commented 3 weeks ago

In eLab, text will be HTML for most cases (default setting). But it can also be Markdown. I suggest using encodingFormat on the Dataset (text/html, text/markdown).

Including a MIME type somewhere would be ideal. For datasets and comments, this should work, as long as ELNs know to treat this property accordingly. Files may have both a MIME type (describing the actual file content) and a description though, so I can see it being an issue there at least.

NicolasCARPi commented 3 weeks ago

Including a MIME type somewhere would be ideal.

But encodingFormat is a MIME type!

Files may have both a MIME type and a description

How is the description clashing with encodingFormat??

nicobrandt commented 3 weeks ago

What I meant is, if a file has a description specified as text and a MIME type (describing the file contents) specified as encodingFormat, then how would we specify the MIME type of just the description?

NicolasCARPi commented 3 weeks ago

Oh I see. Like this then:

  "description": {
    "@type": "CreativeWork",
    "text": "<p>This is a detailed description in HTML format.</p>",
    "encodingFormat": "text/html"
  },

nicobrandt commented 3 weeks ago

This syntax would be nice, but unfortunately schema.org seems to allow only a plain Text value for the text property. Using description like in your example would actually allow us to use https://schema.org/TextObject though, which could look very similar to your example, just with a different @type. So we could potentially rethink the use of text and description within the different entities.
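A rough sketch of how an importer could read such a description, accepting either a plain string or a TextObject-style object. The helper name `read_description` and the text/plain default are assumptions. The TextObject is inlined here to mirror the example above; in an actual Crate it would probably be a separate entity referenced by @id.

```python
def read_description(node: dict) -> tuple[str, str]:
    """Return (text, mime_type) for a node's description."""
    description = node.get("description", "")
    if isinstance(description, dict):
        # TextObject-style description: the text and its format travel together.
        return description.get("text", ""), description.get("encodingFormat", "text/plain")
    # Plain string: no format declared, so assume plain text.
    return description, "text/plain"


# Made-up File node for illustration.
file_node = {
    "@id": "./exp/report.pdf",
    "@type": "File",
    "encodingFormat": "application/pdf",  # format of the file contents
    "description": {
        "@type": "TextObject",
        "text": "<p>This is a detailed description in HTML format.</p>",
        "encodingFormat": "text/html",  # format of the description only
    },
}
print(read_description(file_node))
```

This keeps the two MIME types apart: encodingFormat on the File describes the file contents, while encodingFormat on the nested object describes only the description.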

SteffenBrinckmann commented 3 weeks ago

Can you remind me why we use "text" at all if there is description? I forgot.

I am thinking of always converting the content, upon .eln import, into the encoding that pasta uses (Markdown). As Nico said, it is error-prone, but it makes the code easier. I might change this point of view in the future.
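To make the trade-off concrete, here is a deliberately tiny HTML-to-Markdown sketch using only the Python standard library. It is not how pasta or any other ELN does the conversion; the class and function names are hypothetical, and it handles only paragraphs, line breaks, bold and italics. Everything else (links, images, tables) is reduced to its text content, which is exactly the kind of loss that makes such conversions error-prone.

```python
from html.parser import HTMLParser


class SimpleHtmlToMarkdown(HTMLParser):
    """Convert a tiny HTML subset (p, br, strong/b, em/i) to Markdown."""

    INLINE = {"strong": "**", "b": "**", "em": "*", "i": "*"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.INLINE:
            self.parts.append(self.INLINE[tag])
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.parts.append(self.INLINE[tag])
        elif tag == "p":
            # A closed paragraph becomes a blank line in Markdown.
            self.parts.append("\n\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_markdown(html: str) -> str:
    """Return a Markdown approximation of the given HTML fragment."""
    parser = SimpleHtmlToMarkdown()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts).strip()


print(html_to_markdown("<p>Yield was <strong>high</strong>.</p><p>See <em>notes</em>.</p>"))
```

A real importer would more likely rely on a dedicated conversion library, and would benefit from a declared format on the source text so that it knows when to convert at all.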

nicobrandt commented 3 weeks ago

We once discussed (somewhere) using description for describing the node itself and text for the actual description "content". Currently, in Kadi4Mat we only use the former to add some meta descriptions about additional export files that the user may choose to include in the Crate. But these are not really necessary, so switching from text to description everywhere would not be a problem for us.