TheELNConsortium / TheELNFileFormat

Specification for the ELN File Format
MIT License

Draft: Add SampleDB example #8

Closed FlorianRhiem closed 2 years ago

FlorianRhiem commented 2 years ago

Here's a work-in-progress example of an .eln file generated by SampleDB. Feedback on the structure and possible improvements would be welcome.

NicolasCARPi commented 2 years ago

I have an issue with this one: [screenshot: 2022-06-21-003403_403x160_scrot]

If it's remote, the @id should be a URL starting with http. See: https://www.researchobject.org/ro-crate/1.1/data-entities.html#web-based-data-entities

FlorianRhiem commented 2 years ago

The motivation there was not to describe the remote resource, but the link that was created to it by a user. Multiple different users might at different times enter links for different objects with the same URL, but a different (optional) description and title. One single File entry for the URL is not enough to reflect that, so this "virtual" file entry was used as a representation of that link.

The RO Crate section on web-based data entities assumes that these are available on the web via HTTP. While this might be true for some of the links stored for an object in SampleDB, it is not the case for all of them: a file, sftp or smb link, for example, can be useful to document the location/URL of the linked content even when that content is not available. For http or https scheme links, availability is not checked at any time either.

Should these links still be exported and made available in the .eln file, or should they be left out as something that the RO Crate isn't able to reproduce?

As RO Crate File objects are MediaObject objects, I suppose one way to represent the link nature of it would be to replace the url property with the contentUrl property?

NicolasCARPi commented 2 years ago

The issue here is during import: this entry is referenced in a hasPart section as being a File, but it is not a File. So I think the @id should really be the http link instead of a relative path that doesn't exist. Note that there is no issue with referencing a samba share or something other than http (https://www.iana.org/assignments/uri-schemes/prov/smb), but we need a way to know whether it's a local file in the archive or not.

The import code could also simply check for file existence and skip if it's not there...
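A minimal sketch of that check, assuming an importer that treats any @id carrying a URI scheme as remote and keeps only File entries that resolve to a member of the zip. The function names, the root_prefix parameter and the example paths are hypothetical, not taken from SampleDB or eLabFTW.

```python
import io
import posixpath
import zipfile
from urllib.parse import urlparse


def is_remote_id(entity_id):
    """Treat any @id with a URI scheme (http, smb, sftp, ...) as remote."""
    return urlparse(entity_id).scheme != ""


def importable_files(graph, archive_names, root_prefix=""):
    """Return the File entities whose @id names a member of the archive."""
    kept = []
    for entity in graph:
        if entity.get("@type") != "File":
            continue
        entity_id = entity["@id"]
        if is_remote_id(entity_id):
            continue  # a link, not something stored in the archive
        member = posixpath.normpath(posixpath.join(root_prefix, entity_id))
        if member in archive_names:
            kept.append(entity)
    return kept


# Build a tiny in-memory archive with a single real file (hypothetical layout).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("export/objects/1/files/data.csv", "a,b\n1,2\n")
with zipfile.ZipFile(buffer) as archive:
    names = set(archive.namelist())

graph = [
    {"@id": "./objects/1/files/data.csv", "@type": "File"},
    {"@id": "./objects/1/files/link-1", "@type": "File"},  # no such member
    {"@id": "smb://server/share/report.pdf", "@type": "File"},  # remote
]
kept = importable_files(graph, names, root_prefix="export")
print([entity["@id"] for entity in kept])
```

Skipping silently hides exporter mistakes, so a real importer would probably log the entries it drops.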

FlorianRhiem commented 2 years ago

I've removed these non-file File entries and moved them, together with information on the files, to an extra per-object files.json. I've also added a comments.json to contain comments left for the object. Now the @graph only contains the ro-crate-metadata.json CreativeWork, the root Dataset for ./, Dataset entries for individual objects (experiments, samples, etc.), Person entries for users, and File entries for actual files in the zip, with everything being referenced via hasPart or author in a graph starting with the root Dataset.
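The structure described above can be checked with a short traversal. This is only a sketch under assumptions: the example @graph, its @id values and the helper names are invented for illustration and are not copied from the actual SampleDB export.

```python
def as_list(value):
    """Normalise a JSON-LD property value to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def reachable_ids(graph, root_id="./", properties=("hasPart", "author")):
    """Collect every @id reachable from the root Dataset via the given properties."""
    by_id = {entity["@id"]: entity for entity in graph}
    seen = set()
    stack = [root_id]
    while stack:
        current = stack.pop()
        if current in seen or current not in by_id:
            continue
        seen.add(current)
        for prop in properties:
            for ref in as_list(by_id[current].get(prop)):
                if isinstance(ref, dict) and "@id" in ref:
                    stack.append(ref["@id"])
    return seen


# Hypothetical graph mirroring the entity types listed in the comment above.
graph = [
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork", "about": {"@id": "./"}},
    {"@id": "./", "@type": "Dataset", "hasPart": [{"@id": "./objects/1/"}]},
    {
        "@id": "./objects/1/",
        "@type": "Dataset",
        "author": {"@id": "./users/1"},
        "hasPart": [{"@id": "./objects/1/files/data.csv"}],
    },
    {"@id": "./users/1", "@type": "Person", "name": "Example User"},
    {"@id": "./objects/1/files/data.csv", "@type": "File"},
]

all_ids = {entity["@id"] for entity in graph}
# The metadata descriptor points at the root via "about" and is not expected to be a part of it.
unreachable = all_ids - reachable_ids(graph) - {"ro-crate-metadata.json"}
print(sorted(unreachable))
```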

SteffenBrinckmann commented 2 years ago

On a related note about remote content: I assume we only save the location of the content (smb://...) but never any user credential information, and then we hope that the receiver of the ro-crate also has access to the content. Is that correct?

FlorianRhiem commented 2 years ago

Yes, that's probably best.
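One way an exporter could enforce this is to drop the userinfo part of each link before writing it out. A minimal sketch, assuming links are stored as plain URL strings; strip_credentials is a hypothetical helper, not an existing SampleDB function.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_credentials(url):
    """Return the URL without any user:password@ part in its authority."""
    parts = urlsplit(url)
    if parts.hostname is None:
        return url  # relative path or URL without an authority component
    host = parts.hostname
    if ":" in host:
        host = f"[{host}]"  # IPv6 literals need their brackets back
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


print(strip_credentials("smb://alice:secret@fileserver/share/data.csv"))
```

Note that urlsplit lowercases the host name, which is harmless for host names but worth knowing if the exported links are compared as strings.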

SteffenBrinckmann commented 2 years ago

Can we resolve this PR? We accept the PR as "work in progress", just as all the other ELN software solutions are "work in progress". In the future, @FlorianRhiem can just push his example into the example folder (just as the other partners do).

NicolasCARPi commented 2 years ago

> In the future, @FlorianRhiem can just push his example into the example folder (just as the other partners)

I believe it's best to make a PR so the change is visible and we can discuss it more easily. But I agree that we can merge WIP stuff, no problem there (and small changes can be pushed directly).

NicolasCARPi commented 2 years ago

@FlorianRhiem I think for the comments we should use https://schema.org/Comment. Your author_id would then become a Person in the author property. eLabFTW also has comments, and I'll work on adding them too (current examples don't have comments; EDIT: done). Comments are a very good example of a common property that is standardized :) The idea is to minimize the information dumped in random json files and to put as much as possible in the metadata json file.
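A rough sketch of that mapping, assuming a comments.json entry with author_id, content and utc_datetime fields. Only author_id is mentioned in this thread; the other field names, the @id scheme and the helper name are assumptions made for illustration.

```python
def comment_entity(raw, object_id, index, person_ids):
    """Map one comments.json entry (assumed layout) to a schema.org Comment entity."""
    return {
        "@id": f"{object_id}comments/{index}",  # hypothetical @id scheme
        "@type": "Comment",
        "text": raw["content"],
        "dateCreated": raw["utc_datetime"],
        # author_id becomes a reference to the Person entity in the @graph
        "author": {"@id": person_ids[raw["author_id"]]},
    }


# Hypothetical input data.
person_ids = {7: "./users/7"}
raw_comments = [
    {"author_id": 7, "content": "Sample looks fine.", "utc_datetime": "2022-07-01T12:00:00"},
]

object_id = "./objects/1/"
entities = [
    comment_entity(raw, object_id, index, person_ids)
    for index, raw in enumerate(raw_comments)
]
# The object's Dataset entry then points at its comments via the "comment" property.
dataset = {
    "@id": object_id,
    "@type": "Dataset",
    "comment": [{"@id": entity["@id"]} for entity in entities],
}
print(entities[0]["author"], dataset["comment"])
```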

A pic of how it currently looks after an import in elab: [screenshot: 2022-07-07-015909_1744x901_scrot]

SteffenBrinckmann commented 2 years ago

Until now we have not discussed any field naming conventions. Since you have now started talking about 'comment', 'person', and so on, I'll move this discussion to the Discussions section, since it relates not only to this PR but to the .eln format in general.