django-daiquiri / daiquiri

A framework for the publication of scientific databases
https://escience.aip.de/daiquiri
Apache License 2.0

IMPROVEMENT: add detached-header datalink semantic to the oai adapter #176

Closed: kimakan closed this issue 1 year ago

kimakan commented 1 year ago

Add the #detached-header semantic to the OAI adapter so that it is included in the OAI metadata. The implementation is essentially the same as for #documentation, but it allows datalink to be used with a wider range of semantics without being limited by the OAI adapter's capabilities.

https://www.ivoa.net/rdf/datalink/core/2022-01-27/datalink.html#detached-header
https://github.com/django-daiquiri/daiquiri/blob/f6f352ac46c877209bb176d391f97079326e01f5/daiquiri/oai/adapter.py#L243-L276

An example of the code to be added:

            elif semantics == '#detached-header':
                datalink['alternate_identifiers'].append({
                    'alternate_identifier': access_url,
                    'alternate_identifier_type': 'URL'
                })
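To show where such a branch would sit, here is a minimal sketch of a loop over datalink rows. The function name build_datalink_metadata, the row keys (semantics, access_url, content_type) and the shape of the returned dict are assumptions for illustration, not daiquiri's actual adapter code. Only #this (whose format is tracked) and the proposed #detached-header branch are handled.

```python
from typing import Any, Dict, List


def build_datalink_metadata(rows: List[Dict[str, Any]]) -> Dict[str, list]:
    """Collect OAI-relevant metadata from datalink rows (hypothetical sketch).

    Assumes each row is a dict with 'semantics', 'access_url' and
    'content_type' keys, mirroring the columns of a datalink table.
    """
    datalink = {
        'formats': [],
        'alternate_identifiers': [],
    }
    for row in rows:
        semantics = row['semantics']
        access_url = row['access_url']
        content_type = row['content_type']

        if semantics == '#this':
            # track the format of the resource itself
            datalink['formats'].append(content_type)
        elif semantics == '#detached-header':
            # proposed branch: expose the header file URL as an alternate identifier
            datalink['alternate_identifiers'].append({
                'alternate_identifier': access_url,
                'alternate_identifier_type': 'URL'
            })
        # all other semantics are ignored in this sketch
    return datalink
```

With one #this row and one #detached-header row, the result holds the #this content type in formats and the header URL in alternate_identifiers.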
kimakan commented 1 year ago

An alternative solution would be to put the #detached-header link into the relatedIdentifiers instead:

            elif semantics == '#detached-header':
                datalink['formats'].append(content_type)
                datalink['related_identifiers'].append({
                    'related_identifier': access_url,
                    'related_identifier_type': 'URL',
                    'relation_type': 'IsSupplementedBy'
                })
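For reference, a related-identifier dict like the one above would end up as a relatedIdentifier element in the DataCite metadata, with relatedIdentifierType and relationType as attributes. The sketch below uses Python's standard xml.etree.ElementTree; the helper name related_identifier_element and the example.org URL are hypothetical, and XML namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET


def related_identifier_element(item: dict) -> ET.Element:
    """Turn a related-identifier dict into a DataCite <relatedIdentifier> element.

    Hypothetical helper; the dict keys follow the adapter snippet above.
    """
    element = ET.Element('relatedIdentifier', {
        'relatedIdentifierType': item['related_identifier_type'],
        'relationType': item['relation_type'],
    })
    element.text = item['related_identifier']
    return element


item = {
    'related_identifier': 'https://example.org/data/table1.hdr',  # placeholder URL
    'related_identifier_type': 'URL',
    'relation_type': 'IsSupplementedBy',
}
print(ET.tostring(related_identifier_element(item), encoding='unicode'))
# <relatedIdentifier relatedIdentifierType="URL" relationType="IsSupplementedBy">https://example.org/data/table1.hdr</relatedIdentifier>
```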
agy-why commented 1 year ago

@kimakan That is a good question; I am not quite sure what the best option would be. You have a better understanding of DataCite than I do. Which do you think would be more relevant?

kimakan commented 1 year ago

After looking into the issue in more detail, I think that an alternateIdentifier is more appropriate, since it essentially points to the same resource. AFAIK, a relatedIdentifier should point to a different, related resource. However, I would like to put the content_type into the formats to keep track of the alternative formats (currently, only the format of #this is tracked).

            elif semantics == '#detached-header':
                datalink['formats'].append(content_type)
                datalink['alternate_identifiers'].append({
                    'alternate_identifier': access_url,
                    'alternate_identifier_type': 'URL'
                })
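Once both #this and #detached-header contribute their content_type to formats, the same MIME type could be appended twice if two links share it. A minimal sketch of an order-preserving, duplicate-free append follows; add_format is a hypothetical helper, and skipping empty content types is an assumption, not something decided in this thread.

```python
from typing import Optional


def add_format(datalink: dict, content_type: Optional[str]) -> None:
    """Append content_type to datalink['formats'] once, keeping insertion order.

    Empty or missing content types are skipped (assumption: such rows
    carry no format information).
    """
    if content_type and content_type not in datalink['formats']:
        datalink['formats'].append(content_type)


datalink = {'formats': []}
add_format(datalink, 'application/fits')  # e.g. from the #this row
add_format(datalink, 'application/fits')  # duplicate, skipped
add_format(datalink, 'text/plain')        # e.g. from a #detached-header row
print(datalink['formats'])  # ['application/fits', 'text/plain']
```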
agy-why commented 1 year ago

Sounds sensible; please make a PR. I like the idea of keeping track of the format, and I agree with your arguments on alternate vs. related identifiers.

agy-why commented 1 year ago

An alternate identifier is supposed to be an ID. Suggestion: declare the datalinkID there, like:

<alternateIdentifier alternateIdentifierType="datalink">datalinkID</alternateIdentifier>
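Following this suggestion, the alternate identifier would carry the datalink ID rather than a URL, with "datalink" as the identifier type (in DataCite, alternateIdentifierType is a free-text attribute). A minimal sketch with xml.etree.ElementTree; the helper name and the ID value "table1" are placeholders, not real daiquiri identifiers.

```python
import xml.etree.ElementTree as ET


def alternate_identifier_element(datalink_id: str) -> ET.Element:
    """Build an <alternateIdentifier> element of type 'datalink' (hypothetical helper)."""
    element = ET.Element('alternateIdentifier',
                         {'alternateIdentifierType': 'datalink'})
    element.text = datalink_id
    return element


# 'table1' is a placeholder ID
print(ET.tostring(alternate_identifier_element('table1'), encoding='unicode'))
# <alternateIdentifier alternateIdentifierType="datalink">table1</alternateIdentifier>
```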
agy-why commented 1 year ago

Related identifiers link to related resources, for example:

preview (viewer): Describes

preview-image (related image): IsSupplementedBy

documentation (URL to docs): IsDocumentedBy

auxiliary (URL to related resources): excluded from OAI

detached-header (URL to header file): IsSupplementedBy

this (URL of the resource): IsDescribedBy

agy-why commented 1 year ago

progenitor (datalink URL of the resources used): IsDerivedFrom

Potential extra semantics:

auxiliary-table (table with further data): References
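The mapping from these two comments can be written as a lookup table. A minimal sketch, assuming semantics strings carry the leading "#" as in the adapter snippets; the names RELATION_TYPES and related_identifier are hypothetical. #auxiliary is deliberately left out because it is to be excluded from OAI, and #auxiliary-table is only a proposed extra semantics.

```python
from typing import Optional

# DataCite relationType per datalink semantics, as proposed in this thread.
# '#auxiliary' is absent on purpose: it is excluded from OAI.
RELATION_TYPES = {
    '#preview': 'Describes',
    '#preview-image': 'IsSupplementedBy',
    '#documentation': 'IsDocumentedBy',
    '#detached-header': 'IsSupplementedBy',
    '#this': 'IsDescribedBy',
    '#progenitor': 'IsDerivedFrom',
    '#auxiliary-table': 'References',  # proposed extra semantics
}


def related_identifier(semantics: str, access_url: str) -> Optional[dict]:
    """Map one datalink row to a related-identifier dict, or None if not exported."""
    relation_type = RELATION_TYPES.get(semantics)
    if relation_type is None:
        return None
    return {
        'related_identifier': access_url,
        'related_identifier_type': 'URL',
        'relation_type': relation_type,
    }
```

Returning None for unmapped semantics keeps the exclusion rule in one place: anything not listed in the table simply never reaches the OAI record.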

kimakan commented 1 year ago

Additional note: Currently, the title of the OAI record generated from the datalink tables is taken from the description of the datalink entry with the #doi semantics. This is sensible, but it should be ensured that the description refers to the object and not to the DOI itself.

Incorrect description: Digital object identifier (DOI) for the Table 1 from the Data Release 1

Correct description: Table 1 from the Data Release 1
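One way to catch such descriptions is a simple check when the title is taken from the #doi row. This is a sketch only: the function name, the row keys and the prefix heuristic (descriptions starting with "Digital object identifier" are treated as describing the DOI) are assumptions, and raising an error rather than logging a warning is a design choice made here for illustration.

```python
from typing import List, Optional


def title_from_doi_rows(rows: List[dict]) -> Optional[str]:
    """Return the description of the first '#doi' row for use as the OAI title.

    Raises ValueError when the description appears to describe the DOI
    itself rather than the object (heuristic: it starts with
    'Digital object identifier').
    """
    for row in rows:
        if row.get('semantics') == '#doi':
            description = (row.get('description') or '').strip()
            if description.lower().startswith('digital object identifier'):
                raise ValueError(
                    f'description refers to the DOI, not the object: {description!r}')
            return description or None
    return None
```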

kimakan commented 1 year ago

I found a bug in the routine that creates tap_schema.datalink. The content_length values adopted from the custom datalink tables (e.g., datalink_doi) are set to 0 if the value is None, which is incorrect. The value is allowed to be None; in some cases it must be `None`, namely when the `content_length` attribute doesn't make any sense.
https://github.com/django-daiquiri/daiquiri/blob/8501f8424b4aceec7119f82a54acfde298921217/daiquiri/datalink/adapter.py#L84-L94

By contrast, the content_length of the datalinks created automatically for all schemas and tables is correctly set to None.
https://github.com/django-daiquiri/daiquiri/blob/8501f8424b4aceec7119f82a54acfde298921217/daiquiri/datalink/adapter.py#L133-L143
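The difference can be illustrated with two small functions. A pattern like `value or 0` is one plausible way to produce the reported behaviour, but it is a guess, not the actual daiquiri code; both function names are hypothetical. The fixed version passes None through and only coerces real values to int.

```python
from typing import Optional, Union


def buggy_content_length(value: Optional[Union[int, str]]) -> int:
    # Collapses "unknown / not applicable" (None) into a length of 0
    return int(value or 0)


def content_length(value: Optional[Union[int, str]]) -> Optional[int]:
    """Keep None (no meaningful content length); coerce anything else to int."""
    if value is None:
        return None
    return int(value)
```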