DOME-4-0 / Tools-and-Services

This repository is to help with planning and prioritisation of tasks in WP2

Permanent identifier (PID) scheme for resources (improved) #72

Open n-vc opened 2 years ago

n-vc commented 2 years ago

DOME 4.0 must have a scheme for defining persistent identifiers for resources. Below we detail some of the possibilities to be investigated.

Seeking improvements:

A possible improved scheme for permanent resource addresses:

METADATA (dereferenceable IRIs). Note: the metadata id (@id) refers to the metadata record itself.

Dataset
<dome-domain>/dataset/<uuid>   _for the dataset metadata_
Data files
<dome-domain>/distribution/<content-hash=id>  _for that data file metadata_

DATA (in distribution)

<dome-domain>/data/<content-hash=id>   _the actual data file (for an API)_
<dome-domain>/download/<content-hash=id>  _the actual data file (for a Web client)_

The idea is that the scheme will remain valid over time, since we will always have "datasets" and "data files".
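As a rough illustration of the scheme above, the sketch below builds the four addresses from a dataset UUID and a file's content hash. The domain `https://dome.example`, the function names, and the use of MD5 as the content hash (borrowed from the checksum field in the records in this issue) are assumptions for illustration only, not a decided design.

```python
import hashlib
import uuid

# Hypothetical placeholder for <dome-domain>; the real domain is not decided here.
DOME_DOMAIN = "https://dome.example"


def new_dataset_id() -> str:
    """Generate a random UUID to identify a dataset."""
    return str(uuid.uuid4())


def content_hash(data: bytes) -> str:
    """Return the hex MD5 digest of the file content, used as the file id."""
    return hashlib.md5(data).hexdigest()


def dataset_iri(dataset_uuid: str) -> str:
    """IRI of the dataset metadata."""
    return f"{DOME_DOMAIN}/dataset/{dataset_uuid}"


def file_iris(file_id: str) -> dict:
    """IRIs for one data file: its metadata plus the two data endpoints."""
    return {
        "distribution": f"{DOME_DOMAIN}/distribution/{file_id}",  # file metadata
        "data": f"{DOME_DOMAIN}/data/{file_id}",                  # for an API
        "download": f"{DOME_DOMAIN}/download/{file_id}",          # for a Web client
    }


if __name__ == "__main__":
    print(dataset_iri(new_dataset_id()))
    print(file_iris(content_hash(b"example file content")))
```

Because the file id is derived from the content, identical files always map to the same three addresses, regardless of which dataset they were uploaded with.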

Previously we had:

        {
            "type": "Distribution",
            "id": "http://example.com/datasets/52f7a3e2-e395-42df-8f4a-4b062c305e18/e6bce3449ae37bf0af7372bebc1255a8",
            "title": "N/A",
            "fileName": "./XRD/15min-XRD-40-50deg-0o25PDS-0o5AS_PDFlow25.xrdml",
            "accessURL": "http://example.com/datasets/52f7a3e2-e395-42df-8f4a-4b062c305e18/XRD/15min-XRD-40-50deg-0o25PDS-0o5AS_PDFlow25.xrdml",
            "downloadURL": "http://example.com/datasets/52f7a3e2-e395-42df-8f4a-4b062c305e18/XRD/15min-XRD-40-50deg-0o25PDS-0o5AS_PDFlow25.xrdml",
            "mediaType": "application/octet-stream",
            "byteSize": "8353",
            "checksum": {
                "type": "Checksum",
                "algorithm": "checksum:Algorithm_md5",
                "checksumValue": "e6bce3449ae37bf0af7372bebc1255a8"
            },
            "conformsTo": []
        }

Everything in an IRI apart from the id itself can be supplied centrally via the JSON-LD semantic context, so the record can be made even more general (the RDF DB or a JSON-LD processor will complete the IRIs).

It could look like:

        {
            "id": "1d0dc73729c3a280b75c041259b5b2c0",
            "type": "Distribution",
            "title": "N/A",
            "fileName": "./3ab/3ab-H/format.temp",
            "accessURL": "1d0dc73729c3a280b75c041259b5b2c0",
            "downloadURL": "1d0dc73729c3a280b75c041259b5b2c0",
            "mediaType": "application/octet-stream",
            "byteSize": "2475",
            "checksum": {
                "type": "Checksum",
                "algorithm": "checksum:Algorithm_md5",
                "checksumValue": "1d0dc73729c3a280b75c041259b5b2c0"
            },
            "conformsTo": []
        }   
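To make the "processor will complete the IRIs" step more concrete, here is a minimal sketch that expands such a compact record by hand. It is a stand-in for what a JSON-LD context would achieve, not a JSON-LD processor. The domain `https://dome.example`, the function name `complete_iris`, and the per-property prefix table are assumptions that follow the address scheme proposed earlier in this issue.

```python
# Hypothetical placeholder for <dome-domain>.
DOME_DOMAIN = "https://dome.example"

# Assumed mapping from property name to the path segment of the proposed scheme.
PREFIXES = {
    "id": "distribution",
    "accessURL": "data",
    "downloadURL": "download",
}


def complete_iris(record: dict) -> dict:
    """Return a copy of the record with bare ids expanded to full IRIs."""
    expanded = dict(record)
    for key, segment in PREFIXES.items():
        value = record.get(key)
        # Only expand bare identifiers; leave absolute IRIs untouched.
        if isinstance(value, str) and "://" not in value:
            expanded[key] = f"{DOME_DOMAIN}/{segment}/{value}"
    return expanded


compact = {
    "id": "1d0dc73729c3a280b75c041259b5b2c0",
    "type": "Distribution",
    "accessURL": "1d0dc73729c3a280b75c041259b5b2c0",
    "downloadURL": "1d0dc73729c3a280b75c041259b5b2c0",
}

full = complete_iris(compact)
for key in ("id", "accessURL", "downloadURL"):
    print(key, "->", full[key])
```

Keeping the prefixes in one place means a change of domain or path layout would not require rewriting the stored records, only the central mapping.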

Some info/references on the identification of resources by HTTP URIs:

e.g. Study on persistent URIs (EU)