NHMDenmark / DaSSCo-asset-service

DaSSCo asset service is part of DaSSCo storage system

Metadata inconsistencies #98

Open ThomasAlscher1991 opened 2 months ago

ThomasAlscher1991 commented 2 months ago

Status

An example of our current metadata template looks like this.

{
    "asset_created_by": "",
    "asset_deleted_by": "",
    "asset_guid": "7e7-a-12-0e-1b-05-0-001-00-000-0be189-00000",
    "asset_pid": "",
    "asset_subject": "",
    "date_asset_taken": "2023-10-18T14:27:05+02:00",
    "asset_updated_by": "",
    "audited": false,
    "audited_by": "",
    "audited_date": null,
    "barcode": [],
    "collection": "Vascular plants",
    "date_asset_created": null,
    "date_asset_deleted": null,
    "date_asset_finalised": null,
    "date_asset_updated": null,
    "date_metadata_created": "2023-10-31T08:22:17+01:00",
    "date_metadata_updated": "",
    "date_metadata_uploaded": "",
    "digitiser": "Santa Claus",
    "external_publisher": [],
    "file_format": "tif",
    "funding": "DaSSCo",
    "institution": "NHMD",
    "metadata_created_by": "IngestionClient",
    "metadata_updated_by": "",
    "metadata_uploaded_by": "",
    "multispecimen": false,
    "parent_guid": "",
    "payload_type": [
        "image"
    ],
    "pipeline_name": "PIPEHERB0001",
    "preparation_type": "sheet",
    "pushed_to_specify_date": null,
    "restricted_access": [],
    "specimen_pid": "",
    "status": "",
    "tags": {
        "metadataTemplate": "v2_1_0"
    },
    "workstation_name": "WORKHERB0001"
}

This is the template that currently gets accepted by ARS.

{
    "asset_created_by": "",
    "asset_deleted_by": "",
    "asset_guid": "asset_10",
    "asset_locked": false,
    "asset_pid": "asdf-12346-3333-100a21",
    "asset_subject": "",
    "asset_updated_by": "",
    "audited": false,
    "audited_by": "",
    "audited_date": null,
    "barcode": "",
    "collection": "test-collection",
    "date_asset_created": null,
    "date_asset_deleted": null,
    "date_asset_finalised": null,
    "date_asset_taken": "1998-11-15T16:00:00.000Z",
    "date_asset_updated": null,
    "date_metadata_created": "2024-05-15T09:11:21+02:00",
    "date_metadata_updated": "",
    "date_metadata_uploaded": "",
    "digitizer": "thbo",
    "external_publisher": [],
    "file_formats": [
        "TIF"
    ],
    "funding": "hundredetusindvis af dollars",
    "institution": "test-institution",
    "metadata_created_by": "",
    "metadata_updated_by": "",
    "metadata_uploaded_by": "",
    "multi_specimen": true,
    "parent_guid": null,
    "payload_type": "ct scan",
    "pipeline": "ti-p1",
    "preparation_type": "sheet",
    "pushed_to_specify_date": null,
    "restricted_access": [
        "USER"
    ],
    "specimen_pid": "",
    "specimens": [],
    "status": "WORKING_COPY",
    "subject": "folder",
    "tags": {
        "testtag2": "teztific8"
    },
    "workstation": "ti-ws-01"
}

This is what we get when we query asset metadata.

{
    "asset_locked": false,
    "asset_guid": "asset_10",
    "asset_pid": "asdf-12346-3333-100a21",
    "audited": false,
    "collection": "test-collection",
    "created_date": "2024-08-06T11:15:24.021Z",
    "date_metadata_updated": "2024-08-06T11:15:24.021Z",
    "date_asset_taken": null,
    "date_asset_deleted": null,
    "date_asset_finalised": null,
    "date_metadata_taken": null,
    "digitiser": null,
    "error_message": null,
    "error_timestamp": null,
    "event_name": null,
    "events": [
        {
            "user": null,
            "timeStamp": "2024-08-06T11:15:24.021Z",
            "event": "CREATE_ASSET_METADATA",
            "pipeline": "ti-p1",
            "workstation": "ti-ws-01"
        }
    ],
    "file_formats": [
        "TIF"
    ],
    "funding": "hundredetusindvis af dollars",
    "httpInfo": null,
    "institution": "test-institution",
    "internal_status": "COMPLETED",
    "multi_specimen": false,
    "parent_guid": null,
    "payload_type": "ct scan",
    "pipeline": "ti-p1",
    "restricted_access": [
        "USER"
    ],
    "status": "WORKING_COPY",
    "specimens": [],
    "subject": "folder",
    "tags": {
        "testtag2": "teztific8"
    },
    "updateUser": null,
    "workstation": "ti-ws-01",
    "writeAccess": false
}

Issue

In order to upload our metadata to ARS, we:

  1. Need to rename keys (for example workstation_name to workstation)
  2. Need to fill in fields that we can't assign a value to yet (for example asset_pid)
  3. Will lose some fields (for example barcode)
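
To illustrate point 1, here is a minimal sketch of the renaming step. Only `workstation_name` to `workstation` is stated above; the other pairs are inferred by comparing the two examples and are assumptions, as are the names `TEMPLATE_TO_ARS_KEYS` and `rename_keys`. Fields whose type also differs between the examples (`file_format`, `payload_type`, `barcode`) are deliberately left out, since a key rename alone would not be enough for them.

```python
# Hypothetical key mapping, inferred by comparing the two examples above.
# Only workstation_name -> workstation is stated explicitly in this issue.
TEMPLATE_TO_ARS_KEYS = {
    "workstation_name": "workstation",
    "pipeline_name": "pipeline",
    "multispecimen": "multi_specimen",
    "digitiser": "digitizer",
}


def rename_keys(template: dict) -> dict:
    """Return a copy of `template` with known keys renamed; other keys are kept as-is."""
    return {TEMPLATE_TO_ARS_KEYS.get(key, key): value for key, value in template.items()}


# Values taken from the template example above.
example = {
    "workstation_name": "WORKHERB0001",
    "pipeline_name": "PIPEHERB0001",
    "multispecimen": False,
}
renamed = rename_keys(example)
print(renamed)
```

If ARS adopts the template names instead (see Actions), the mapping simply becomes empty and this step can be dropped.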

Actions

  1. We would like to change the minimal set of data required to create an asset, see this issue.
  2. We would like to check whether renaming the ARS field names to match our template names is feasible.

We would like to persist all of the metadata from the template.
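
One way to check persistence is to compare what was submitted with what the query returns. The sketch below is only an illustration: `find_unpersisted` is a hypothetical helper, not part of ARS, and the sample values are a small subset copied from the two examples above.

```python
def find_unpersisted(submitted: dict, returned: dict) -> dict:
    """Map each submitted key to (sent, got) where the returned value is absent or differs."""
    absent = object()  # sentinel so a returned null is distinguishable from a missing key
    problems = {}
    for key, sent in submitted.items():
        got = returned.get(key, absent)
        if got is absent:
            problems[key] = (sent, "<absent>")
        elif got != sent:
            problems[key] = (sent, got)
    return problems


# Subset of the metadata accepted by ARS (second example above).
submitted = {
    "asset_guid": "asset_10",
    "date_asset_taken": "1998-11-15T16:00:00.000Z",
    "multi_specimen": True,
    "barcode": "",
}
# Subset of what the query returned (third example above).
returned = {
    "asset_guid": "asset_10",
    "date_asset_taken": None,
    "multi_specimen": False,
}

problems = find_unpersisted(submitted, returned)
for key, (sent, got) in problems.items():
    print(f"{key}: sent {sent!r}, got {got!r}")
```

The comparison uses plain equality, so a timestamp that is stored correctly but returned in a different format would also be reported.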

Baeist commented 2 months ago

date_asset_taken is not returned by the query, even when it is supplied on creation.

Barcodes do come back in the specimen protocol that ARS uses; they just look different from our metadata. We will have to conform to this (just read each barcode from the list of specimens in the specimen protocol instead of from the barcode field).
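
A rough sketch of reading barcodes that way is below. The shape of a specimen entry is not shown in this issue (the example only has `"specimens": []`), so the `barcode` key on each specimen is an assumption, and both `barcodes_from_specimens` and the sample barcode values are hypothetical.

```python
def barcodes_from_specimens(asset: dict) -> list[str]:
    """Collect barcodes from the asset's specimen list instead of a top-level barcode field."""
    # ASSUMPTION: each entry in "specimens" is an object with a "barcode" key.
    specimens = asset.get("specimens") or []
    return [specimen["barcode"] for specimen in specimens if specimen.get("barcode")]


# Hypothetical asset; the barcode values are placeholders, not real data.
asset = {
    "asset_guid": "asset_10",
    "specimens": [
        {"barcode": "example-barcode-1"},
        {"barcode": "example-barcode-2"},
    ],
}
barcodes = barcodes_from_specimens(asset)
print(barcodes)
```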