adiwg / mdTranslator

Metadata translation tool built using Ruby
https://www.adiwg.org/mdTranslator/
The Unlicense

sbJSON provenance object should map to metadataInfo #244

Open dkarthur opened 1 year ago

dkarthur commented 1 year ago

The sbJSON reader currently maps the sbJSON provenance object to the resource citation object of the internal translator data format.

Dates and contacts that describe the metadata record itself, rather than the ScienceBase item being referenced, are being translated incorrectly. Dates are being mapped from the sbJSON “provenance” to mdJSON “resourceInfo,” while contacts are not being translated at all. Addressing this issue is a critical need for NGGDPP and ReSciColl developers so that they can provide appropriate metrics for USGS and external ReSciColl users and stakeholders.

sbJSON Example:

"provenance": {
        "dateCreated": "2023-01-10T17:39:42Z",
        "lastUpdated": "2023-01-10T19:41:42Z",
        "lastUpdatedBy": "vcrystal@usgs.gov",
        "createdBy": "vcrystal@usgs.gov"
}

Current mdJSON translation:

"resourceInfo": {
    "citation": {
        "title": "Carlsbad Cores Collection",
        "date": [
            {
                "date": "2023-01-10T17:39:42+00:00",
                "dateType": "creation"
            },
            {
                "date": "2023-01-10T19:41:42+00:00",
                "dateType": "lastUpdate"
            },
            {
                "date": "2023-01-10",
                "dateType": "creation",
                "description": "Creation"
            }
        ],
        "responsibleParty": [
            {
                "role": "owner",
                "party": [
                    {
                        "contactId": "40fff240-e50e-49a7-9e6a-259326e5e866"
                    }
                ]
            }
        ]
    }
...

Desired translation:

metadataInfo > metadataDate

metadataInfo > metadataContact

The “createdBy” and “lastUpdatedBy” properties in the sbJSON “provenance” section currently do not appear anywhere in the mdJSON output from mdTranslator. They should be mapped to “metadataInfo” > “metadataContact” with “role” of “author” (or "curator") and “editor”, respectively.
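
A minimal Ruby sketch of the contact mapping proposed above. This is not the mdTranslator sbJson reader; the constant `PROVENANCE_ROLES` and the method `provenance_to_metadata_contacts` are hypothetical names, and using the ScienceBase user name as the `contactId` is an assumption made only to keep the example short.

```ruby
# Hypothetical sketch, not mdTranslator code.
# Assumed role assignment, following the proposal in this issue.
PROVENANCE_ROLES = {
  'createdBy' => 'author',
  'lastUpdatedBy' => 'editor'
}.freeze

# Builds mdJSON-style responsibleParty entries for metadataContact
# from an sbJSON provenance hash. Missing or blank users are skipped.
def provenance_to_metadata_contacts(provenance)
  PROVENANCE_ROLES.each_with_object([]) do |(field, role), contacts|
    user = provenance[field].to_s.strip
    next if user.empty?

    # Assumption: the user name doubles as the contactId. A real reader
    # would need to create or look up a matching contact record.
    contacts << { 'role' => role, 'party' => [{ 'contactId' => user }] }
  end
end

provenance = {
  'dateCreated' => '2023-01-10T17:39:42Z',
  'lastUpdated' => '2023-01-10T19:41:42Z',
  'lastUpdatedBy' => 'vcrystal@usgs.gov',
  'createdBy' => 'vcrystal@usgs.gov'
}

metadata_info = { 'metadataContact' => provenance_to_metadata_contacts(provenance) }
puts metadata_info
```

With the example provenance from this issue, the result holds one "author" entry and one "editor" entry, both pointing at the same user.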

hmaier-fws commented 1 year ago

@dkarthur what do you mean by "contacts are not being translated at all"? The code snippet you provided seems to display a responsibleParty.

Regarding "...with “role” of “author” (or "curator")", I would probably recommend "author", as this is an ISO code described as "party who authored the resource". "Curator" is an ADIwg extended code defined as "party who serves as curator for specimens deposited in a repository". There is also "originator" (party that created the resource), which might be applicable if the metadata was "authored" by one party but then uploaded to the system by a second party?

dkarthur commented 1 year ago

@hmaier-fws Perhaps I should've phrased it as: Contacts from sbJSON provenance object are not being mapped to mdJSON. When not using mdEditor, I don't know how to resolve the mdJSON responsibleParty code. It doesn't appear to map to the ScienceBase user who created the metadata record there, and it's that information that doesn't appear to be coming through the translator at all.

Also, to be sure I understand the second part of your comment: when you refer to "resource," do you mean whatever the metadata describes, or the metadata record itself?

chris-macdermaid commented 1 year ago

The module_provenance.rb only handles the "dateCreated" and "lastUpdated" fields. The "lastUpdatedBy" and "createdBy" fields are dropped by the sbJson reader.

In addition to the above, the sbJson "dates" field is also added to the resourceInfo section by module_date.rb. As a result, the resourceInfo section can contain two creation dates.
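
A small Ruby sketch that makes the duplication visible, using the citation dates from the mdJSON output quoted earlier in this issue. The helper `duplicate_date_types` is a hypothetical name for illustration and is not part of mdTranslator.

```ruby
# Hypothetical helper, not mdTranslator code: returns the dateType values
# that occur more than once in an mdJSON-style citation date array.
def duplicate_date_types(dates)
  dates.group_by { |d| d['dateType'] }
       .select { |_type, entries| entries.length > 1 }
       .keys
end

# Dates copied from the "Current mdJSON translation" example in this issue.
citation_dates = [
  { 'date' => '2023-01-10T17:39:42+00:00', 'dateType' => 'creation' },
  { 'date' => '2023-01-10T19:41:42+00:00', 'dateType' => 'lastUpdate' },
  { 'date' => '2023-01-10', 'dateType' => 'creation', 'description' => 'Creation' }
]

duplicates = duplicate_date_types(citation_dates)
puts "Duplicate dateType values: #{duplicates.join(', ')}"
```

For that example the only repeated dateType is "creation": one entry comes from provenance and one from the sbJson "dates" field.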

mdJson: [image]

sbJson: [Screenshot from 2023-04-03 11-19-32]

[Screenshot from 2023-04-03 11-20-11]

chris-macdermaid commented 1 year ago

A snapshot from ScienceBase's documentation.

Provenance:

Datatype: Provenance object

The ScienceBase Provenance attribute is an open text field that is used to describe the origin of an item, especially in terms of how the item came to be introduced to ScienceBase. It can be used to describe the full provenance of some form of data that may have been through a number of derivations.

provenance Object

annotation (Datatype: String): The text of the provenance.

dataSource (Datatype: String): Where the item came from. If this item was created by a person in ScienceBase it will be "Input Directly". If it was harvested from an external source this will show that instead.

dateCreated (Datatype: DateTime): The date and time the item was created.

createdBy (Datatype: String): The person or organization who created the item.

lastUpdated (Datatype: DateTime): The date and time the item was last updated.

lastUpdatedBy (Datatype: String): The last person or organization to update the item.

"provenance": {
        "annotation": "Provenance1",
        "dataSource": "Input directly",
        "dateCreated": "2015-11-09T19:02:45Z",
        "lastUpdated": "2015-11-09T19:02:45Z",
        "lastUpdatedBy": "abc@usgs.gov",
        "createdBy": "abc@usgs.gov",
        "fileProcess": ???,
        "linkProcess": ???
}
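
To see which of these documented fields a given item actually carries, something like the following Ruby sketch could be used. The field list comes from the documentation quoted above; the helper `missing_provenance_fields` and the sample item are hypothetical and only illustrate the check.

```ruby
require 'json'

# Field names taken from the ScienceBase documentation quoted above.
DOCUMENTED_PROVENANCE_FIELDS = %w[
  annotation dataSource dateCreated createdBy lastUpdated lastUpdatedBy
].freeze

# Hypothetical helper: lists documented provenance fields that are absent
# or blank in a parsed sbJSON item.
def missing_provenance_fields(item)
  provenance = item['provenance'] || {}
  DOCUMENTED_PROVENANCE_FIELDS.select do |field|
    provenance[field].to_s.strip.empty?
  end
end

# Illustrative sample only: an item whose provenance carries just the dates.
sb_json = <<~JSON
  {
    "provenance": {
      "dateCreated": "2015-11-09T19:02:45Z",
      "lastUpdated": "2015-11-09T19:02:45Z"
    }
  }
JSON

missing = missing_provenance_fields(JSON.parse(sb_json))
puts "Missing provenance fields: #{missing.join(', ')}"
```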

dwalt commented 1 year ago

Verified that createdBy and lastUpdatedBy are not being populated in ScienceBase; this is not caused by the sbJSON-to-mdJSON translation. In addition, dataSource is not populated either. How, when, or whether it is currently used by ScienceBase is unknown. @dkarthur will run use case tests to help us understand how and when provenance is created and updated, as follows:

  1. Created using the ReSciColl Dashboard app
  2. Created using ScienceBase
  3. Created using mdEditor

For each create example, test update in mdEditor and re-publish to ScienceBase (update item) to help us determine if update processes have different logic than create processes regarding writes to provenance.

Test update in ScienceBase regardless of create method, update in mdEditor and re-publish to ScienceBase.

Request to ScienceBase team:

  1. ScienceBase API scripts will need to be updated to populate createdBy and lastUpdatedBy
  2. Relative to test findings, ScienceBase API scripts may need additional changes

Agreement with @dkarthur to:

  1. Map createdBy and lastUpdatedBy to: [schema{ } > metadata{ } > metadataInfo{ } > metadataDate[ ] > object{ } > description]
  2. Accept the proposal to remap sbJSON > provenance dateCreated and lastUpdated to metadataDate > date, with "creation" and "lastUpdate" dateType as appropriate
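
The agreed mapping could be sketched in Ruby roughly as follows. This is an illustration under stated assumptions, not the mdTranslator implementation: `METADATA_DATE_MAP` and `provenance_to_metadata_dates` are hypothetical names, and placing the user name directly in `description` as free text is one reading of item 1.

```ruby
require 'time'

# Hypothetical sketch, not mdTranslator code.
# sbJSON provenance date field => [mdJSON dateType, field holding the user]
METADATA_DATE_MAP = {
  'dateCreated' => %w[creation createdBy],
  'lastUpdated' => %w[lastUpdate lastUpdatedBy]
}.freeze

# Builds mdJSON-style metadataDate entries from an sbJSON provenance hash.
def provenance_to_metadata_dates(provenance)
  METADATA_DATE_MAP.each_with_object([]) do |(date_field, (date_type, user_field)), dates|
    raw = provenance[date_field]
    next if raw.nil?

    entry = {
      # Normalize to a UTC ISO 8601 string; raises ArgumentError on bad input.
      'date' => Time.iso8601(raw).utc.iso8601,
      'dateType' => date_type
    }
    # Assumption: the user name is stored as free text in description.
    user = provenance[user_field].to_s.strip
    entry['description'] = user unless user.empty?
    dates << entry
  end
end

provenance = {
  'dateCreated' => '2023-01-10T17:39:42Z',
  'lastUpdated' => '2023-01-10T19:41:42Z',
  'lastUpdatedBy' => 'vcrystal@usgs.gov',
  'createdBy' => 'vcrystal@usgs.gov'
}

metadata_info = { 'metadataDate' => provenance_to_metadata_dates(provenance) }
puts metadata_info
```

If createdBy or lastUpdatedBy is not populated, as observed in ScienceBase above, the sketch simply omits `description` for that date.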

dwalt commented 1 year ago

I think we have agreed on a different proposal. Can this issue be closed?