Closed jordanpadams closed 1 year ago
@jordanpadams what's the definition of the "available" date? Should be trivial to add under the dates attribute as
{
"date": $someValue,
"dateType": "Available",
"dateInformation": <let me know if something should go here - perhaps the definition of that date?>
}
Will need to add to search criteria for DOICoreActionList.
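For illustration, a minimal Python sketch of how such an entry could be assembled before it is appended to a record's dates list. The helper name make_available_date and the sample date are hypothetical and not taken from doi-service; only the three property names come from the snippet above.

```python
import json


def make_available_date(date_value, date_information=None):
    """Build one entry for the DataCite `dates` attribute (hypothetical helper)."""
    entry = {"date": date_value, "dateType": "Available"}
    # dateInformation is optional free text, so only emit it when provided.
    if date_information is not None:
        entry["dateInformation"] = date_information
    return entry


# Illustrative value only; the real source of this date is still undecided.
entry = make_available_date("2021-12-01")
print(json.dumps(entry, indent=2))
```

Leaving dateInformation out when it is None keeps the payload valid while the definition of the date is still being agreed on.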
Global (per-deployment) keywords are populated from a line in the configuration.
It appears that setting global keywords in the config is the only method currently implemented - every mention of mutation of .keywords uses get_global_keywords()
Keywords are not currently implemented as search criteria in DOICoreActionList.
@jordanpadams what is the desired query functionality here?
types attribute has properties resourceType (freeform string, mapped to Doi.product_type_specific) and resourceTypeGeneral (schema-enumerated string, mapped to Doi.product_type and enum values ProductType)
Both properties are mapped from the product_class pds4 field. @jordanpadams please advise whether any updates to these mappings are necessary.
relatedIdentifiers attribute also has an optional resourceTypeGeneral property, but we don't appear to be setting that directly, anywhere.
The References requirement is too vague to do much with. @jordanpadams please advise.
I couldn't find any existing references to citations of other products. Is this related?
"relatedIdentifiers": {
"type": "array",
"items": {
"type": "object",
"properties": {
"relatedIdentifier": {"type": "string"},
"relatedIdentifierType": {"$ref": "#/definitions/relatedIdentifierType"},
"relationType": {"$ref": "#/definitions/relationType"},
"relatedMetadataScheme": {"type": "string"},
"schemeURI": {"type": "string", "format": "uri"},
"schemeType": {"type": "string"},
"resourceTypeGeneral": {"$ref": "#/definitions/resourceTypeGeneral"}
},
"required": ["relatedIdentifier", "relatedIdentifierType", "relationType"],
"if": {
"properties": {
"relationType": {"enum": ["HasMetadata", "IsMetadataFor"]}
}
},
"else": {
"$comment": "these properties may only be used with relation types HasMetadata/IsMetadataFor",
"properties": {
"relatedMetadataScheme": false,
"schemeURI": false,
"schemeType": false
}
}
},
"uniqueItems": true
}
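To make the if/else rule in this schema fragment concrete, here is a rough stdlib-only Python sketch of the same checks for a single relatedIdentifiers item. The function name check_related_identifier and the sample identifier are hypothetical; this is not how doi-service validates records, which would more likely go through a JSON Schema validator.

```python
METADATA_RELATIONS = {"HasMetadata", "IsMetadataFor"}
REQUIRED = ("relatedIdentifier", "relatedIdentifierType", "relationType")
METADATA_ONLY = ("relatedMetadataScheme", "schemeURI", "schemeType")


def check_related_identifier(item):
    """Return a list of problems found in one relatedIdentifiers item."""
    errors = [f"missing required property: {name}" for name in REQUIRED if name not in item]
    # Mirrors the schema's "else" branch: it only applies when relationType is
    # present and is not one of the two metadata relation types.
    if "relationType" in item and item["relationType"] not in METADATA_RELATIONS:
        errors += [
            f"{name} is only allowed with HasMetadata/IsMetadataFor"
            for name in METADATA_ONLY
            if name in item
        ]
    return errors


# Hypothetical item: a citation of another product by DOI.
citation = {
    "relatedIdentifier": "10.12345/example",
    "relatedIdentifierType": "DOI",
    "relationType": "Cites",
}
print(check_related_identifier(citation))
print(check_related_identifier({**citation, "schemeURI": "https://example.org/scheme"}))
```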
Regarding the ORCIDs, where are the existing metadata guidelines? I can't see them in the docs.
Is the idea that ORCID will be a new, optional field for submitted DOIs?
@jordanpadams what's the definition of the "available" date? Should be trivial to add under the dates attribute as
{ "date": $someValue, "dateType": "Available", "dateInformation": <let me know if something should go here - perhaps the definition of that date?> }
Will need to add to search criteria for DOICoreActionList.
@alexdunnjpl sounds great. for dateInformation maybe let's put something like "Date of first publication"
Global (per-deployment) keywords are populated from a line in the configuration.
It appears that setting global keywords in the config is the only method currently implemented - every mention of mutation of .keywords uses get_global_keywords()
Keywords are not currently implemented as search criteria in DOICoreActionList.
@jordanpadams what is the desired query functionality here?
- is there a need to query for multiple keywords simultaneously?
- if so, is there a need to support union and intersection?
- if so, is there a need to support nested boolean logic?
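To make the three options concrete, a small Python sketch of what nested boolean keyword matching could look like. The tuple-based query format and the evaluate function are invented for illustration and are not part of DOICoreActionList; union and intersection are just the one-level "or" and "and" cases.

```python
def evaluate(expr, keywords):
    """Evaluate a keyword query against a set of lowercase keywords.

    `expr` is either a keyword string or a tuple of the form
    ("and" | "or", sub_expr, sub_expr, ...), which allows arbitrary nesting.
    """
    if isinstance(expr, str):
        return expr.lower() in keywords
    op, *operands = expr
    results = (evaluate(operand, keywords) for operand in operands)
    if op == "and":
        return all(results)  # intersection: every operand must match
    if op == "or":
        return any(results)  # union: at least one operand must match
    raise ValueError(f"unknown operator: {op!r}")


record_keywords = {"saturn", "kmag", "python"}
print(evaluate(("or", "saturn", "mars"), record_keywords))
print(evaluate(("and", "saturn", ("or", "fortran", "python")), record_keywords))
```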
@alexdunnjpl sorry about the confusion here. I think there may be somewhere else in the code where these values are appended to something where additional keywords are auto-generated. The confusion is that our code mentions keywords (leftover from when we used OSTI as our DOI provider), whereas the current DataCite metadata calls these subjects. For instance, for DOI 10.17189/rbz8-2327, the keywords/subjects generated were:
"subjects": [
{ "subject": "PDS" },
{ "subject": "PDS4" },
{ "subject": "code" },
{ "subject": "collection" },
{ "subject": "consists" },
{ "subject": "fortran" },
{ "subject": "kmag" },
{ "subject": "python" },
{ "subject": "saturn" },
{ "subject": "wrapper" }
],
These subjects/keywords are not really intended to be searchable from the PDS or even from the DOI Service search; they are intended to be searched from ADS.
So if I remember correctly, I think subjects are populated with 3 sets of values:
I would say we go with 1 and 2 above, but let's remove 3. Let me know if you cannot track this down. In which case, we can wait until Thomas gets back and ask him about it.
The References requirement is too vague to do much with. @jordanpadams please advise.
I couldn't find any existing references to citations of other products. Is this related?
"relatedIdentifiers": { "type": "array", "items": { "type": "object", "properties": { "relatedIdentifier": {"type": "string"}, "relatedIdentifierType": {"$ref": "#/definitions/relatedIdentifierType"}, "relationType": {"$ref": "#/definitions/relationType"}, "relatedMetadataScheme": {"type": "string"}, "schemeURI": {"type": "string", "format": "uri"}, "schemeType": {"type": "string"}, "resourceTypeGeneral": {"$ref": "#/definitions/resourceTypeGeneral"} }, "required": ["relatedIdentifier", "relatedIdentifierType", "relationType"], "if": { "properties": { "relationType": {"enum": ["HasMetadata", "IsMetadataFor"]} } }, "else": { "$comment": "these properties may only be used with relation types HasMetadata/IsMetadataFor", "properties": { "relatedMetadataScheme": false, "schemeURI": false, "schemeType": false } } }, "uniqueItems": true }
@alexdunnjpl sorry for the runaround here. let's scratch this. we can bring this back up at a later date if needed.
types attribute has properties resourceType (freeform string, mapped to Doi.product_type_specific) and resourceTypeGeneral (schema-enumerated string, mapped to Doi.product_type and enum values ProductType)
Both properties are mapped from the product_class pds4 field. @jordanpadams please advise whether any updates to these mappings are necessary.
relatedIdentifiers attribute also has an optional resourceTypeGeneral property, but we don't appear to be setting that directly, anywhere.
@alexdunnjpl I think we are good here. I think I was just asking if we could take a look at the existing DOI metadata we have in DataCite and verify they match one of those expected values. Just wanted to make sure we didn't have any old DOIs out there that do not match this appropriately.
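A rough Python sketch of the kind of check described here, assuming the existing DOI metadata has already been fetched from DataCite into a list of dicts. The record shape, the sample DOIs, and the ALLOWED_RESOURCE_TYPE_GENERAL subset are all assumptions for illustration; the real list of expected values would come from the schema enum and ProductType.

```python
# Assumed subset of expected values; replace with the real enumeration.
ALLOWED_RESOURCE_TYPE_GENERAL = {"Collection", "Dataset", "Text", "Other"}


def find_mismatched_types(records):
    """Return (doi, value) pairs whose resourceTypeGeneral is not an expected value."""
    mismatches = []
    for record in records:
        value = record.get("types", {}).get("resourceTypeGeneral")
        if value not in ALLOWED_RESOURCE_TYPE_GENERAL:
            mismatches.append((record.get("doi"), value))
    return mismatches


# Hypothetical records, shaped like the attributes of a DataCite DOI record.
records = [
    {"doi": "10.12345/example-1", "types": {"resourceTypeGeneral": "Dataset"}},
    {"doi": "10.12345/example-2", "types": {"resourceTypeGeneral": "PDS4 Bundle"}},
    {"doi": "10.12345/example-3", "types": {}},
]
print(find_mismatched_types(records))
```

Records with no resourceTypeGeneral at all are reported with a value of None, so missing and mismatched values can be told apart in the output.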
Regarding the ORCIDs, where are the existing metadata guidelines? I can't see them in the docs.
Is the idea that ORCID will be a new, optional field for submitted DOIs?
@alexdunnjpl for starters, I cannot remember if we communicate with the DOI Editor with XML or JSON. If it is JSON, you can ignore below. We can just link to the metadata guidelines elsewhere on the DOI Editor and call it good.
Otherwise, if we are using XML, I am thinking we just include some commented out example of providing an ORCID and then we leave it to the user to input? I prefer self-documenting XML where possible, especially since these values cannot be pulled from the labels, so they must be manually input. if adding something commented out is not reasonable, that is OK too.
@alexdunnjpl sounds great. for dateInformation maybe let's put something like "Date of first publication"
@jordanpadams still need an explicit definition for that date. Date of document? Date of DOI reservation? Date of DOI release? Something else?
Regarding keywords/subjects, thanks for the extra info/context - will take a look and sort that out.
Regarding ORCIDs, what's the "DOI Editor"? We're sending DOI records as JSON payloads to DataCite, but it sounds like maybe you're talking about something else.
still need an explicit definition for that date. Date of document? Date of DOI reservation? Date of DOI release? Something else?
Date of first publication.
Regarding ORCIDs, what's the "DOI Editor"? We're sending DOI records as JSON payloads to DataCite, but it sounds like maybe you're talking about something else.
That answers my question. We will just handle this on the editor side of the house. https://pds-gamma.jpl.nasa.gov/tools/doi-editor/ (ping @viviant100 and SA team to gain access), https://github.com/NASA-PDS/doi-ui/
Per jordanpadams, available date should be parsed from latest modification date in pds4 xml label, else returned as None
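A minimal sketch of that parsing rule in Python, using only the standard library. It assumes the label uses the standard PDS4 namespace and keeps its history under Modification_Detail/modification_date; the function name latest_modification_date and the sample label are hypothetical and not the doi-service implementation.

```python
import xml.etree.ElementTree as ET
from datetime import date

PDS4_NS = {"pds": "http://pds.nasa.gov/pds4/pds/v1"}

SAMPLE_LABEL = """<Product_Bundle xmlns="http://pds.nasa.gov/pds4/pds/v1">
  <Identification_Area>
    <Modification_History>
      <Modification_Detail>
        <modification_date>2020-09-15</modification_date>
      </Modification_Detail>
      <Modification_Detail>
        <modification_date>2021-03-02</modification_date>
      </Modification_Detail>
    </Modification_History>
  </Identification_Area>
</Product_Bundle>"""


def latest_modification_date(label_xml):
    """Return the most recent modification_date in a PDS4 label, or None."""
    root = ET.fromstring(label_xml)
    dates = []
    for node in root.iterfind(".//pds:Modification_Detail/pds:modification_date", PDS4_NS):
        try:
            dates.append(date.fromisoformat((node.text or "").strip()))
        except ValueError:
            continue  # skip values that are not plain YYYY-MM-DD dates
    return max(dates) if dates else None


print(latest_modification_date(SAMPLE_LABEL))
```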
@alexdunnjpl so sorry to do this, but after talking to some other stakeholders, we came to the realization the modification date will not be accurate for "Available" date, since they may have modified it in September, but it was not released until December. Can we comment out that code for now until we have a better idea of how we will get this date from the metadata?
Since the currently-implemented value is given as Doi.publication_date and not available_date, and it's used throughout doi-service, recommend not changing anything until @tloubrieu-jpl is back.
Taking a second look at the subjects/keywords item:
Outside of Doi.__init__(), Doi.keywords is only mutated via get_global_keywords() (per global string search for .keywords.update), which populates semicolon-separated strings from global_keyword_values in the config file's OTHER section.
Initialization pulls from core.input.pds4_util.DOIPDS4LabelUtil.get_keywords(), which extracts keywords from the following PDS4 label xml fields:
keyword_fields = {
"investigation_area",
"observing_system_component",
"target_identification",
"primary_result_summary",
"description",
}
I can't find any mechanism for an additional (item 3) source of keywords.
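For context on the global-keyword source, a small Python sketch of reading semicolon-separated keywords from an INI-style config. It assumes OTHER is a config section and global_keyword_values an option within it; the helper name parse_global_keywords and the sample values are hypothetical, and the real get_global_keywords() may differ.

```python
import configparser

# Illustrative config content only.
SAMPLE_CONFIG = """
[OTHER]
global_keyword_values = PDS; PDS4;
"""


def parse_global_keywords(config):
    """Split the semicolon-separated option into a set of non-empty keywords."""
    raw = config.get("OTHER", "global_keyword_values", fallback="")
    return {keyword.strip() for keyword in raw.split(";") if keyword.strip()}


config = configparser.ConfigParser()
config.read_string(SAMPLE_CONFIG)
print(sorted(parse_global_keywords(config)))
```

Dropping empty fragments means a trailing semicolon in the config line does not produce a blank keyword.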
The XML label linked above yields the following keywords, which seems correct:
{'bundle', 'image', 'secondary', 'data_imaging', 'primary', 'mars2020_pixl', 'camera', 'mcc', 'micro-context', 'micro', 'mars2020_imgops', 'rover', 'product', 'context', '2020', 'data', 'mars', 'member', 'perseverance', 'pixl', 'collection', 'data_mcc_imgops'}
Will consider this on ice until @tloubrieu-jpl is back.
@tloubrieu-jpl just re-pinging you so you're aware this is still current/active
@alexdunnjpl @jordanpadams I will ask for an introduction of this ticket during the breakout today
Per @jordanpadams in breakout meeting, no changes required to available/published date, currently
Per @jordanpadams @tloubrieu-jpl, remove description from source targets for keyword generation.
Conclusion for now:
💪 Motivation
...so that I can integrate more seamlessly with ADS
📖 Additional Details
Per info from Anne R., here are the fields that are highest priority for ADS searches:
⚖️ Acceptance Criteria
Given a DOI When I perform a query of that DOI from the DOI service or DataCite search Then I expect the metadata returned for that DOI to contain the improvements described above
⚙️ Engineering Details