Closed codycooperross closed 1 year ago
The goal here should be to never discard an identifier, regardless of whether it is a URL. I think the proposal to handle this the way name_identifiers works makes sense.
I was able to reproduce this bug locally with the following steps.
This works
1) From this fixture file, remove the schemeURI attribute and keep the affiliationIdentifier as a URL: https://github.com/datacite/bolognese/blob/b0a7df3c9dd6a45eaf56fd0e06d304e4db9b837d/spec/fixtures/datacite-example-ROR-nameIdentifiers.xml#L9
After processing this XML file at bolognese/spec/author_utils_spec.rb:163, creators with an affiliation still have affiliationIdentifier in the response:
(byebug) subject.creators[0]
{"nameType"=>"Personal", "name"=>"Robinson, Erin", "givenName"=>"Erin", "familyName"=>"Robinson", "nameIdentifiers"=>[{"schemeUri"=>"https://orcid.org", "nameIdentifierScheme"=>"ORCID"}], "affiliation"=>[{"name"=>"Metadata Game Changers", "affiliationIdentifier"=>"https://ror.org/05bp8ka05", "affiliationIdentifierScheme"=>"ROR"}]}
This won't work
2) From this fixture file, remove the schemeURI attribute and keep the affiliationIdentifier without a URL, as below:
<creator>
<creatorName nameType="Personal">Erin Robinson</creatorName>
<nameIdentifier schemeURI="https://orcid.org/" nameIdentifierScheme="ORCID"> https://orcid.org/0000-0001-9998-0114 </nameIdentifier>
<affiliation affiliationIdentifier="05bp8ka05" affiliationIdentifierScheme="ROR"> Metadata Game Changers </affiliation>
</creator>
Now add byebug in the test file at bolognese/spec/author_utils_spec.rb:163 and check subject after processing the schema. The affiliation attribute in the response no longer has affiliationIdentifier:
(byebug) subject.creators[0]
{"nameType"=>"Personal", "name"=>"Robinson, Erin", "givenName"=>"Erin", "familyName"=>"Robinson", "nameIdentifiers"=>[{"schemeUri"=>"https://orcid.org", "nameIdentifierScheme"=>"ORCID"}], "affiliation"=>[{"name"=>"Metadata Game Changers", "affiliationIdentifierScheme"=>"ROR"}]}
Describe the bug
When reading from DataCite XML, affiliation identifiers are normalized either 1) as a concatenation of the schemeURI and the affiliation identifier or 2) as a URL, if the affiliation identifier starts with "https:/". Identifiers that cannot be rendered as URLs are discarded.
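To make the described behavior concrete, here is a minimal Ruby sketch of it. This is not the bolognese source: the method name `current_affiliation_identifier` and its keyword argument are hypothetical, and the URL check via the standard `uri` library is an assumption standing in for whatever bolognese uses internally.

```ruby
require "uri"

# Hypothetical stand-in for the behavior described in this issue;
# NOT the actual bolognese implementation.
def current_affiliation_identifier(identifier, scheme_uri: nil)
  id = identifier.to_s.strip
  return nil if id.empty?

  # Prepend the schemeURI only when the identifier is not already a URL.
  candidate =
    if !id.start_with?("https:/") && scheme_uri && !scheme_uri.empty?
      scheme_uri + id
    else
      id
    end

  # Keep the value only if it parses as an http(s) URL; otherwise discard it.
  uri = URI.parse(candidate)
  uri.is_a?(URI::HTTP) && uri.host ? candidate : nil
rescue URI::InvalidURIError
  nil
end
```

Under these assumptions, a bare ROR such as 05bp8ka05 with no schemeURI returns nil, which matches the missing affiliationIdentifier in the second byebug output.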
Expected Behaviour
Affiliation identifiers are normalized according to their identifierScheme. Other fields, like funder identifier and name identifier, are already normalized according to their identifierScheme, so normalization succeeds in more scenarios for identifiers like RORs and ORCIDs.
Current Behaviour
Identifiers are not normalized according to their identifierScheme, and identifiers not rendered as URLs are discarded.
Steps to Reproduce
ROR affiliation identifiers without the "https://ror.org" URL or a schemeURI (ex. 05dxps055) are discarded when read by bolognese.
Context (Environment)
This issue affects DataCite JSON and API responses when metadata is submitted as XML.
Proposal
Possible Implementation
Affiliation identifier normalization could mirror name identifier normalization here:
https://github.com/datacite/bolognese/blob/b0a7df3c9dd6a45eaf56fd0e06d304e4db9b837d/lib/bolognese/author_utils.rb#L32
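One possible shape for scheme-based normalization is sketched below. The helper names `normalize_ror_sketch` and `normalize_affiliation_identifier_sketch` are hypothetical, the ROR pattern is an approximation (a leading 0 followed by eight alphanumeric characters), and nothing here is taken from the bolognese code linked above. The key property is that an identifier that cannot be normalized is returned as-is rather than discarded.

```ruby
# Hypothetical helpers; a sketch only, not the bolognese implementation.
# Approximate ROR pattern: optional scheme and host, then "0" plus 8 characters.
ROR_ID = %r{\A(?:https?://)?(?:ror\.org/)?(0[a-z0-9]{8})\z}i

def normalize_ror_sketch(id)
  match = ROR_ID.match(id.to_s.strip)
  match && "https://ror.org/#{match[1].downcase}"
end

def normalize_affiliation_identifier_sketch(identifier, scheme: nil, scheme_uri: nil)
  id = identifier.to_s.strip
  return nil if id.empty?

  case scheme.to_s.upcase
  when "ROR"
    # Normalize by scheme; fall back to the raw value if it does not look like a ROR.
    normalize_ror_sketch(id) || id
  else
    if !id.start_with?("http://", "https://") && scheme_uri && !scheme_uri.empty?
      scheme_uri + id
    else
      id # never discard: keep the raw value
    end
  end
end
```

With this sketch, both "05bp8ka05" and "https://ror.org/05bp8ka05" with scheme "ROR" normalize to the same ROR URL, and an identifier with an unrecognized scheme and no schemeURI is preserved unchanged.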