Is your feature request related to a problem? Please describe.
Currently, parsers capture affiliation data in text format, and these are added to `affPubRaw` in the ingest data model `Affil` object. However, affiliation data may also be provided as an affiliation identifier in various systems, e.g. ROR, ISNI or GRID, either with or in place of text data. For example, Crossref XML includes the tag `<institution_id type="TYPE">` as a possible return field (see https://www.crossref.org/documentation/schema-library/markup-guide-metadata-segments/affiliations/). The ADS Ingest_Data_Model `Affil` object already has fields for `affPubID` and `affPubIDType`, but these are not yet populated by `base.py` or any of the other parsers.
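As a rough illustration of the detection step, here is a minimal sketch using the standard library's `xml.etree.ElementTree`. It assumes identifiers appear as `<institution_id type="...">` elements somewhere under an affiliation element, as in the Crossref markup linked above. `extract_institution_ids` and `local_name` are hypothetical helpers, not existing functions in `base.py`, and the ROR value in the sample is a placeholder.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    # Drop a "{namespace-uri}" prefix, if present, so matching works whether
    # or not the source XML declares a namespace.
    return tag.rsplit("}", 1)[-1]


def extract_institution_ids(affil_elem: ET.Element) -> list[tuple[str, str]]:
    """Return (id_type, id_value) pairs for every <institution_id> at or below affil_elem."""
    ids = []
    for elem in affil_elem.iter():  # iter() visits affil_elem itself and all descendants
        if local_name(elem.tag) == "institution_id":
            id_type = (elem.get("type") or "").strip()
            id_value = (elem.text or "").strip()
            if id_value:  # skip empty tags
                ids.append((id_type, id_value))
    return ids


# Placeholder input, not taken from a real record.
sample = """<institution>
  <institution_name>Example University</institution_name>
  <institution_id type="ror">https://ror.org/000000000</institution_id>
</institution>"""
print(extract_institution_ids(ET.fromstring(sample)))
```

Matching on the local tag name keeps the same helper usable on namespaced Crossref documents and on un-namespaced test snippets.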
Describe the solution you'd like
We should add logic to each of the content parsers to detect institution identifiers and store them, along with their identifier type, in the `ingest_data_model.affils.affPubID` and `affPubIDType` fields for each contributor that has them.
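A minimal sketch of the per-affiliation step follows. It assumes the parser has already extracted the raw text and at most one (type, value) identifier pair for an affiliation, and that an `Affil` record can be represented as a plain dict keyed by the data model's field names. `build_affil` is a hypothetical helper, not part of `base.py` or the ingest data model package.

```python
from typing import Optional


def build_affil(
    raw_text: Optional[str],
    id_type: Optional[str] = None,
    id_value: Optional[str] = None,
) -> dict:
    """Assemble one affiliation record using the ingest data model's Affil field names."""
    affil = {}
    if raw_text and raw_text.strip():
        affil["affPubRaw"] = raw_text.strip()
    # Identifier-only affiliations (no text) are still kept.
    if id_value and id_value.strip():
        affil["affPubID"] = id_value.strip()
        if id_type and id_type.strip():
            affil["affPubIDType"] = id_type.strip()
    return affil


# Placeholder values for illustration.
print(build_affil("Dept. of Physics, Example University", "ror", "https://ror.org/000000000"))
print(build_affil(None, "ISNI", "0000 0000 0000 0000"))
```

Only populated fields are emitted, so existing text-only output is unchanged while identifier-only affiliations are no longer dropped.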
Additional context
As an example, the input test file `jats_springer_EPJC_s10052-023-11699-1.xml` has `<institution_id>` tags for both GRID and ISNI.
This example has two identifiers, but the ingest data model currently expects a single value for each of `affPubID` and `affPubIDType`; we might consider updating the data model to support a list of ID/type objects, or merging multiple values into a single string with a join.
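The two options could look roughly like this. This is a sketch only: the per-object key names and the `"; "` separator are assumptions rather than anything the data model defines, and the GRID/ISNI values are placeholders, not the ones in the Springer test file.

```python
SEPARATOR = "; "  # hypothetical delimiter; must never appear inside an ID or type value


def ids_as_list(ids: list[tuple[str, str]]) -> list[dict]:
    """Option 1: one {type, id} object per identifier (requires a data model change)."""
    return [{"affPubIDType": id_type, "affPubID": id_value} for id_type, id_value in ids]


def ids_as_joined(ids: list[tuple[str, str]]) -> dict:
    """Option 2: keep single-valued fields by joining; item i of each string pairs up."""
    if not ids:
        return {}
    return {
        "affPubID": SEPARATOR.join(id_value for _, id_value in ids),
        "affPubIDType": SEPARATOR.join(id_type for id_type, _ in ids),
    }


ids = [("GRID", "grid.000000.0"), ("ISNI", "0000 0000 0000 0000")]  # placeholder values
print(ids_as_list(ids))
print(ids_as_joined(ids))
```

The joined form fits the current single-value schema, but consumers must split both strings and rely on positional alignment, and the delimiter must never occur in an identifier. The list form is self-describing at the cost of a schema change.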