Describe the bug
There is at least one instance of publisher-supplied metadata containing a surname with extraneous leading whitespace. This interferes downstream with the current ADSIngestEnrichment bibcode generator, which takes the left-most character of the name as the author initial (name[0]). The issue will be fixed in the enrichment package, but it is a data normalization problem that should be handled at parse time.
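A minimal sketch of the failure mode, for illustration only. `author_initial` is a hypothetical stand-in for the initial-selection step described above, not the actual ADSIngestEnrichment code; it assumes only that the initial is taken as `name[0]`.

```python
def author_initial(surname: str) -> str:
    """Hypothetical stand-in: take the left-most character as the initial."""
    return surname[0]


raw_surname = " M"  # surname as supplied by the publisher, with a leading space

# Without normalization the "initial" is the space character.
bad_initial = author_initial(raw_surname)

# Stripping surrounding whitespace first yields the intended initial.
good_initial = author_initial(raw_surname.strip())

print(repr(bad_initial), repr(good_initial))
```

Stripping at parse time would mean every downstream consumer sees the cleaned value, rather than each one having to guard against it separately.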
To Reproduce
From the 2023-10-21 data delivery from IOP, take the file 2053-1591_10_10_105303/mrx_10_10_105303.xml, in which the first author's name is fielded as <surname> M</surname> (note the preceding space). Parse the file with JATSParser().parse. The resulting JSON will include
"authors": [
{
"name": {
"surname": " M",
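One possible shape for the parse-time normalization, sketched under assumptions: `normalize_names` is a hypothetical helper, not part of JATSParser, and the record layout is inferred only from the fragment above (an `authors` list whose entries carry a `name` dict of string fields).

```python
def normalize_names(record: dict) -> dict:
    """Return a copy of record with whitespace stripped from author name fields.

    Hypothetical helper; assumes record["authors"][i]["name"] is a dict.
    """
    authors = []
    for author in record.get("authors", []):
        name = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in author.get("name", {}).items()
        }
        authors.append({**author, "name": name})
    return {**record, "authors": authors}


parsed = {"authors": [{"name": {"surname": " M"}}]}
cleaned = normalize_names(parsed)
print(cleaned["authors"][0]["name"]["surname"])
```

The helper builds new dicts rather than mutating the input, so the raw parser output stays available for comparison while debugging.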