adsabs / ADSManualParser

Sets up a working environment for curators to manually process publisher metadata for the Classic backoffice
GNU General Public License v3.0

IOP - collaborations are badly fielded by publisher #12

Closed csgrant00 closed 8 months ago

csgrant00 commented 1 year ago

python run.py -p "/proj/ads/abstracts/data/IOPP/2023-08-02/0004-637X/0004-637X_953/0004-637X_953_1/0004-637X_953_1_37/*.xml" -t jats -f ioptest

seasidesparrow commented 11 months ago

There doesn't appear to be a collaboration in this XML record. As of 2023-11-29, the version in production produces the following output for the file linked above, and appears to match the current ADS record:

%F AA(Department of Astronomy, School of Physics and Astronomy, Shanghai Jiao Tong University, Shanghai, 200240, People's Republic of China; Key Laboratory for Particle Astrophysics and Cosmology (MOE), Shanghai 200240, People's Republic of China; Shanghai Key Laboratory for Particle Physics and Cosmology, Shanghai Jiao Tong University, Shanghai, 200240, People's Republic of China), AB(Department of Astronomy, School of Physics and Astronomy, Shanghai Jiao Tong University, Shanghai, 200240, People's Republic of China; Key Laboratory for Particle Astrophysics and Cosmology (MOE), Shanghai 200240, People's Republic of China; Shanghai Key Laboratory for Particle Physics and Cosmology, Shanghai Jiao Tong University, Shanghai, 200240, People's Republic of China; <ID system="ORCID">0000-0002-8010-6715</ID> <EMAIL>jiaxin.han@sjtu.edu.cn</EMAIL>), AC(Department of Astronomy, School of Physics and Astronomy, Shanghai Jiao Tong University, Shanghai, 200240, People's Republic of China; Key Laboratory for Particle Astrophysics and Cosmology (MOE), Shanghai 200240, People's Republic of China; Shanghai Key Laboratory for Particle Physics and Cosmology, Shanghai Jiao Tong University, Shanghai, 200240, People's Republic of China), AD(Department of Astronomy, School of Physics and Astronomy, Shanghai Jiao Tong University, Shanghai, 200240, People's Republic of China; Key Laboratory for Particle Astrophysics and Cosmology (MOE), Shanghai 200240, People's Republic of China; Shanghai Key Laboratory for Particle Physics and Cosmology, Shanghai Jiao Tong University, Shanghai, 200240, People's Republic of China; Tsung-Dao Lee Institute, Shanghai Jiao Tong University, Shanghai, 200240, People's Republic of China; <ID system="ORCID">0000-0002-4534-3125</ID>), AE(Centre for Astrophysics and Planetary Science, Racah Institute of Physics, The Hebrew University, Jerusalem, 91904, Israel; <ID system="ORCID">0000-0001-7890-4964</ID>)
csgrant00 commented 11 months ago

Here's one with collaboration - seems to be working correctly:

/proj/ads/abstracts/data/IOPP/2023-02-08/0004-637X/0004-637X_943/0004-637X_943_2/0004-637X_943_2_177

csgrant00 commented 11 months ago

I put a real example in issue 12 - I think it's working properly


seasidesparrow commented 11 months ago

This record produces problematic results: /proj/ads/abstracts/data/IOPP/2023-03-15/0004-637X/0004-637X_945/0004-637X_945_2/0004-637X_945_2_124/apj_945_2_124.xml

Author field begins %A Abdurashidova, The HERA Collaboration: Zara; Adams, Tyrone; ...

seasidesparrow commented 11 months ago

The metadata file contains the following construct for the first author:

<contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Abdurashidova</surname><given-names>The HERA Collaboration: Zara</given-names></name><xref ref-type="aff" rid="affiliation01">1</xref></contrib>

The publisher themselves gave the first author's given name as "The HERA Collaboration: Zara", which suggests we need a normalization step to fix errors like this.
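A rough sketch of what such a normalization step might look like, to make the idea concrete. This is not code from ADSManualParser or any other ADS package: the function name `split_collaboration` and the `COLLAB_PREFIX` pattern are hypothetical, and the sketch assumes the bad data always takes the form of a label ending in "Collaboration" followed by a colon at the start of the given names.

```python
import re

# Hypothetical pattern: a label ending in "Collaboration", then a colon,
# fused onto the front of an author's given names.
COLLAB_PREFIX = re.compile(
    r"^\s*(?P<collab>.*?\bCollaboration\b)\s*:\s*(?P<given>.*)$",
    re.IGNORECASE,
)


def split_collaboration(surname, given_names):
    """Return (collaboration or None, surname, cleaned given names)."""
    match = COLLAB_PREFIX.match(given_names or "")
    if not match:
        # Nothing that looks like a fused collaboration label: pass through.
        return None, surname, given_names
    return match.group("collab").strip(), surname, match.group("given").strip()


# The first author from the record above:
print(split_collaboration("Abdurashidova", "The HERA Collaboration: Zara"))
# → ('The HERA Collaboration', 'Abdurashidova', 'Zara')
```

A real fix would presumably need to cover other labels (consortium, team, and so on) and decide where the extracted collaboration name goes in the output record; neither is settled here.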

seasidesparrow commented 8 months ago

Closing -- this is a normalization problem, not specific to Manual Parser