elifesciences / elife-tools

Python library for parsing eLife article XML data.
MIT License

Format contributor speed up #396

Closed gnott closed 2 years ago

gnott commented 2 years ago

Related to issue https://github.com/elifesciences/issues/issues/6230

While running the tests for a Crossref deposit library, parsing XML files with many authors was much slower than parsing articles with fewer authors. I tracked this down to one particular place: looking up the aff data on each iteration of the author loop. Pre-parsing the <aff> tags once, before formatting the authors, and then using that data inside the loop results in a speed-up.

format_contributor() is modified to accept an optional target_tags_aff argument; otherwise the change should be fully backwards compatible.
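To illustrate the general idea, here is a minimal, self-contained sketch of hoisting a lookup out of a loop behind an optional argument. It is not the elife-tools code: `build_aff_index`, `format_contributor_sketch`, the `aff_index` parameter, and the sample XML are all hypothetical names made up for this example, and the real `format_contributor()` signature and `target_tags_aff` data shape may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, much simpler than real article XML.
XML = """
<article>
  <contrib-group>
    <contrib><name><surname>Smith</surname></name><xref ref-type="aff" rid="aff1"/></contrib>
    <contrib><name><surname>Jones</surname></name><xref ref-type="aff" rid="aff2"/></contrib>
  </contrib-group>
  <aff id="aff1"><institution>University A</institution></aff>
  <aff id="aff2"><institution>Institute B</institution></aff>
</article>
"""


def build_aff_index(root):
    """Scan the whole document once and map each aff id to its institution."""
    return {aff.get("id"): aff.findtext("institution") for aff in root.iter("aff")}


def format_contributor_sketch(contrib, root, aff_index=None):
    """Format one contributor (hypothetical stand-in, not the library function).

    If aff_index is omitted, the aff data is rebuilt on this call, which
    mirrors the old per-iteration lookup and keeps existing callers working.
    """
    if aff_index is None:
        aff_index = build_aff_index(root)
    rids = [x.get("rid") for x in contrib.findall("xref") if x.get("ref-type") == "aff"]
    return {
        "surname": contrib.findtext("name/surname"),
        "affiliations": [aff_index[rid] for rid in rids if rid in aff_index],
    }


root = ET.fromstring(XML)

# Faster path: parse the aff tags once, then reuse the result in the loop.
aff_index = build_aff_index(root)
fast = [format_contributor_sketch(c, root, aff_index) for c in root.iter("contrib")]

# Backwards-compatible path: no extra argument, aff data looked up per call.
slow = [format_contributor_sketch(c, root) for c in root.iter("contrib")]

assert fast == slow
```

Both paths return the same data; the difference is that the second one rescans the document for every contributor, so its cost grows with the number of authors multiplied by the number of affiliations.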