cern-sis / issues-inspire


set up German curation detection and queues #559

Open michamos opened 2 months ago

michamos commented 2 months ago

DESY will start focusing on German publications. We need to set up detection of German papers in the workflows and create special SNow functional categories for them.

As is already done for France, we need to detect German articles based on the fulltext (for arXiv) or the raw_affiliations (for non-arXiv), using the regex `\b(Germany|Deutschland)\b`. If there is a match, we should create an additional ticket in the German curation functional category.
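A minimal sketch of the detection rule, assuming the fulltext is already available as a string and the raw affiliations as a list of strings. The helper name `is_german_article` is hypothetical, not an existing inspire-next function. The regex is the one from the issue, applied case-sensitively because the issue specifies no flags.

```python
import re

# Pattern from the issue: whole-word match on "Germany" or "Deutschland".
GERMAN_PATTERN = re.compile(r"\b(Germany|Deutschland)\b")


def is_german_article(fulltext=None, raw_affiliations=None):
    """Hypothetical helper: return True if the article looks German.

    Uses the fulltext when it is available (arXiv records), otherwise
    falls back to the raw affiliation strings (non-arXiv records).
    """
    if fulltext is not None:
        return GERMAN_PATTERN.search(fulltext) is not None
    # No fulltext: check each raw affiliation string individually.
    return any(GERMAN_PATTERN.search(aff) for aff in raw_affiliations or [])
```

On a match, the workflow would open the extra ticket in the German curation functional category. The `\b` anchors prevent matches inside longer words such as "Germanys".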

Note that it might make sense to refactor the code so that GROBID is not called multiple times in the workflow (once for author extraction and once more for each country detection), if that is feasible.
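One possible shape for that refactor, as a sketch only: run the expensive extraction once and apply every country pattern to the same text. `extract_text` is a stand-in for the GROBID call, and both its name and signature are assumptions. The French pattern is a placeholder and is not the rule inspire-next actually uses.

```python
import re
from typing import Callable, Dict

# Hypothetical per-country patterns; only the German one comes from the issue.
COUNTRY_PATTERNS: Dict[str, re.Pattern] = {
    "Germany": re.compile(r"\b(Germany|Deutschland)\b"),
    "France": re.compile(r"\bFrance\b"),  # placeholder, not the real French rule
}


def detect_countries(pdf_bytes: bytes, extract_text: Callable[[bytes], str]) -> set:
    """Call the (expensive) extraction step once and reuse its output.

    `extract_text` stands in for the GROBID call (assumed interface).
    """
    text = extract_text(pdf_bytes)  # single GROBID round-trip
    # Every country check reuses the same extracted text.
    return {country for country, pattern in COUNTRY_PATTERNS.items() if pattern.search(text)}
```

Adding a new country queue then only means adding an entry to the pattern table. The number of GROBID calls stays the same.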

PascalEgn commented 2 months ago

I'd suggest moving the refactoring into a separate issue, since adding the new queue sounded fairly urgent.

The needed changes should be in: https://github.com/inspirehep/inspire-next/pull/4351