cern-sis / issues-inspire

make author disambiguation more robust #459

Open michamos opened 3 months ago

michamos commented 3 months ago

While investigating author disambiguation failures, I noticed several metadata issues related to author names; see https://inspirehep.zulipchat.com/#narrow/stream/195298-experts/topic/author.20disambiguation/near/426578817.

Examples of incorrect author names:

It would be good to make the pattern in https://github.com/inspirehep/inspire-schemas/blob/98f08c311cd6471091d57bc39b3b44da625456c2/inspire_schemas/records/hep.yml#L408 (and in other places) stricter to prevent this issue. The main problem this currently causes is that if the first name/initial contains no valid token, the matching process generates an invalid search query with an empty clause in `must`: https://sentry.siscern.org/inspire/hep/issues/320216. That problem could also be fixed directly, either as an alternative or in addition.
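
To illustrate the "fix it directly" option, here is a minimal sketch of a query builder that never emits an empty clause in `must`. It is not the actual matching code: the function names, the `last_name`/`first_name` field names, the token regex, and the assumption that names arrive as `Last, First` are all hypothetical and only meant to show the guard.

```python
import re

# Hypothetical token pattern: a run of Unicode letters (no digits, no
# underscore, no punctuation). The real schema pattern may differ.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def name_tokens(name_part):
    """Return the letter-only tokens found in one part of a name."""
    return TOKEN_RE.findall(name_part or "")


def build_author_query(full_name):
    """Build a bool query for a name assumed to be 'Last, First'.

    Returns None when the last name has no valid token, and simply omits
    the first-name clause when the first name/initial has no valid token,
    so `must` never contains an empty clause.
    """
    last, _, first = full_name.partition(",")
    last_tokens = name_tokens(last)
    first_tokens = name_tokens(first)

    if not last_tokens:
        # Nothing usable to match on: let the caller skip the search.
        return None

    # Field names below are placeholders, not the real index mapping.
    must = [{"match": {"last_name": " ".join(last_tokens)}}]
    if first_tokens:
        must.append({"match": {"first_name": " ".join(first_tokens)}})
    return {"bool": {"must": must}}
```

With this guard, a name such as `Smith, .` would produce a query on the last name only, and a name with no usable tokens at all would be skipped instead of reaching the search backend.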