arthurpsmith / author-disambiguator

Wikidata service to help create or link author items to published articles
GNU General Public License v3.0

Author Disambiguator makes the edits even if the author had been correctly matched before #179

Open VojtechDostal opened 1 year ago

VojtechDostal commented 1 year ago

Hello, and thanks for this new tool, which usually works like a charm. I discovered one possible bug, though:

In these items, the author had been matched before I marked the article as authored by M. Převorovský in Author Disambiguator:

https://www.wikidata.org/w/index.php?title=Q50446009&type=revision&diff=1763428548&oldid=1253915961
https://www.wikidata.org/w/index.php?title=Q42973519&type=revision&diff=1763428550&oldid=1590152408

Now there are two statements for the same author...

I am not sure what went wrong there

arthurpsmith commented 1 year ago

Hi - yes, in such cases you should notice, in the author list displayed for the paper, a vertical bar ('|') showing both the P50 and P2093 values. The best way to fix these depends on how many works are affected: for one or a small number of works, go to the work item page (click the blue linked title in the left-most column); if there are many, go to the author page (click the green linked author name). Both of those pages have a selection option to "merge" duplicate author names (either one-off on the work page, or for a list of works on the author page). I usually try to fix these sorts of cases before running a name match, but it doesn't hurt to do it afterwards too. Let me know if anything is unclear!

VojtechDostal commented 1 year ago

Ah, I see, so it happens whenever the item already has both an author name string (P2093) and an author (P50) for the same person before the job runs.

Would it make sense to check if the item already has the author in P50 and not add it again in that case - just remove the P2093 instead?
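To make the suggestion concrete, here is a minimal sketch of that check in Python. It is not the tool's actual code: the `Edit` type, the `plan_author_edits` function, and the idea of passing the existing P50 values as a plain set are all assumptions made for illustration, and the QID in the usage example is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """One planned change to a work item (hypothetical type for this sketch)."""
    action: str  # "add" or "remove"
    prop: str    # "P50" (author) or "P2093" (author name string)
    value: str   # a QID for P50, a name string for P2093


def plan_author_edits(existing_p50, author_qid, name_string):
    """Plan the edits needed to match `name_string` to `author_qid`.

    existing_p50: set of QIDs already present as author (P50) on the work.
    """
    edits = []
    # Only add the P50 statement if the author is not already linked.
    if author_qid not in existing_p50:
        edits.append(Edit("add", "P50", author_qid))
    # The name string is removed in both cases.
    edits.append(Edit("remove", "P2093", name_string))
    return edits
```

With a placeholder QID, `plan_author_edits({"Q1234567890"}, "Q1234567890", "M. Převorovský")` would plan only the removal of the P2093 statement, while passing an empty set would plan the P50 addition followed by the removal.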

arthurpsmith commented 1 year ago

Yes, that would probably make sense. There are some funny cases though, for example where the same author really is listed twice on the manuscript (some papers with long author lists either accidentally or deliberately list an author twice for various reasons, like them having more than one affiliation).
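One way to tell a true duplicate apart from an author who is genuinely listed twice might be to compare the series ordinal (P1545) qualifiers of the statements. The sketch below is only an illustration under that assumption, not how the tool works: the `classify_match` function, its return labels, and the representation of P50 statements as `(qid, ordinal)` pairs are all hypothetical.

```python
from typing import Iterable, Optional, Tuple


def classify_match(
    p50_statements: Iterable[Tuple[str, Optional[str]]],
    author_qid: str,
    name_ordinal: Optional[str],
) -> str:
    """Classify a proposed match of a P2093 name string to `author_qid`.

    p50_statements: (qid, series ordinal) pairs for the existing P50
    statements on the work; the ordinal is None when the qualifier is missing.
    name_ordinal: series ordinal of the P2093 statement being matched.
    """
    # Ordinals of existing P50 statements that point to the same author.
    same_author = [ordinal for qid, ordinal in p50_statements if qid == author_qid]
    if not same_author:
        return "new"           # author not linked yet: add P50 as usual
    if name_ordinal is None or any(o is None for o in same_author):
        return "unclear"       # cannot compare positions: leave for manual review
    if name_ordinal in same_author:
        return "duplicate"     # same position already linked: just remove P2093
    return "listed_twice"      # same author at a different position
```

Under these assumptions, only the "duplicate" case would skip adding P50; the "listed_twice" case keeps both entries, and "unclear" is left to a human.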