SuLab / GeneWikiCentral

GeneWiki Organization
MIT License

Filter out bad DO aliases #84

Closed (stuppie closed 6 years ago)

stuppie commented 6 years ago

https://www.wikidata.org/wiki/User_talk:ProteinBoxBot#Redundant_aliases

In edits like this one, the bot seems to be adding "Name (disorder)" as an alias (complete with unwanted capitalization). Even if that's in a source, it's probably not correct to be adding the source's disambiguator to the Wikidata record.

Also, for edits such as this one, is there a way to tell it to stop adding aliases after they've been corrected? Abbreviations such as "acute/subac." should be spelled out, and the abbreviated version shouldn't be used at all.

andrawaag commented 6 years ago

I am not sure I agree. Fixing this would require adding a rule base to handle these cases. Wikidata is not a primary source and should reflect the state of the primary source, don't you think? Otherwise, we have to dive into the rabbit hole of many edge cases. What should we do, for example, with DD-NOS (https://www.wikidata.org/wiki/Q3540880)?

IMHO this should not be corrected in Wikidata, but in the source. We could argue that in blatant cases we simply don't add the alias to Wikidata until we have consulted the primary source.

stuppie commented 6 years ago

I'd normally agree that Wikidata should reflect the state of the primary source. However, the aliases don't have references, so you cannot tell what the primary source is.

stuppie commented 6 years ago

https://github.com/SuLab/scheduled-bots/commit/b1746f155d3f6b6be8808251a91d53587057bfea
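
The linked commit is not reproduced here. Purely as an illustration of the kind of filter discussed in this thread, here is a minimal Python sketch. The function name `filter_aliases`, the `DISAMBIGUATOR` regex and the `REJECTED_FRAGMENTS` blocklist are all hypothetical and are not taken from scheduled-bots; the sketch assumes aliases arrive as plain strings alongside the item label.

```python
import re

# Hypothetical pattern: a trailing source disambiguator such as "(disorder)".
DISAMBIGUATOR = re.compile(r"\s*\(disorder\)\s*$", re.IGNORECASE)

# Hypothetical blocklist of abbreviated fragments that should never be added.
REJECTED_FRAGMENTS = ("acute/subac.",)


def filter_aliases(label, aliases, rejected_fragments=REJECTED_FRAGMENTS):
    """Return cleaned aliases, skipping blocklisted ones and case-insensitive
    duplicates of the label or of an alias already kept."""
    seen = {label.casefold()}
    kept = []
    for alias in aliases:
        lowered = alias.casefold()
        # Drop aliases containing an abbreviation that editors have rejected.
        if any(fragment in lowered for fragment in rejected_fragments):
            continue
        # Strip the source's disambiguator rather than copying it to Wikidata.
        cleaned = DISAMBIGUATOR.sub("", alias).strip()
        key = cleaned.casefold()
        if not cleaned or key in seen:
            continue
        seen.add(key)
        kept.append(cleaned)
    return kept
```

With this sketch, an alias like "Asthma (disorder)" on an item labelled "Asthma" collapses to the label and is dropped, which covers the "Name (disorder)" case from the talk page. A blocklist only addresses the abbreviations someone has already flagged, so it does not settle the broader question of whether such fixes belong in Wikidata or in the source.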