SuLab / GeneWikiCentral

GeneWiki Organization
MIT License

Biological variants of genes not explicitly referred to the human taxon #61

floatingpurr closed 7 years ago

floatingpurr commented 7 years ago

Hello once again. I was trying to get all biological variants (P3433) of human genes in Wikidata when I realized that some of them (17 right now) refer to genes that do not have the statement:

gene found in taxon (P703) Homo sapiens (Q15978631)

Here is a query to see this behavior in action. From a theoretical point of view, I expected every human gene to be explicitly linked to its taxon. I'm quite new to Wikidata, so I don't know whether that is perfectly normal or not. Isn't there some kind of ShEx to formalize data constraints?
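The linked query is not reproduced in this thread. As a rough sketch of the kind of check described above, the Python snippet below builds a SPARQL query that lists items with a "biological variant of" (P3433) statement whose target gene lacks "found in taxon" (P703) Homo sapiens (Q15978631). The function names `build_query` and `run_query`, the `limit` parameter, and the User-Agent string are illustrative assumptions, not part of the original query; the endpoint is the public Wikidata Query Service. Note that this simple pattern would also match variants of genes from other species, so it may need narrowing.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(limit=100):
    """Build a SPARQL query (hypothetical helper) for variants whose gene
    has no 'found in taxon' (P703) Homo sapiens (Q15978631) statement."""
    return f"""
SELECT DISTINCT ?variant ?gene WHERE {{
  ?variant wdt:P3433 ?gene .
  FILTER NOT EXISTS {{ ?gene wdt:P703 wd:Q15978631 . }}
}}
LIMIT {int(limit)}
""".strip()


def run_query(query):
    """Send the query to the endpoint and return (variant, gene) URI pairs.
    Requires network access; the User-Agent value is a placeholder."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    request = urllib.request.Request(
        url, headers={"User-Agent": "genewiki-taxon-check-example/0.1"}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        data = json.load(response)
    return [
        (row["variant"]["value"], row["gene"]["value"])
        for row in data["results"]["bindings"]
    ]
```

Calling `run_query(build_query())` should return the offending variant/gene pairs as entity URIs; an empty list would suggest every referenced gene carries the taxon statement.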

stuppie commented 7 years ago

Hi floatingpurr, there was a bug in the handling of genes that encode more than one protein. I've implemented a fix, and these should get corrected by the bot by tomorrow.

floatingpurr commented 7 years ago

Hi stuppie, thank you! Please note that the taxon property still seems to be missing from Q14874282 and Q18033471.

stuppie commented 7 years ago

The bot is still running; it should be done in 3 hrs.

floatingpurr commented 7 years ago

There are just a few genes left without P703. I don't know if that's OK or if part of the bug is still there.

stuppie commented 7 years ago

@floatingpurr Check out this page here for a log of the latest bot run. This is a work in progress, but they should all be showing up here eventually. You can see that 666 out of 58,753 items were skipped due to some failure/conflict. The two left have some ID conflicts and need to be fixed manually.

andrewsu commented 7 years ago

Looks like someone fixed the human examples -- thanks! Closing...