SuLab / GeneWikiCentral

GeneWiki Organization
MIT License

civic bot not adding "instance of" statement #121

Open andrewsu opened 4 years ago

andrewsu commented 4 years ago

example: https://www.wikidata.org/wiki/Q61818930

I think this SPARQL query shows all items that have a CIViC ID but no "instance of" statement: https://w.wiki/6ww
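
For reference, here is a rough sketch of what such a query could look like, built as a string in Python. It is not the query behind the short link above. It assumes that P3329 is the CIViC variant ID property and uses P31 for "instance of"; the helper name `build_missing_instance_of_query` is hypothetical.

```python
# Assumption: P3329 is the Wikidata property for the CIViC variant ID.
CIVIC_VARIANT_ID = "P3329"


def build_missing_instance_of_query(civic_prop=CIVIC_VARIANT_ID, limit=None):
    """Return a SPARQL query listing items that carry a CIViC ID
    but have no 'instance of' (P31) statement at all."""
    query = (
        "SELECT ?item ?civicId WHERE {\n"
        f"  ?item wdt:{civic_prop} ?civicId .\n"
        "  FILTER NOT EXISTS { ?item wdt:P31 ?type . }\n"
        "}"
    )
    if limit is not None:
        # Optional cap, handy when spot-checking results by hand.
        query += f"\nLIMIT {int(limit)}"
    return query


if __name__ == "__main__":
    print(build_missing_instance_of_query(limit=10))
```

The printed query can be pasted into the Wikidata Query Service, which predefines the `wdt:` prefix.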

I'm expecting that every item added or edited by the CIViC variant bot should have an "instance of" statement pointing to sequence variant (Q15304597) or one of its children.

probably need to have this issue fixed before the paper goes out...

andrawaag commented 4 years ago

This one is already on my plate and it is a tough one. There are basically three issues here.

  1. Currently, not all of the Sequence Ontology (SO) is in Wikidata, simply because it is not available as CC0. This means that if the SO term for a variant type is not yet in Wikidata, it needs to be added manually. My understanding of non-CC0 data is that one cannot batch upload everything, but adding a reference to a single SO term is allowed. Being able to upload all of SO to Wikidata would solve this.

  2. Some variants don't have a specific variant type annotated in CIViC. The example given (https://www.wikidata.org/wiki/Q61818930) is of this type: its CIViC record gives "Variant Type: None specified.". Currently, the bot ignores such records. An easy fix here would be to add, as you suggest, "instance of sequence variant (Q15304597)".

  3. Some time ago we raised the quality threshold. When we started, we added all CIViC records; currently, we only add CIViC records with a high-quality indication. This has resulted in items in Wikidata that only mention the CIViC ID, plus the gene of which it is a variant.
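
The fallback suggested in item 2 could be sketched roughly as follows. This is not the bot's actual code: the function name `instance_of_targets` and the `so_to_qid` mapping (SO identifier to Wikidata QID) are hypothetical, and falling back to Q15304597 when an SO term has no Wikidata item yet (the item 1 case) is an extra assumption on my part.

```python
SEQUENCE_VARIANT_QID = "Q15304597"  # sequence variant, from the issue text


def instance_of_targets(variant_types, so_to_qid):
    """Pick the QIDs to use as 'instance of' values for one CIViC variant.

    variant_types: SO identifiers annotated in CIViC (may be empty or None).
    so_to_qid:     hypothetical mapping from SO identifier to Wikidata QID.
    """
    qids = []
    for so_id in variant_types or []:
        qid = so_to_qid.get(so_id)
        # Skip SO terms without a Wikidata item and avoid duplicates.
        if qid and qid not in qids:
            qids.append(qid)
    # "Variant Type: None specified." or nothing mapped: use the generic class.
    return qids or [SEQUENCE_VARIANT_QID]


if __name__ == "__main__":
    # "Q_PLACEHOLDER" is a stand-in, not a real Wikidata item.
    mapping = {"SO:0001583": "Q_PLACEHOLDER"}
    print(instance_of_targets([], mapping))
    print(instance_of_targets(["SO:0001583"], mapping))
```

With a fallback like this every item the bot touches would carry at least the generic "instance of" statement, and a more specific type could replace it once the corresponding SO term exists in Wikidata.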

Concluding.