andrewsu closed this issue 4 years ago.
The property proposal has been accepted: https://www.wikidata.org/wiki/Property:P6861
I looked at the CIViC API output (e.g. for variant ID 12), but it appears that the dbSNP ID is not actually sourced from CIViC. A rendered CIViC page does show the dbSNP ID, but it comes directly from myvariant.info. I could integrate that into the current CIViC bot, but that would effectively add a second primary API (myvariant.info) to the bot. I wonder whether it would be better to create a dedicated bot that only synchronises with myvariant.info. That way we could add other identifiers to Wikidata as well, e.g. COSMIC, although that would probably require some additional property proposals.
Looks like the rsID is included under the `variant_aliases` key (at least for https://civicdb.org/api/variants/12). I agree that a myvariant-based bot would be best. But a quick and dirty option would be to scan `variant_aliases` with the regex `rs\d+`. Your call whether that is too quick and dirty...
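For reference, the quick and dirty option could look roughly like this sketch. It assumes the CIViC variant record has already been fetched and parsed into a dict, and that `variant_aliases` is a list of plain strings; the function name `rsids_from_variant_aliases` is hypothetical, and the pattern adds word boundaries to the `rs\d+` regex so that `rs` inside a longer token is not matched.

```python
import re

# dbSNP Reference SNP IDs such as "rs12345"; \b keeps "xrs999" from matching.
# Case-insensitive because aliases may be upper-cased (e.g. "RS12345").
RSID_PATTERN = re.compile(r"\brs\d+\b", re.IGNORECASE)


def rsids_from_variant_aliases(variant: dict) -> list[str]:
    """Return unique, lower-cased rsIDs found in a CIViC variant's aliases.

    Assumes (hypothetically) that variant["variant_aliases"] is a list of strings.
    """
    found: list[str] = []
    for alias in variant.get("variant_aliases", []):
        for match in RSID_PATTERN.findall(alias):
            rsid = match.lower()  # normalise to the usual "rs" prefix
            if rsid not in found:
                found.append(rsid)
    return found
```

As discussed below, this loses the provenance of where CIViC got the rsID, which is the main argument against it.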
I am a bit uncomfortable scraping it that way, as I can be quite pedantic about maintaining provenance. However, I created a draft version of a bot that sources myvariant.info and extends CIViC items with rsIDs and citations from dbSNP. The source is at https://github.com/SuLab/scheduled-bots/tree/master/scheduled_bots/myvariant and you can see its results at https://civicdb.org/api/variants/12 (for now, this is the only item being processed by the bot). The bot follows this schema: https://www.wikidata.org/wiki/EntitySchema:E103
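A minimal sketch of the myvariant.info lookup, assuming the variant is addressed by its HGVS genomic ID and that the public `/v1/variant/` endpoint returns `dbsnp.rsid` when `fields=dbsnp.rsid` is requested. This is not the code in the scheduled-bots repository; the helper names are hypothetical, and the parser accepts either a single document or a list, in case a response contains several.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

MYVARIANT_VARIANT_URL = "https://myvariant.info/v1/variant/"


def myvariant_url(hgvs_id: str, fields: str = "dbsnp.rsid") -> str:
    """Build a variant-annotation URL limited to the requested fields."""
    # Keep ':' readable; characters such as '>' are percent-encoded.
    return MYVARIANT_VARIANT_URL + quote(hgvs_id, safe=":") + "?" + urlencode({"fields": fields})


def rsids_from_myvariant(doc) -> list[str]:
    """Collect unique dbsnp.rsid values from a parsed myvariant.info response."""
    docs = doc if isinstance(doc, list) else [doc]
    rsids: list[str] = []
    for d in docs:
        dbsnp = d.get("dbsnp", [])
        # Assumption: "dbsnp" is usually a dict, but tolerate a list of dicts.
        for entry in dbsnp if isinstance(dbsnp, list) else [dbsnp]:
            rsid = entry.get("rsid")
            if rsid and rsid not in rsids:
                rsids.append(rsid)
    return rsids


def fetch_rsids(hgvs_id: str) -> list[str]:
    """Query myvariant.info over the network and return the rsIDs it reports."""
    with urlopen(myvariant_url(hgvs_id), timeout=30) as resp:
        return rsids_from_myvariant(json.load(resp))
```

Keeping URL building and response parsing separate from the network call makes the parsing easy to test offline.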
+1 on a dedicated bot. Personally, I'm not sure the `described by source` statements are necessary. Sometimes there are clearly a lot of them, and I'm not sure they add much value... That said, I'm fine either way.
I am on the fence regarding `described by source`. I added it to give the item more substance than the dbSNP rsID alone, and to add a more mature reference. A reference stating that a dbSNP statement is sourced from dbSNP does look rather redundant, though.
I will remove the `described by source` property from the bot (and the schema), but leave the reference for discussion.
Any preference for other myvariant properties to be added?
To make sure we're on the same page, I am looking at this diff/item: https://www.wikidata.org/w/index.php?title=Q21851559&type=revision&diff=991027254&oldid=990985893. When you say "leave the reference for discussion", which reference are you referring to?
I'm not seeing any other critical myvariant properties to add at this moment...
I mean the reference I chose, which lists [stated in, retrieved, dbSNP ID], at https://www.wikidata.org/wiki/Q21851559#P1343. But it really is a minor thing, and I am happy to park it and revisit it when further additions are needed.
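The [stated in, retrieved, dbSNP ID] reference could be expressed in the Wikibase JSON data model roughly as in this sketch. P248 (stated in), P813 (retrieved) and P6861 (the dbSNP property accepted above) are real property IDs; the Wikidata item for dbSNP is passed in as a parameter rather than hard-coded because its QID is not given in this thread. The helper names are hypothetical, and a bot library would normally assemble this structure for you.

```python
from datetime import date


def item_snak(prop: str, qid: str) -> dict:
    """Snak pointing at a Wikidata item, e.g. P248 (stated in) -> dbSNP."""
    return {"snaktype": "value", "property": prop, "datatype": "wikibase-item",
            "datavalue": {"type": "wikibase-entityid",
                          "value": {"entity-type": "item", "numeric-id": int(qid[1:]), "id": qid}}}


def time_snak(prop: str, day: date) -> dict:
    """Day-precision (11) Gregorian date snak, e.g. P813 (retrieved)."""
    return {"snaktype": "value", "property": prop, "datatype": "time",
            "datavalue": {"type": "time", "value": {
                "time": f"+{day.isoformat()}T00:00:00Z", "timezone": 0,
                "before": 0, "after": 0, "precision": 11,
                "calendarmodel": "http://www.wikidata.org/entity/Q1985727"}}}


def external_id_snak(prop: str, value: str) -> dict:
    """External-identifier snak, e.g. P6861 (dbSNP ID)."""
    return {"snaktype": "value", "property": prop, "datatype": "external-id",
            "datavalue": {"type": "string", "value": value}}


def dbsnp_reference(dbsnp_qid: str, rsid: str, retrieved: date) -> dict:
    """Reference block with the snaks [stated in, retrieved, dbSNP ID], in that order."""
    snaks = [item_snak("P248", dbsnp_qid), time_snak("P813", retrieved),
             external_id_snak("P6861", rsid)]
    return {"snaks": {s["property"]: [s] for s in snaks},
            "snaks-order": [s["property"] for s in snaks]}
```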
For example, on this record for a KRAS mutation, https://www.wikidata.org/wiki/Q28420832, the rsID (RS61764370) appears in the item label and alias. It would be better to add it as a dedicated statement.
Surprisingly, there is no "dbSNP ID" property yet, so this will need a property proposal...
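With P6861 available, items like the KRAS one could be checked for rsIDs that appear only in labels or aliases. A sketch, assuming entity JSON in the shape returned by Wikidata's `Special:EntityData` or `wbgetentities`, and assuming P6861 values are stored with a lowercase `rs` prefix; the function name and the sample entity in the test are hypothetical, not Q28420832's real data.

```python
import re

RSID_PATTERN = re.compile(r"\brs\d+\b", re.IGNORECASE)


def missing_dbsnp_claims(entity: dict) -> list[str]:
    """rsIDs mentioned in labels/aliases that have no P6861 statement yet."""
    texts = [label["value"] for label in entity.get("labels", {}).values()]
    texts += [alias["value"] for aliases in entity.get("aliases", {}).values() for alias in aliases]
    # Existing P6861 values; snaks without a datavalue (e.g. "no value") are skipped.
    existing = {
        claim["mainsnak"]["datavalue"]["value"].lower()
        for claim in entity.get("claims", {}).get("P6861", [])
        if "datavalue" in claim.get("mainsnak", {})
    }
    found: list[str] = []
    for text in texts:
        for match in RSID_PATTERN.findall(text):
            rsid = match.lower()
            if rsid not in existing and rsid not in found:
                found.append(rsid)
    return found
```

The result is a list of candidate statements; in keeping with the provenance concern above, each would still need to be confirmed against dbSNP or myvariant.info before being written.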