bridgedb / datasources

Repository with the BridgeDb data source.
Creative Commons Zero v1.0 Universal

Miriam identifiers for Gramene sources in datasources.txt #4

Open ariutta opened 9 years ago

ariutta commented 9 years ago

In datasources.txt, Gramene Genes should have the Miriam identifier urn:miriam:gramene.gene.

Should we create new Miriam identifiers also for the following rows?

Christian-B commented 9 years ago

Hi, I don't know the science well enough to say whether the above four BridgeDb DataSources are all gramene.gene.

But if they are, the correct fix is NOT to add urn:miriam:gramene.gene to the four rows but rather to merge the four BridgeDb DataSources into one.

Either all four are the same, in which case only one BridgeDb DataSource is required, or they are not the same, in which case there should be four different Miriam URN patterns.

egonw commented 9 years ago

Anders, I think we need to ask the people from the rice portal... Otherwise, I second your comments and Christian's suggestion... but let's get input from the rice people first, before making decisions...

ariutta commented 9 years ago

Sounds good. @egonw, would you like to initiate the contact with the rice experts? If not, I can try to find out who to contact.

Christian-B commented 9 years ago

In the past I have had very good and fast replies from the Miriam people themselves and especially from Nick Juty.

ariutta commented 9 years ago

I'll contact him this coming week. Thanks, Christian.

ariutta commented 9 years ago

I got a response from Nick:

"1. http://identifiers.org/ensembl.plant/ATMG01360 [possible identifiers.org URI for http://www.gramene.org/Arabidopsis_thaliana/Gene/Summary?g=ATMG01360-TAIR-G]

2. MIR171a actually gives 3 results from search in ensembl, and in gramene. And none of them are rice (grass, maize and Arabidopsis). I think that is because this is a gene name and not an accession. So in this case, if you want to specify the gene in rice, you would have to use http://identifiers.org/gramene.gene/GR:0100777 (using the accession number).
3. I think you need to decide what you mean to identify for this one. My understanding is that LOC identifiers are used for specifying genomic regions/sequences, for which there is no currently defined function? So for example, I think LOC_Os04...means a location on the rice genome (Os) on chromosome 4, with precise co-ordinates. Is that what you want to identify? In this case, I would really rather go for a specific transcript if you know it. You should be able to use http://identifiers.org/ensembl/ for that.
4. Although the identifier looks very odd for Gramene, it would probably be best to use http://identifiers.org/ensembl.plant/GRMZM2G174107"

He said he thinks it may be possible to add a collection for LOC identifiers, but he would need to look into it further first.
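
For reference, the relationship between the Miriam URN form used in datasources.txt (urn:miriam:gramene.gene) and the identifiers.org URIs Nick suggests can be sketched in Python. This is only a minimal sketch under stated assumptions: the function names are hypothetical and not part of BridgeDb, and it assumes the accession is appended to the URN after a colon with reserved characters percent-encoded (so GR:0100777 becomes GR%3A0100777). The namespaces and accessions are the ones quoted above.

```python
from urllib.parse import quote, unquote

# Base of the resolver URIs quoted in Nick's reply.
IDENTIFIERS_ORG = "http://identifiers.org"
URN_PREFIX = "urn:miriam:"


def to_miriam_urn(namespace: str, accession: str) -> str:
    """Build a Miriam URN (hypothetical helper).

    Assumption: the accession is percent-encoded so that a colon inside it
    (as in GR:0100777) is not confused with the URN separator.
    """
    return f"{URN_PREFIX}{namespace}:{quote(accession, safe='')}"


def urn_to_identifiers_org(urn: str) -> str:
    """Convert a Miriam URN with an accession into an identifiers.org URI."""
    if not urn.startswith(URN_PREFIX):
        raise ValueError(f"not a Miriam URN: {urn!r}")
    namespace, sep, accession = urn[len(URN_PREFIX):].partition(":")
    if not sep or not accession:
        # A bare pattern such as urn:miriam:gramene.gene names a collection,
        # not a single record, so there is nothing to resolve.
        raise ValueError(f"URN has no accession: {urn!r}")
    return f"{IDENTIFIERS_ORG}/{namespace}/{unquote(accession)}"


if __name__ == "__main__":
    urn = to_miriam_urn("gramene.gene", "GR:0100777")
    print(urn)
    print(urn_to_identifiers_org(urn))
```

Whether a given accession actually resolves still depends on the collection being registered, which is the open question for the LOC identifiers.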