bridgedb / datasources

Repository with the BridgeDb data source.
Creative Commons Zero v1.0 Universal

Update HGNC linkouts #12

Closed: DeniseSl22 closed this issue 1 year ago

DeniseSl22 commented 3 years ago

See also bridgedb/datasources#6: the HGNC linkouts on the WikiPathways website and in PathVisio don't work...

[screenshot] Result: [screenshot]

DeniseSl22 commented 3 years ago

The URI should be (for HGNC Accession number): https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:13890 . For HGNC symbols, the URIs on identifiers.org no longer work... @egonw
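
A minimal sketch of building that linkout from a stored accession number, assuming the genenames.org URL pattern above stays stable and that an accession may be stored either as a bare number or with an `HGNC:` prefix. The function and constant names are hypothetical, not part of BridgeDb or PathVisio.

```python
# URL pattern copied from the comment above; it may change if genenames.org
# restructures its site.
GENENAMES_REPORT = "https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:"


def hgnc_accession_linkout(accession: str) -> str:
    """Build a genenames.org linkout for an HGNC Accession number.

    Accepts the bare number ("13890") or the prefixed form ("HGNC:13890");
    anything else (for example an HGNC symbol) raises ValueError.
    """
    value = accession.strip()
    if value.upper().startswith("HGNC:"):
        value = value[len("HGNC:"):]
    if not value.isdigit():
        raise ValueError(f"not an HGNC Accession number: {accession!r}")
    return GENENAMES_REPORT + value
```

Rejecting symbols outright, rather than passing them through, makes a wrongly typed annotation fail loudly instead of producing another dead link.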

egonw commented 3 years ago

Yes, we have a problem here. The URLs for symbols just don't work. I'll have to remove them.

DeniseSl22 commented 3 years ago

But that would mean that all GeneProducts and Proteins annotated with an HGNC symbol will not have a working linkout on the WP website... Should we write a bot to convert them to HGNC IDs (both the existing ones and future ones)? @mkutmon @AlexanderPico ?
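
A minimal sketch of what such a bot could do to a single GPML file. It assumes each annotation is an `Xref` element with `Database` and `ID` attributes, that symbols are stored under `Database="HGNC"`, and that accession numbers are stored as bare numbers under `Database="HGNC Accession number"` (the data source name shown in the curation test output in this thread). The `mapping` argument is a hypothetical symbol-to-accession dictionary; symbols it does not cover are reported instead of guessed.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Assumed data source names; check them against BridgeDb's data source list.
SYMBOL_DB = "HGNC"
ACCESSION_DB = "HGNC Accession number"


def convert_hgnc_symbols(gpml_text: str, mapping: Dict[str, str]) -> Tuple[str, List[str]]:
    """Rewrite HGNC-symbol Xrefs into HGNC Accession number Xrefs.

    `mapping` is a hypothetical dict from symbol to bare accession number.
    Returns the rewritten GPML and the symbols that had no mapping.
    """
    root = ET.fromstring(gpml_text)
    # Keep GPML's default namespace unprefixed on output (no "ns0:" tags).
    # Note: register_namespace changes ElementTree's global prefix table.
    if root.tag.startswith("{"):
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])

    unmapped: List[str] = []
    for elem in root.iter():
        # Compare local tag names only, so any GPML schema version matches.
        if elem.tag.rsplit("}", 1)[-1] != "Xref" or elem.get("Database") != SYMBOL_DB:
            continue
        symbol = elem.get("ID", "")
        accession = mapping.get(symbol)
        if accession is None:
            unmapped.append(symbol)  # leave for manual curation
            continue
        elem.set("Database", ACCESSION_DB)
        elem.set("ID", accession)
    return ET.tostring(root, encoding="unicode"), unmapped
```

ElementTree does not round-trip everything (the XML declaration and comments are dropped, for example), so a real bot would want to diff its output before committing anything.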

tabbassidaloii commented 1 year ago

Any update on this issue? @DeniseSl22

DeniseSl22 commented 1 year ago

Nope. We should update all GPMLs that contain HGNC symbols to get the linkouts working again. Also, it would be good to change PV4 so that it no longer shows HGNC symbols when looking up an identifier @mkutmon. And in the future we could write a unit test for HGNC symbols, so we can curate them later @egonw.

tabbassidaloii commented 1 year ago

We can try to fix it in the hackathon planned for Feb

egonw commented 1 year ago

See also: https://github.com/bridgedb/datasources/issues/6#issuecomment-1380150769

Chris-Evelo commented 1 year ago

I was just reading the whole thread. I think it makes perfect sense for people to use HGNC symbols and not IDs. Conceptually, HGNC symbols were actually meant to be what the community uses, as they are unique, resolvable, and both human-readable (they carry meaning) and machine-readable. HGNC is thus a special case where we want to allow people to use both symbols and IDs. If we want to automate anything, it should rather be the other way around: for example, a quick (hover?) lookup that shows which symbol is meant when only the ID is there.

DeniseSl22 commented 1 year ago

@Chris-Evelo: that's the thing, the symbols are no longer resolvable... Through BridgeDb we can retrieve the HGNC symbol from any other ID, but if the HGNC symbol is the ID used for the annotation, the linkouts no longer work.

egonw commented 1 year ago

> ...so we can curate them later @egonw

Step 1:

[ERROR] Failures: 
[ERROR]   Genes.numericHGNCIDs:73->JUnitTests.performAssertions:24 Found integer HGNC symbols (did you mean 'HGNC Accession number'?): 4. Details:
http://www.wikipathways.org/instance/WP4919_r123490 AKT has 391
http://www.wikipathways.org/instance/WP5130_r123523 TCRA has 12027
http://www.wikipathways.org/instance/WP5130_r123523 TCRB has 12155
http://www.wikipathways.org/instance/WP288_r118398 MAPK has 651
 ==> expected: <0> but was: <4>
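
The check behind Step 1 can be sketched as follows. This is not the actual test code; it assumes the annotations have already been extracted as (pathway, label, database, identifier) records, with the field layout invented for this sketch, and flags HGNC-symbol annotations whose ID is purely numeric, since those are most likely HGNC Accession numbers filed under the wrong data source.

```python
from typing import Iterable, List, NamedTuple


class Annotation(NamedTuple):
    """One extracted Xref; this record layout is an assumption of the sketch."""
    pathway: str     # e.g. a WikiPathways revision URL
    label: str       # the DataNode label
    database: str    # data source name, e.g. "HGNC"
    identifier: str  # the Xref ID


def numeric_hgnc_symbols(annotations: Iterable[Annotation]) -> List[Annotation]:
    """Return HGNC-symbol annotations whose ID is an integer.

    Such IDs are most likely HGNC Accession numbers that were stored
    under the symbol data source by mistake.
    """
    return [
        a for a in annotations
        if a.database == "HGNC" and a.identifier.strip().isdigit()
    ]
```

A unit test would then assert that the returned list is empty and print the offending entries on failure, which is the shape of the `expected: <0> but was: <4>` report above.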

Chris-Evelo commented 1 year ago

OK, I understand now, and of course we want them to be resolved. So is this an identifiers.org issue? n2t.net actually resolves the CURIE: I just tried n2t.net/hgnc:septin1 and that resolves fine. Of course, we could also solve it by converting any HGNC symbols to HGNC IDs on the fly, just before we do the linkout. That would still be in line with the philosophy of the HGNC (HUGO Gene Nomenclature Committee), which is that human users should never have to know or see the identifiers.
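
A minimal sketch of that on-the-fly approach, assuming some symbol-to-accession lookup table is available (the `symbol_to_accession` argument is hypothetical and assumed to hold bare accession numbers). Symbols that can be mapped get the genenames.org URL given earlier in this thread; anything else falls back to the n2t.net resolver. The `https://n2t.net/hgnc:<symbol>` form is taken from the single manual test above and may not hold for every symbol.

```python
from typing import Mapping
from urllib.parse import quote

# Both URL patterns come from this thread; treat them as assumptions, not stable APIs.
GENENAMES_REPORT = "https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:"
N2T_RESOLVER = "https://n2t.net/hgnc:"


def hgnc_symbol_linkout(symbol: str, symbol_to_accession: Mapping[str, str]) -> str:
    """Turn an HGNC symbol into a linkout URL at display time.

    Prefers the accession-based genenames.org report page and falls back to
    the n2t.net CURIE resolver when the symbol is not in the lookup table.
    """
    accession = symbol_to_accession.get(symbol)  # assumed to be a bare number
    if accession is not None:
        return GENENAMES_REPORT + accession
    # Percent-encode the symbol so characters such as "/" or "#" cannot break the URL.
    return N2T_RESOLVER + quote(symbol, safe="")
```

Resolving at link time leaves the symbols in the pathway untouched, so curators and readers keep seeing symbols while the linkout uses the ID.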

egonw commented 1 year ago

Step 2 was to test whether we have non-numeric HGNC Accession numbers, which we do not seem to have. Not many, anyway: https://bit.ly/3H1QUhZ
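
The Step 2 check can be sketched the same way, assuming bare integers are the expected form for HGNC Accession numbers (as the Step 1 test message suggests) and that annotations are available as (database, identifier) pairs. This is not the query behind the link above; it additionally separates `HGNC:`-prefixed IDs, which could be normalised automatically, from everything else, which would need manual curation.

```python
from typing import Dict, Iterable, List, Tuple


def classify_nonnumeric_accessions(
    xrefs: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Sort non-numeric 'HGNC Accession number' IDs into two buckets.

    "prefixed": IDs like "HGNC:123" that only need the prefix stripped.
    "other": anything else (e.g. a symbol stored as an accession number).
    """
    buckets: Dict[str, List[str]] = {"prefixed": [], "other": []}
    for database, identifier in xrefs:
        if database != "HGNC Accession number":
            continue
        value = identifier.strip()
        if value.isdigit():
            continue  # the expected, bare numeric form
        rest = value[len("HGNC:"):] if value.upper().startswith("HGNC:") else ""
        buckets["prefixed" if rest.isdigit() else "other"].append(identifier)
    return buckets
```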

egonw commented 1 year ago

> We can try to fix it in the hackathon planned for Feb

@tabbassidaloii, great idea! It seems to me that what we are missing in the BridgeDb mapping files are mappings between HGNC symbols and HGNC Accession numbers. Correct?
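
If those mappings are what is missing, one possible source is HGNC's own tab-separated complete-set download. The sketch below assumes that file has `hgnc_id` (values like `HGNC:13890`) and `symbol` header columns, so check the header of the release you actually use. It strips the `HGNC:` prefix to match the bare numeric IDs seen in WikiPathways, and it deliberately ignores previous and alias symbols as well as the BridgeDb mapping-file format itself.

```python
import csv
from typing import Dict, Iterable


def load_symbol_to_accession(lines: Iterable[str]) -> Dict[str, str]:
    """Map approved HGNC symbols to bare HGNC Accession numbers.

    `lines` is any iterable of tab-separated text lines (e.g. an open file)
    with 'hgnc_id' and 'symbol' header columns; both names are assumptions.
    """
    mapping: Dict[str, str] = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        hgnc_id = (row.get("hgnc_id") or "").strip()
        symbol = (row.get("symbol") or "").strip()
        if not hgnc_id or not symbol:
            continue  # skip incomplete rows rather than guess
        mapping[symbol] = hgnc_id.split(":", 1)[-1]  # "HGNC:13890" -> "13890"
    return mapping
```

Usage would be something like `with open(path, encoding="utf-8", newline="") as fh: mapping = load_symbol_to_accession(fh)`, where `path` points at the downloaded file.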

Note that to use any such work, we typically also need a new PathVisio release, though the webservices (and therefore WikiPathways) as well as the RDF generation can use a recent BridgeDb release directly.

tabbassidaloii commented 1 year ago

> > We can try to fix it in the hackathon planned for Feb
>
> @tabbassidaloii, great idea! It seems to me what we are missing in the BridgeDb mapping files are mappings between the HGNC symbols and HGNC Accession Numbers. Correct?

I am not sure if that is correct, but I will look into it.

egonw commented 1 year ago

Fixed. But I would still like to see those mappings.