geneontology / go-releases

Tasks and notes for monthly GO releases

Invalid reference format for 2464 references #46

Closed pgaudet closed 2 months ago

pgaudet commented 10 months ago

(first September 2023 release attempt)

Hi @kltm @dustine32 @sierra-moxon @vanaukenk

In the release stats http://skyhook.berkeleybop.org/release/release_stats/go-annotation-changes.tsv we have 2464 references with the format

PMID:PMCnnnn (for example, PMID:PMC4642926).

I don't think this is valid?

These seem to be coming from the WB file - http://skyhook.berkeleybop.org/release/products/upstream_and_raw_data/wb-src.gaf.gz - this file has > 45,000 instances of "PMID:PMC". I didn't check other files.
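A count like the one above can be reproduced with a short script. This is only a sketch: it assumes the file follows the standard GAF layout, where column 6 holds pipe-separated `DB:Reference` values and comment lines start with `!`. The function names are made up for illustration and are not part of the GO pipeline.

```python
import gzip
from collections import Counter

REFERENCE_COLUMN = 5  # GAF column 6 (DB:Reference), zero-based index


def count_pmc_under_pmid(lines):
    """Return (annotation rows affected, Counter of offending references)."""
    rows = 0
    refs = Counter()
    for line in lines:
        # Skip GAF header/comment lines and blank lines.
        if line.startswith("!") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) <= REFERENCE_COLUMN:
            continue
        # A row may list several references separated by "|".
        bad = [ref for ref in columns[REFERENCE_COLUMN].split("|")
               if ref.startswith("PMID:PMC")]
        if bad:
            rows += 1
            refs.update(bad)
    return rows, refs


def count_in_gaf(path):
    """Run the count over a gzipped GAF file on disk."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return count_pmc_under_pmid(handle)
```

Calling `count_in_gaf("wb-src.gaf.gz")` on a local copy of the file returns both the number of affected rows and the distinct references, which helps explain why the two figures quoted above (2464 references versus more than 45,000 instances) differ.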

see https://amigo-staging.geneontology.io/amigo/gene_product/WB:WBGene00000030 versus the current version https://amigo.geneontology.org/amigo/gene_product/WB:WBGene00000030

I think this is blocking? AmiGO cannot resolve the links. And it seems that to use 'PMC' we'd need a different prefix - links look like this: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4642926/
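To illustrate why the two kinds of identifier cannot share one link pattern, here is a rough sketch of a resolver. The PMC URL pattern is the one quoted above; the PubMed URL pattern and the function name `reference_url` are assumptions for this example and do not describe how AmiGO builds its links.

```python
import re

# A well-formed PubMed reference: prefix "PMID" and a numeric local ID.
PMID_PATTERN = re.compile(r"^PMID:(\d+)$")
# The malformed form seen in this release: a PMC accession under the PMID prefix.
PMC_UNDER_PMID_PATTERN = re.compile(r"^PMID:(PMC\d+)$")


def reference_url(reference):
    """Return a link for the reference, or None if it matches neither form."""
    match = PMID_PATTERN.match(reference)
    if match:
        # Assumed PubMed URL pattern (not taken from this thread).
        return f"https://pubmed.ncbi.nlm.nih.gov/{match.group(1)}/"
    match = PMC_UNDER_PMID_PATTERN.match(reference)
    if match:
        # PMC accessions resolve under a different path, as in the link above.
        return f"https://www.ncbi.nlm.nih.gov/pmc/articles/{match.group(1)}/"
    return None
```

A PMC accession and a PubMed ID for the same paper are different numbers, so stripping the `PMC` letters and reusing the PubMed pattern would point at the wrong article.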

Thanks, Pascale

kltm commented 10 months ago

@pgaudet I think that there are two separate threads here

  1. Is the data intended to have this ID; is the ID a correctly constructed CURIE?
  2. Make AmiGO resolve them.

For "2", AmiGO checks to make sure that the IDs are numeric. If that is no longer the case and there is no internal ID-dependent routing, fixing this in AmiGO would be trivial.
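The kind of numeric check described here could look roughly like the following. This is a hypothetical illustration, not AmiGO's actual code; the function name `is_numeric_pmid` is invented for the example.

```python
def is_numeric_pmid(curie):
    """True only for CURIEs of the form PMID:<ASCII digits>."""
    # Split on the first colon into prefix and local ID.
    prefix, separator, local_id = curie.partition(":")
    if not separator or prefix != "PMID":
        return False
    # isascii() guards against non-ASCII digit characters that isdigit() accepts.
    return local_id.isascii() and local_id.isdigit()


for candidate in ("PMID:12345", "PMID:PMC4642926"):
    print(candidate, is_numeric_pmid(candidate))
```

A check of this shape rejects `PMID:PMC4642926`, which is consistent with the links failing to resolve.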

I think the first thing we want to do is clarify "1" and make sure there is intent and correctness for these IDs.

pgaudet commented 10 months ago

the data is not correct, the prefix should not be "PMID."

kltm commented 10 months ago

> the data is not correct, the prefix should not be "PMID."

Understood; cancelling release.

vanaukenk commented 10 months ago

I'm looking into this and trying to track down the reason for the incorrect reference IDs in our GAF.

vanaukenk commented 10 months ago

@kltm @pgaudet

WormBase has generated a new GAF file. As soon as it's on our FTP site, I'll let you know so we can try the release again. Thanks for letting us know about this.

pgaudet commented 10 months ago

Thanks Kimberly, this is really appreciated. Any rough idea when this will be available? Days, weeks...? Right now this is blocking the release.

As an aside, is the WB repo private? The link does not work for me.

vanaukenk commented 10 months ago

@pgaudet - this should be a matter of hours or a day or two at most. I've let the WB team know this is holding up the GO release.

Sorry about the unproductive link to the WB repo - I forget that it's private and will remove it.

pgaudet commented 9 months ago

As an interim workaround, we will have @vanaukenk send us the best version of the file and will use it "manually" until the upstream is fixed. Once the upstream is fixed, we will revert; see https://github.com/geneontology/pipeline/issues/338

pgaudet commented 2 months ago

Fixed. We have now switched back to the WormBase file as the source, since they have fixed the problem in the file they generate; see https://github.com/geneontology/go-releases/issues/51