Open kltm opened 1 year ago
Okay, I take that back: I've found the source in the wb.gpi:
bbop@wok:/home/skyhook/release/products/annotations$ zcat wb-src.gpi.gz | grep "F07B7.5"
WB WBGene00001923 his-49 HIStone CELE_F07B7.5 gene taxon:6239 UniProtKB:P08898
WB F07B7.5 his-49 HIStone CELE_F07B7.5 transcript taxon:6239 WB:WBGene00001923
WB CE03253 HIS-2 HIStone CELE_T10C6.13 protein taxon:6239 WB:T10C6.13|WB:F45F2.13|WB:ZK131.3|WB:ZK131.7|WB:K06C4.5|WB:ZK131.2|WB:K06C4.13|WB:F17E9.10|WB:K03A1.1|WB:F08G2.3|WB:B0035.10|WB:F07B7.5|WB:F54E12.1|WB:F55G1.2|WB:F22B3.2 UniProtKB:P08898|UniProtKB:K7ZUH9
This is ringing a bell; I'm going to dig around to see if I can find a previous instance of this.
Interesting. This didn't ring any bells, but there are WB sequence identifiers buried in that string and when I check a few of them, I see that they correspond to genes that produce the exact same protein.
Hm, it looks like we've asked similar questions in the past and felt that it didn't matter much in the grand scheme of things: https://github.com/geneontology/neo/issues/88#issuecomment-1093598908 (note the WB identifier).
Okay. The way C. elegans protein identifiers are currently assigned in WB, genes do not get unique protein IDs if they ultimately produce a protein with the same amino acid sequence. If you think we need a better way of handling this, we can discuss some more.
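For anyone wanting to see how widespread the shared-ID pattern is, here is a rough sketch that lists protein rows pointing at more than one parent. It assumes the file is tab-separated with the GPI 1.2 column order (object ID in column 2, object type in column 6, parent object IDs in column 8, pipe-separated), and that the gene row has an empty parent column. The function name `shared_protein_rows` is made up for illustration, and the sample rows are the ones pasted above, re-split on tabs by hand.

```python
# 0-based column positions, assuming the GPI 1.2 layout (an assumption,
# not checked against the actual wb-src.gpi.gz header).
COL_OBJECT_ID = 1
COL_OBJECT_TYPE = 5
COL_PARENT_ID = 7


def shared_protein_rows(gpi_text):
    """Map each protein ID to its parent IDs when it has more than one parent."""
    shared = {}
    for line in gpi_text.splitlines():
        # Skip blank lines and '!' header/comment lines.
        if not line or line.startswith("!"):
            continue
        cols = line.split("\t")
        if len(cols) <= COL_PARENT_ID:
            continue
        if cols[COL_OBJECT_TYPE] != "protein":
            continue
        parents = [p for p in cols[COL_PARENT_ID].split("|") if p]
        if len(parents) > 1:
            shared[cols[COL_OBJECT_ID]] = parents
    return shared


# Sample rows taken from the grep output in this thread.
PARENTS = (
    "WB:T10C6.13|WB:F45F2.13|WB:ZK131.3|WB:ZK131.7|WB:K06C4.5|WB:ZK131.2|"
    "WB:K06C4.13|WB:F17E9.10|WB:K03A1.1|WB:F08G2.3|WB:B0035.10|WB:F07B7.5|"
    "WB:F54E12.1|WB:F55G1.2|WB:F22B3.2"
)
SAMPLE = "\n".join([
    "\t".join(["WB", "WBGene00001923", "his-49", "HIStone", "CELE_F07B7.5",
               "gene", "taxon:6239", "", "UniProtKB:P08898"]),
    "\t".join(["WB", "F07B7.5", "his-49", "HIStone", "CELE_F07B7.5",
               "transcript", "taxon:6239", "WB:WBGene00001923"]),
    "\t".join(["WB", "CE03253", "HIS-2", "HIStone", "CELE_T10C6.13",
               "protein", "taxon:6239", PARENTS,
               "UniProtKB:P08898|UniProtKB:K7ZUH9"]),
])

shared = shared_protein_rows(SAMPLE)
for protein_id, parents in shared.items():
    print(protein_id, len(parents), "parents")
```

On the real file, the same function could be fed the decompressed text (for example via `gzip.open(path, "rt").read()`); whether the resulting list is long enough to justify changing the ID scheme is a separate question.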
@vanaukenk @pgaudet As we come up on a few months on this issue (and about a year since closing the variant https://github.com/geneontology/neo/issues/111), I was wondering whether we're just documenting this (as we did previously with https://github.com/geneontology/neo/issues/88#issuecomment-1105607070) or whether we're going to take the time to fix it this time around. I'm not sure how much of a problem this is in this case, or whether it's causing a problem that's considered worth fixing right now.
In the most recent successful load, the following error was noticed going by:
Nothing like this seems to be in the WB GPI. In fact, no GPI seems to have this, so it may be coming from a parsed GAF? Weird. Before digging in more, does this ring any bells @vanaukenk ?