geneontology / go-site

A collection of metadata, tools, and files associated with the Gene Ontology public web presence.
http://geneontology.org
BSD 3-Clause "New" or "Revised" License

GORULE_0000001 : extension errors cause lines to be filtered - we'd like to change that behavior #2334

Closed hattrill closed 4 months ago

hattrill commented 6 months ago

When there is an issue with an extension, e.g. the one I caused by using the relation ID instead of the relation name in this annotation:

ERROR - Syntax error in annotation extension field:extensions should be relation(curie) and relation should have corresponding URI: FB FBgn0034841 atlas is_active_in GO:0005634 FB:FBrf0251275|PMID:34478447 IDA C atlas CG13541|atl protein taxon:7227 20211012 FlyBase BFO:0000050(CL:0000019)

Could you "fix it" by stripping out the extension rather than dropping the whole annotation (whilst still flagging it up)? The rest of the annotation is still valid.

(PS: I will fix this particular issue for future releases.)

pgaudet commented 5 months ago

My inclination would be to make this a WARNING rather than a FILTER, and leave the extensions as-is. The other option would be, as @hattrill suggests, to filter out the information in the extensions (but then we may end up with redundant annotations; I'm not sure that's a problem, but it would be confusing).

mugitty commented 5 months ago

@pgaudet and @kltm, I looked at the code and we cannot leave the extension as-is. I could remove the extension, output a warning and not filter the annotation. However, this may lead to redundant annotations as @pgaudet suggested.

kltm commented 5 months ago

@mugitty Just to clarify, what is the destructive part of the code that doesn't allow the original to pass through?

mugitty commented 5 months ago

@kltm, the extension string is parsed into a conjunctive set.
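For readers following along, here is a minimal sketch of what parsing the extension column into conjunctive sets can look like. It is not the actual GO pipeline code. It assumes the GAF convention that `|` separates alternative extension groups and `,` joins the units within one group. The names `ExtensionUnit`, `UNIT_RE`, and `parse_extensions` are hypothetical, and the pattern is deliberately simplified.

```python
import re
from typing import List, NamedTuple, Optional


class ExtensionUnit(NamedTuple):
    relation: str  # relation label, e.g. "part_of"
    term: str      # CURIE of the filler, e.g. "CL:0000019"


# Simplified, hypothetical pattern: a relation label (no colon) followed by a
# CURIE in parentheses. A relation given as an ID ("BFO:0000050") fails here.
UNIT_RE = re.compile(r"^([A-Za-z_]\w*)\(([A-Za-z_][\w.]*:[^\s()]+)\)$")


def parse_extensions(field: str) -> Optional[List[List[ExtensionUnit]]]:
    """Parse an extension column into a list of conjunctive sets.

    Returns None if any unit is malformed. Only the structured form is
    returned, so the original string is not available to the caller afterwards.
    """
    if not field:
        return []
    conjunctions: List[List[ExtensionUnit]] = []
    for group in field.split("|"):      # alternatives
        units: List[ExtensionUnit] = []
        for raw in group.split(","):    # units joined conjunctively
            match = UNIT_RE.match(raw.strip())
            if match is None:
                return None
            units.append(ExtensionUnit(match.group(1), match.group(2)))
        conjunctions.append(units)
    return conjunctions
```

With this sketch, `parse_extensions("part_of(CL:0000019)")` yields one conjunctive set with one unit, while the value from the error above, `BFO:0000050(CL:0000019)`, yields `None`. Because only the parsed structure is carried forward, an unparseable extension leaves nothing to write back out.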

mugitty commented 5 months ago

@kltm, do you want me to implement a temporary fix to remove the extension if it is invalid? We can create a ticket to add the check once @hattrill updates the extensions.

kltm commented 5 months ago

@mugitty I think this may be a @pgaudet question. Above (https://github.com/geneontology/go-site/issues/2334#issuecomment-2161342170), there is a desire to leave this as a WARNING and pass it through. I recall, however, that that was not possible due to reconstructing the annotation from the internal model?

mugitty commented 5 months ago

@kltm, you are right. We cannot keep the 'problem extension' and flag it as a warning. We will have to remove the problem extension and flag that as a warning. This may result in redundant annotations, as @pgaudet suggested.
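To make the trade-off concrete, here is a rough sketch of "remove the problem extension and warn" and of how it can produce a redundant line. This is an illustration under stated assumptions, not the real rule implementation: the `Annotation` class carries only a handful of GAF columns, and `UNIT_RE`, `extensions_valid`, and `strip_bad_extensions` are hypothetical names.

```python
import re
from dataclasses import dataclass, replace
from typing import List, Tuple

# Simplified, hypothetical check: relation label followed by a CURIE in parentheses.
UNIT_RE = re.compile(r"^[A-Za-z_]\w*\([A-Za-z_][\w.]*:[^\s()]+\)$")


@dataclass(frozen=True)
class Annotation:
    # Only a subset of GAF columns, enough to illustrate the point.
    gene_id: str
    qualifier: str
    go_term: str
    reference: str
    evidence: str
    extensions: str = ""


def extensions_valid(field: str) -> bool:
    """True if every unit in the column looks like relation(curie)."""
    if not field:
        return True
    return all(
        UNIT_RE.match(unit.strip())
        for group in field.split("|")
        for unit in group.split(",")
    )


def strip_bad_extensions(ann: Annotation) -> Tuple[Annotation, List[str]]:
    """Keep the annotation; drop an invalid extension and report a warning."""
    if extensions_valid(ann.extensions):
        return ann, []
    warning = (
        f"WARNING - removed invalid extension {ann.extensions!r} "
        f"from {ann.gene_id} {ann.go_term}"
    )
    return replace(ann, extensions=""), [warning]


# The annotation from the report, and a hypothetical extension-free twin.
bad = Annotation("FB:FBgn0034841", "is_active_in", "GO:0005634",
                 "PMID:34478447", "IDA", "BFO:0000050(CL:0000019)")
twin = replace(bad, extensions="")
fixed, warnings = strip_bad_extensions(bad)
# After stripping, `fixed` compares equal to `twin`: this is the redundancy case.
```

Redundancy only arises when an otherwise identical annotation without the extension already exists, which is why it is a possibility rather than something every stripped line would trigger.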

kltm commented 5 months ago

@mugitty I think we need @pgaudet here: this is policy rather than mechanism. I would point out that redundant annotations created by stripping are (as far as I know) a possibility rather than a certainty. If we wanted to make more changes, a literal copy of the initial string in the column could be kept and used if this condition was hit. That said, I'm not sure that this is a priority, as there is a fix coming from upstream anyway and this was purely meant to bridge the upstream data error.
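The "keep a literal copy" idea could look roughly like the sketch below. It assumes a model object that stores both the raw column text and the parsed form, and falls back to the raw text on output when parsing failed. `ExtensionColumn`, `UNIT_RE`, and the method names are hypothetical and do not describe the existing internal model.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Simplified, hypothetical pattern: relation label followed by a CURIE in parentheses.
UNIT_RE = re.compile(r"^([A-Za-z_]\w*)\(([A-Za-z_][\w.]*:[^\s()]+)\)$")

Parsed = List[List[Tuple[str, str]]]


@dataclass
class ExtensionColumn:
    raw: str                   # literal copy of the input column
    parsed: Optional[Parsed]   # None when the column could not be parsed

    @classmethod
    def from_string(cls, raw: str) -> "ExtensionColumn":
        if not raw:
            return cls(raw, [])
        groups: Parsed = []
        for group in raw.split("|"):
            units = []
            for unit in group.split(","):
                match = UNIT_RE.match(unit.strip())
                if match is None:
                    # Remember the failure but keep the literal text.
                    return cls(raw, None)
                units.append((match.group(1), match.group(2)))
            groups.append(units)
        return cls(raw, groups)

    def to_string(self) -> str:
        """Rebuild the column from the model, or pass the original through."""
        if self.parsed is None:
            return self.raw
        return "|".join(
            ",".join(f"{relation}({term})" for relation, term in group)
            for group in self.parsed
        )
```

Under this sketch a valid column is rebuilt from the model as before, while `BFO:0000050(CL:0000019)` would be written out unchanged and could be reported as a WARNING instead of filtering the line.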

hattrill commented 5 months ago

It would be nice to have a fix here. I think that the number of redundancies from stripping out bad extensions would probably be very small and won't be a problem for enrichment, etc., whereas the absence of annotations is potentially problematic for us, e.g. we can't use them in Noctua.

We should have a new release out today, so the GAF will be updated on our FTP site. However, I fear this will take a long time to filter through, as there has just been a GO release, unless there is a way to 'inject' the new set independently of the release.

kltm commented 5 months ago

@hattrill Do you pick up the GAFs for further use at your end? Otherwise, as we generally direct people to the "releases", whether it's fixed in your data or patched over in software, we'd still be looking at the next release for things to get to downloads and AmiGO.

hattrill commented 5 months ago

Hi @kltm, we don't pick up our GAFs from GO. It would just have been good for the fix to happen automatically, rather than having to wait for the next GO release for GO to pick up our fixed GAF.

Other fixes happen automatically and there is no check for potential redundancy issues there.