hattrill closed this issue 8 months ago.
In the GAF from FB2023_01, some gene-GO ID pairs appear multiple times as InterPro2GO annotations: https://docs.google.com/spreadsheets/d/1bmkCJ9HLZc28914_5dJvwcp92OIsqIoCXusKDzrygGE/edit#gid=0
400 instances (247 genes)
It looks like where a gene (FBgn) maps to several UniProtKB IDs, and those forms have different InterPro mappings because of isoform sequence variation, the pipeline adds each set of InterPro IDs as a separate evidence line.
We probably need to alter the pipeline to select the longest string, but we must check this further to make sure this interpretation of the observation is correct.
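A minimal sketch of that proposed fix, reading "longest string" as the longest with/from (InterPro ID) string for each gene-GO pair. It assumes GAF-like `(gene, GO ID, with/from)` tuples; the function name and row layout are hypothetical and not taken from the actual pipeline.

```python
# Hypothetical GAF-like row: (gene ID, GO ID, pipe-delimited with/from string).
# This layout is an illustrative assumption, not the real pipeline's format.
Row = tuple[str, str, str]


def keep_longest_evidence(rows: list[Row]) -> list[Row]:
    """Collapse rows sharing a (gene, GO ID) pair to the single row whose
    with/from string is longest. On a tie, the first row seen is kept."""
    best: dict[tuple[str, str], str] = {}
    for gene, go_id, with_from in rows:
        key = (gene, go_id)
        if key not in best or len(with_from) > len(best[key]):
            best[key] = with_from
    # Output order follows the first appearance of each (gene, GO ID) pair.
    return [(gene, go_id, wf) for (gene, go_id), wf in best.items()]


if __name__ == "__main__":
    # Illustrative rows built from the FBgn0039478 InterPro strings in this issue.
    rows = [
        ("FBgn0039478", "GO:0006508", "InterPro:IPR000718|InterPro:IPR008753"),
        ("FBgn0039478", "GO:0006508", "InterPro:IPR000718|InterPro:IPR018497"),
        ("FBgn0039478", "GO:0006508",
         "InterPro:IPR000718|InterPro:IPR008753|InterPro:IPR018497"),
    ]
    for row in keep_longest_evidence(rows):
        print(row)
```

Because InterPro accessions have a fixed length (IPR plus six digits), comparing string length here is equivalent to comparing the number of IDs. Note that the longest string is not necessarily a superset of the others, so an ID present only on a shorter isoform's line would be dropped; that case would need checking before adopting this rule.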
@sjm41, I've started looking at the issue you flagged up.
Gss2 (FBgn0052495): GO:0004363 appears 4 times in the GAF.
In Protein2GO:
FBgn0052495 Q9VX03 InterPro:IPR005615|InterPro:IPR014042
FBgn0052495 Q1RL06 InterPro:IPR005615|InterPro:IPR014049
FBgn0052495 Q86B44 InterPro:IPR004887|InterPro:IPR005615|InterPro:IPR014042|InterPro:IPR037013
FBgn0052495 Q8IQZ1 InterPro:IPR004887|InterPro:IPR005615|InterPro:IPR014042|InterPro:IPR014049|InterPro:IPR037013
FBgn0052495 X2JCE7 InterPro:IPR005615|InterPro:IPR01404
On the report page: [screenshot]
FBgn0039478: GO:0006508 appears 3 times in the GAF.
In Protein2GO:
FBgn0039478 Q8IHC6 InterPro:IPR000718|InterPro:IPR008753
FBgn0039478 Q95SM2 InterPro:IPR000718|InterPro:IPR018497 & InterPro:IPR000718|InterPro:IPR018497
FBgn0039478 Q8IMQ2 InterPro:IPR000718|InterPro:IPR008753|InterPro:IPR018497
On the report page: [screenshot]
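The suspected mechanism (one evidence line per distinct InterPro set across a gene's UniProtKB entries) can be checked against the FBgn0039478 Protein2GO rows quoted above. This sketch assumes the "&" on the Q95SM2 line means two identical entries, and that each set contains at least one InterPro ID mapped to GO:0006508 by InterPro2GO (IPR000718 is common to all three sets, but that mapping is not verified here). The helper name is hypothetical.

```python
# Protein2GO rows for FBgn0039478 as quoted in this issue:
# (gene, UniProtKB accession, pipe-delimited InterPro IDs).
# The "&" on the Q95SM2 line is read as two identical entries (an assumption).
P2GO_ROWS = [
    ("FBgn0039478", "Q8IHC6", "InterPro:IPR000718|InterPro:IPR008753"),
    ("FBgn0039478", "Q95SM2", "InterPro:IPR000718|InterPro:IPR018497"),
    ("FBgn0039478", "Q95SM2", "InterPro:IPR000718|InterPro:IPR018497"),
    ("FBgn0039478", "Q8IMQ2",
     "InterPro:IPR000718|InterPro:IPR008753|InterPro:IPR018497"),
]


def distinct_interpro_sets(rows, gene):
    """Collect the distinct InterPro ID sets recorded for one gene.

    If the pipeline emits one evidence line per distinct set, this count
    predicts how often a GO term mapped from every set repeats in the GAF.
    """
    return {frozenset(ids.split("|")) for g, _acc, ids in rows if g == gene}


if __name__ == "__main__":
    sets = distinct_interpro_sets(P2GO_ROWS, "FBgn0039478")
    print(len(sets))  # 3, the same count as "GO:0006508 x3 in GAF"
```

Identical strings from different accessions collapse into one set, which is consistent with three GAF lines arising from four Protein2GO rows.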
For the FB2023_05 P2GO load, there are a lot of "old" dated InterPro annotations, mainly because "do not annotate" labels were added to BP terms slated for obsoletion. This makes a manual clean-up via proforma too hard. I checked pseudoenzymes against the old InterPro2GO MFs and found no overlap.
IEAs.xlsx
This needs to be fixed at load in FB2023_06.
Closing, as this is now tracked on Jira.
From Steven: 10.03.23
I just noticed this apparent redundancy, where the ‘evidence’ for an InterPro2GO annotation is shown once on its own and again coupled with a second one, e.g. https://flybase.org/reports/FBgn0265262#function and https://flybase.org/reports/FBgn0030052#function
I'm not sure whether there's anything we can do about this. It's not exactly wrong, but it gives a false impression of the weight of the InterPro2GO evidence (and takes up space!).
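One possible fix for the redundancy Steven describes would be to hide any evidence string whose InterPro IDs are all contained in another string shown for the same gene and GO term. This is a minimal sketch assuming the evidence arrives as pipe-delimited InterPro ID strings, as in the Protein2GO rows quoted earlier; the function name is hypothetical and it is not taken from FlyBase's pipeline or report-page code. Unlike keeping only the longest string, it keeps sets that are not subsets of one another.

```python
def drop_subsumed(evidence: list[str]) -> list[str]:
    """For the with/from strings shown for a single gene-GO pair, keep only
    those whose InterPro ID set is not a strict subset of another string's
    set. Strings with identical ID sets (in any order) are shown once."""
    id_sets = [frozenset(e.split("|")) for e in evidence]
    kept: list[str] = []
    shown: set[frozenset[str]] = set()
    for text, ids in zip(evidence, id_sets):
        # Skip exact repeats and anything fully covered by a larger set.
        if ids in shown or any(ids < other for other in id_sets):
            continue
        shown.add(ids)
        kept.append(text)
    return kept


if __name__ == "__main__":
    # Illustrative input only; these are not the IDs on the pages linked above.
    print(drop_subsumed([
        "InterPro:IPR000718",
        "InterPro:IPR000718|InterPro:IPR018497",
    ]))  # ['InterPro:IPR000718|InterPro:IPR018497']
```

With incomparable sets such as `A|B` and `A|C`, both strings survive, so this rule only removes lines that add no InterPro IDs beyond what another line already shows.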