FlyBase / GO-curation

For projects related to GO curation in FlyBase
MIT License

Issues with old InterPro2GOs #70

Closed: hattrill closed this issue 8 months ago

hattrill commented 1 year ago

From Steven: 10.03.23

I just noticed this apparent redundancy, where the ‘evidence’ for an InterPro2GO annotation is shown once on its own and again coupled with a second one, e.g. https://flybase.org/reports/FBgn0265262#function and https://flybase.org/reports/FBgn0030052#function

Not sure if there’s anything we can do about this. It’s not exactly wrong, but it gives a false impression of the weight of the InterPro2GO evidence (and takes up space!)

hattrill commented 1 year ago

From the FB2023_01 GAF, gene/GO ID pairs that occur more than once with InterPro2GO evidence: https://docs.google.com/spreadsheets/d/1bmkCJ9HLZc28914_5dJvwcp92OIsqIoCXusKDzrygGE/edit#gid=0

400 instances (247 genes)
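A minimal Python sketch of how such duplicates could be counted from a GAF file. It assumes GAF 2.x column order (column 2 DB Object ID, column 5 GO ID, column 7 evidence code, column 8 With/From) and, as a simplifying assumption, treats any IEA line whose With/From field contains `InterPro:` as an InterPro2GO line. The function name is hypothetical, and this is not necessarily how the spreadsheet was produced.

```python
from collections import Counter
from typing import Iterable


def duplicated_interpro2go_pairs(gaf_lines: Iterable[str]) -> dict[tuple[str, str], int]:
    """Return {(gene ID, GO ID): line count} for pairs on more than one InterPro2GO line.

    Assumed GAF 2.x columns (1-based): 2 = DB Object ID, 5 = GO ID,
    7 = evidence code, 8 = With/From.
    """
    counts: Counter = Counter()
    for line in gaf_lines:
        if line.startswith("!") or not line.strip():
            continue  # skip header/comment and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a complete annotation line
        gene_id, go_id, evidence, with_from = cols[1], cols[4], cols[6], cols[7]
        # Assumption: an IEA line citing InterPro IDs is an InterPro2GO line.
        if evidence == "IEA" and "InterPro:" in with_from:
            counts[(gene_id, go_id)] += 1
    return {pair: n for pair, n in counts.items() if n > 1}
```

Called on an uncompressed GAF, e.g. `duplicated_interpro2go_pairs(open(path))`, the length of the returned dict is the number of duplicated gene/GO pairs, and `{gene for gene, _ in result}` gives the genes involved.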

It looks like, where one FBgn maps to several UniProtKB IDs and those protein forms have different InterPro mappings because of isoform sequence variation, the pipeline adds the InterPro2GO annotations as separate evidence lines.

We probably need to alter the pipeline to select the longest string, but I must check this further to make sure this is the correct interpretation of the observation.
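The "select the longest string" idea could look roughly like the hypothetical post-processing step below, assuming the string in question is the GAF With/From field (column 8) holding the pipe-separated InterPro IDs. Each (gene, GO ID) pair keeps only the InterPro2GO line with the longest With/From string (the first one wins a tie); every other line passes through in its original order. InterPro2GO lines are identified, as an assumption, by IEA evidence plus `InterPro:` in column 8.

```python
from typing import Iterable, Union


def keep_longest_interpro_line(gaf_lines: Iterable[str]) -> list[str]:
    """Keep one InterPro2GO line per (gene ID, GO ID) pair: the one whose
    With/From string (column 8) is longest. Ties keep the first line seen.
    Non-InterPro2GO lines are passed through unchanged, in order."""
    best: dict[tuple[str, str], str] = {}
    # Output slots: either a literal line or a (gene, GO) key resolved at the end.
    slots: list[Union[str, tuple[str, str]]] = []
    for line in gaf_lines:
        cols = line.rstrip("\n").split("\t")
        # Assumption: an IEA line citing InterPro IDs is an InterPro2GO line.
        is_interpro2go = (
            not line.startswith("!")
            and len(cols) >= 8
            and cols[6] == "IEA"
            and "InterPro:" in cols[7]
        )
        if not is_interpro2go:
            slots.append(line)
            continue
        key = (cols[1], cols[4])
        if key not in best:
            slots.append(key)  # first occurrence fixes the output position
            best[key] = line
        elif len(cols[7]) > len(best[key].rstrip("\n").split("\t")[7]):
            best[key] = line
    return [best[s] if isinstance(s, tuple) else s for s in slots]
```

Comparing string length rather than counting IDs follows the wording above; because every `InterPro:IPR` accession has the same length, the two orderings agree for strings made only of InterPro IDs.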

hattrill commented 1 year ago

@sjm41 I've started looking at the issue you flagged up.

hattrill commented 1 year ago

Gss2; FBgn0052495 GO:0004363 x4 in GAF

In Protein2GO:
FBgn0052495 Q9VX03 InterPro:IPR005615|InterPro:IPR014042
FBgn0052495 Q1RL06 InterPro:IPR005615|InterPro:IPR014049
FBgn0052495 Q86B44 InterPro:IPR004887|InterPro:IPR005615|InterPro:IPR014042|InterPro:IPR037013
FBgn0052495 Q8IQZ1 InterPro:IPR004887|InterPro:IPR005615|InterPro:IPR014042|InterPro:IPR014049|InterPro:IPR037013
FBgn0052495 X2JCE7 InterPro:IPR005615|InterPro:IPR01404

On page: [Screenshot 2023-03-13 at 17.59.08]

FBgn0039478 GO:0006508 x3 in GAF

In Protein2GO:
FBgn0039478 Q8IHC6 InterPro:IPR000718|InterPro:IPR008753
FBgn0039478 Q95SM2 InterPro:IPR000718|InterPro:IPR018497 & InterPro:IPR000718|InterPro:IPR018497
FBgn0039478 Q8IMQ2 InterPro:IPR000718|InterPro:IPR008753|InterPro:IPR018497

On page: [Screenshot 2023-03-13 at 17.52.35]
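One way to test the "select the longest string" interpretation on the two examples above is to ask whether the longest With/From string also contains every InterPro ID seen for the other isoforms; if it does, keeping only that line loses no InterPro evidence for the gene/GO pair. The sketch below uses the Protein2GO strings listed above verbatim, except X2JCE7, whose string is truncated in the listing and is left out. The function name is hypothetical, and two genes say nothing about the rest of the 247.

```python
def longest_is_superset(with_from_by_acc: dict[str, str]) -> tuple[str, bool]:
    """Return (accession with the longest With/From string, whether that
    string's InterPro IDs include every other accession's IDs)."""
    id_sets = {acc: set(s.split("|")) for acc, s in with_from_by_acc.items()}
    longest = max(with_from_by_acc, key=lambda acc: len(with_from_by_acc[acc]))
    return longest, all(ids <= id_sets[longest] for ids in id_sets.values())


# Protein2GO strings copied from the listings above.
GSS2 = {  # FBgn0052495; X2JCE7 omitted because its string is truncated
    "Q9VX03": "InterPro:IPR005615|InterPro:IPR014042",
    "Q1RL06": "InterPro:IPR005615|InterPro:IPR014049",
    "Q86B44": "InterPro:IPR004887|InterPro:IPR005615|InterPro:IPR014042|InterPro:IPR037013",
    "Q8IQZ1": "InterPro:IPR004887|InterPro:IPR005615|InterPro:IPR014042|InterPro:IPR014049|InterPro:IPR037013",
}
FBGN0039478 = {  # Q95SM2 has two identical rows, so one entry suffices
    "Q8IHC6": "InterPro:IPR000718|InterPro:IPR008753",
    "Q95SM2": "InterPro:IPR000718|InterPro:IPR018497",
    "Q8IMQ2": "InterPro:IPR000718|InterPro:IPR008753|InterPro:IPR018497",
}

for name, rows in (("Gss2", GSS2), ("FBgn0039478", FBGN0039478)):
    print(name, *longest_is_superset(rows))
# prints:
# Gss2 Q8IQZ1 True
# FBgn0039478 Q8IMQ2 True
```

For these two genes the longest string is a superset of the others, which is consistent with the proposed fix; a gene where it is not would need the union of IDs instead.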
hattrill commented 10 months ago

Ticket: https://flybase.atlassian.net/browse/DB-876

hattrill commented 10 months ago

For the FB2023_05 P2GO load there are a lot of "old" dated InterPro annotations, mainly due to the addition of "do not annotate" labels to BP terms slated for obsoletion. This makes a manual clean-up via proforma too hard. I checked pseudoenzymes against the old InterPro2GO MFs and found no overlap. Attachment: IEAs.xlsx
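If the clean-up is done at load rather than by proforma, one option is to drop InterPro2GO lines whose GO term carries the "do not annotate" label before they reach the GAF. This is a minimal sketch under stated assumptions: the label is assumed to appear in `go.obo` as a `subset: gocheck_do_not_annotate` tag in the term stanza, and InterPro2GO lines are identified by IEA evidence plus `InterPro:` in the With/From column (column 8). Both function names are hypothetical.

```python
from typing import Iterable


def do_not_annotate_ids(obo_lines: Iterable[str],
                        subset: str = "gocheck_do_not_annotate") -> set[str]:
    """Collect IDs of [Term] stanzas tagged with the given subset (assumed name)."""
    flagged: set[str] = set()
    in_term, term_id, tagged = False, None, False
    # A trailing sentinel header closes the last stanza.
    for raw in list(obo_lines) + ["[End]"]:
        line = raw.strip()
        if line.startswith("["):
            if in_term and term_id and tagged:
                flagged.add(term_id)
            in_term, term_id, tagged = line == "[Term]", None, False
        elif in_term and line.startswith("id: "):
            term_id = line[4:]
        elif in_term and line == f"subset: {subset}":
            tagged = True
    return flagged


def drop_interpro2go_to_flagged(gaf_lines: Iterable[str], flagged: set[str]) -> list[str]:
    """Remove InterPro2GO lines whose GO ID (column 5) is in `flagged`."""
    kept = []
    for line in gaf_lines:
        cols = line.rstrip("\n").split("\t")
        # Assumption: an IEA line citing InterPro IDs is an InterPro2GO line.
        is_interpro2go = (
            not line.startswith("!")
            and len(cols) >= 8
            and cols[6] == "IEA"
            and "InterPro:" in cols[7]
        )
        if is_interpro2go and cols[4] in flagged:
            continue  # skip: term is labelled "do not annotate"
        kept.append(line)
    return kept
```

Curated (non-IEA) lines to flagged terms are deliberately left alone here, since the problem described is specific to the automated InterPro2GO annotations.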

Need to get this fixed at load in FB2023_06.

hattrill commented 8 months ago

Closing, as this is now tracked in Jira.