geneontology / go-releases

Tasks and notes for monthly GO releases

SynGO changes in 2024-04-24 snapshot #85

Closed pgaudet closed 2 months ago

pgaudet commented 2 months ago

Hi @dustine32

There are about 1000 new SynGO annotations in the snapshot compared to 'release' (2024-04-18/). The current release already has the new data; did they submit something else?

Looking for differences, I noticed that amigo-staging has SynGO annotations to non-reviewed UniProt entries such as H0UVT3 and A0A452E1P4.

My question is: did we do a new load, or did we apply different checks/filters/mappings? Or could we be getting this data via GOA?

Note that I don't find any annotations to A0A452E1P4 in the source file here https://ftp.ebi.ac.uk/pub/databases/GO/goa/HUMAN/goa_human.gpa.gz, nor in our copy of upstream products here: http://snapshot.geneontology.org/products/upstream_and_raw_data/goa_human-src.gaf.gz
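For reference, a check like this can be run locally once a file has been downloaded. The following is only a sketch: the helper name count_accession is hypothetical, and it assumes a gzip-compressed GAF or GPAD file in which header lines start with "!".

```shell
# Hypothetical helper: count annotation lines that mention an accession
# in a gzip-compressed GAF/GPAD file.
# Assumption: header/comment lines start with "!" and are skipped.
# Caveat: this is a plain substring match, so a short accession can also
# match a longer ID that contains it.
count_accession() {
  gzip -cd "$1" | awk -v acc="$2" '!/^!/ && index($0, acc) {n++} END {print n+0}'
}

# Usage (after downloading the file):
#   count_accession goa_human.gpa.gz A0A452E1P4
```

A result of 0 means that no annotation line in that file mentions the accession.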

It seems we should not be including these annotations.

@kltm

kltm commented 2 months ago

Noting that nothing has changed in https://github.com/geneontology/syngo-go-cams; as well, there have been no software changes that I remember. Digging in.

dustine32 commented 2 months ago

The bulk (945) of these ~1000 annotations are for rat. The sudden introduction of these newer SynGO annotations is due to RGD adding them to their GAF file, either by directly grabbing our noctua_rgd.gpad or indirectly through the GOA release. Since the noctua_rgd.gpad YAML does NOT have a merges_into: rgd line, it looks like this change didn't occur in the GO pipeline. You can see the increase in the input upstream rgd-src.gaf file:

$ awk 'BEGIN {FS="\t"}; $15 == "SynGO"' rgd-src-rel-Apr.gaf | wc -l
    3440
$ awk 'BEGIN {FS="\t"}; $15 == "SynGO"' rgd-src-snapshot.gaf | wc -l
    4385

So, RGD effectively controls when the rat SynGO Noctua annotations get into the GO release, not us (GO)?
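The same kind of count can be broken down by taxon to see where SynGO annotations in a given GAF file fall. This is a sketch only: the helper name count_by_taxon is hypothetical, and it assumes the standard GAF layout, with the taxon in column 13 and Assigned_By in column 15.

```shell
# Hypothetical helper: per-taxon counts of annotations from one source.
# Assumptions: tab-separated GAF, taxon in column 13, Assigned_By in
# column 15, header lines starting with "!".
# $1 = GAF file, $2 = Assigned_By value (defaults to SynGO).
count_by_taxon() {
  awk -v src="${2:-SynGO}" '
    BEGIN {FS="\t"}
    !/^!/ && $15 == src {n[$13]++}             # tally matching lines per taxon
    END {for (t in n) print n[t] "\t" t}       # one "count<TAB>taxon" line each
  ' "$1" | sort -k1,1nr                        # largest count first
}

# Usage:
#   count_by_taxon rgd-src-snapshot.gaf
#   count_by_taxon rgd-src-rel-Apr.gaf
```

Running it on both files and comparing the per-taxon lines shows which taxa account for the change.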

The remaining annotations (including the example IDs H0UVT3 and A0A452E1P4) are for non-regular organisms; they do indeed come from our SynGO Noctua load and are in our noctua_uniprotkb.gpad file. But, similar to the rat situation above, we (GO) don't actually control when these are introduced to the GO release, because there is no merges_into: metadata for noctua_uniprotkb.gpad in the datasets YAMLs. These annotations have only recently appeared in the GO pipeline products (e.g., amigo-staging) because GOA loads our noctua_*.gpad export files and incorporates them into their goa_uniprot_all file, which was updated with the 2024-04-19 GOA release.

pgaudet commented 2 months ago

Thanks @dustine32 !