Noting that nothing has changed in https://github.com/geneontology/syngo-go-cams; as well, there have been no software changes that I remember. Digging in.
The bulk (945) of these ~1000 annotations are for rat. The sudden introduction of these newer SynGO annotations is due to RGD adding them to their GAF file, either by directly grabbing our noctua_rgd.gpad or indirectly through the GOA release. Since the noctua_rgd.gpad YAML does NOT have a merges_into: rgd line, it looks like this change didn’t occur in the GO pipeline. You can see the increase in the input upstream rgd-src.gaf file:
$ awk 'BEGIN {FS="\t"}; $15 == "SynGO"' rgd-src-rel-Apr.gaf | wc -l
3440
$ awk 'BEGIN {FS="\t"}; $15 == "SynGO"' rgd-src-snapshot.gaf | wc -l
4385
So, RGD effectively controls when the rat SynGO Noctua annotations get into the GO release, not us (GO)?
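To see which annotations make up that increase, rather than only the totals, the two files can be diffed on a per-annotation key. This is a minimal sketch, assuming both files are GAF 2.x and that DB Object ID, GO ID, reference, and evidence code (columns 2, 5, 6, 7) are enough to identify an annotation for this purpose; the function names syngo_keys and new_syngo are made up for illustration.

```shell
# Print the SynGO annotation keys of one GAF 2.x file, sorted and de-duplicated.
# Columns used: 2 = DB Object ID, 5 = GO ID, 6 = DB:Reference,
# 7 = Evidence Code, 15 = Assigned_By.
syngo_keys() {
  awk 'BEGIN {FS="\t"; OFS="\t"} $15 == "SynGO" {print $2, $5, $6, $7}' "$1" \
    | LC_ALL=C sort -u
}

# Print the SynGO keys present in the newer file ($2) but not the older one ($1).
new_syngo() {
  old_keys=$(mktemp)
  new_keys=$(mktemp)
  syngo_keys "$1" > "$old_keys"
  syngo_keys "$2" > "$new_keys"
  # comm -13 suppresses lines unique to the first file and lines common to both,
  # leaving only lines unique to the second (newer) file.
  LC_ALL=C comm -13 "$old_keys" "$new_keys"
  rm -f "$old_keys" "$new_keys"
}

# Usage (file names as in the commands above):
#   new_syngo rgd-src-rel-Apr.gaf rgd-src-snapshot.gaf | wc -l
```

Because the key ignores qualifiers and annotation extensions, the count from this sketch may not match the difference between the two totals exactly.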
The remaining annotations (including the example IDs H0UVT3 and A0A452E1P4) are for non-regular organisms; they do indeed come from our SynGO Noctua load and are in our noctua_uniprotkb.gpad file. But, similar to the rat situation above, we (GO) don't actually control when these are introduced to the GO release, because the datasets YAMLs have no merges_into: metadata for noctua_uniprotkb.gpad. These annotations have only recently appeared in the GO pipeline products (e.g., amigo-staging) because GOA loads our noctua_*.gpad export files and incorporates them into their goa_uniprot_all file, which was updated with the 2024-04-19 GOA release.
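One way to confirm which source a given accession arrives through is to tally its annotations by Assigned_By in a GAF copy of the file in question. The following is only a sketch, assuming a GAF 2.x file (DB Object ID in column 2, Assigned_By in column 15); the function name count_by_source and the file name in the usage comment are placeholders, not actual pipeline paths.

```shell
# Tally the annotations for one DB Object ID by Assigned_By in a GAF 2.x file.
# $1 = DB Object ID (e.g. a UniProt accession), $2 = GAF file, or - for stdin.
# Output: one "source<TAB>count" line per Assigned_By value, sorted by source.
count_by_source() {
  awk -v id="$1" 'BEGIN {FS="\t"}
    $2 == id {n[$15]++}
    END {for (s in n) print s "\t" n[s]}' "$2" | LC_ALL=C sort
}

# Usage (placeholder file name; a gzipped file can be streamed through stdin):
#   gzip -dc some-upstream-src.gaf.gz | count_by_source H0UVT3 -
```

If the accession only shows up with SynGO as the source in the GOA-derived file, that would be consistent with the annotations arriving via GOA rather than through a merge in the GO pipeline.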
Thanks @dustine32 !
Hi @dustine32
There are about 1000 new SynGO annotations on snapshot compared to 'release' (2024-04-18/). The current release already has the new data; did they submit something else?
Looking for differences, I noticed that amigo-staging has SynGO annotations to non-reviewed UniProt entries such as H0UVT3 and A0A452E1P4.
My question is: did we do a new load, or did we apply different checks/filters/mappings? Or could we be getting this data via GOA?
Note that I don't find any annotations to A0A452E1P4 in the source file here https://ftp.ebi.ac.uk/pub/databases/GO/goa/HUMAN/goa_human.gpa.gz, nor in our copy of upstream products here: http://snapshot.geneontology.org/products/upstream_and_raw_data/goa_human-src.gaf.gz
It seems we should not be including these annotations.
@kltm