Closed sierra-moxon closed 5 months ago
Noting that the human and rat orthology loads use the same code that sets the provided_by to "GO_Central" in the preprocessing pipeline. And, when I look at the GAF file for the human and rat outputs of the preprocessing pipeline here: http://skyhook.berkeleybop.org/silver-issue-325-gopreprocess/products/upstream_and_raw_data/preprocessed_GAF_output/mgi-human-ortho.gaf and http://skyhook.berkeleybop.org/silver-issue-325-gopreprocess/products/upstream_and_raw_data/preprocessed_GAF_output/mgi-rgd-ortho.gaf
they both show ONLY GO_Central as the provider (as expected).
SMoxon@SMoxon-M82 gopreprocess % grep "MGI:2179523" mgi-merged.gaf.2 | grep "GO_REF:0000096"
MGI MGI:2179523 Fcgr4 involved_in GO:0071222 GO_REF:0000096 ISO RGD:1303067 P Low affinity immunoglobulin gamma Fc region receptor III-A protein taxon:10090 20240318 MGI
MGI MGI:2179523 Fcgr4 located_in GO:0009986 GO_REF:0000096 ISO RGD:1303067 C Low affinity immunoglobulin gamma Fc region receptor III-A protein taxon:10090 20240318 MGI
MGI MGI:2179523 Fcgr4 involved_in GO:0071222 GO_REF:0000096 ISO RGD:1303067 P Fc receptor, IgG, low affinity IV gene_product taxon:10090 20240318 GO_Central
MGI MGI:2179523 Fcgr4 located_in GO:0009986 GO_REF:0000096 ISO RGD:1303067 C Fc receptor, IgG, low affinity IV gene_product taxon:10090 20240318 GO_Central
SMoxon@SMoxon-M82 gopreprocess %
This seems to be the issue:
SMoxon@SMoxon-M82 GAF_OUTPUT % grep "MGI:2179523" *.gaf | grep "GO_REF:0000096"
mgi-p2g-converted.gaf:MGI MGI:2179523 Fcgr4 involved_in GO:0071222 GO_REF:0000096 ISO RGD:1303067 P Low affinity immunoglobulin gamma Fc region receptor III-A protein taxon:10090 20240318 MGI
mgi-p2g-converted.gaf:MGI MGI:2179523 Fcgr4 located_in GO:0009986 GO_REF:0000096 ISO RGD:1303067 C Low affinity immunoglobulin gamma Fc region receptor III-A protein taxon:10090 20240318 MGI
mgi-rgd-ortho.gaf:MGI MGI:2179523 Fcgr4 involved_in GO:0071222 GO_REF:0000096 ISO RGD:1303067 P Fc receptor, IgG, low affinity IV gene_product taxon:10090 20240318 GO_Central
mgi-rgd-ortho.gaf:MGI MGI:2179523 Fcgr4 located_in GO:0009986 GO_REF:0000096 ISO RGD:1303067 C Fc receptor, IgG, low affinity IV gene_product taxon:10090 20240318 GO_Central
In the rat ortho load, we get the annotations with the correct provided_by. In the Protein to GO load, we get the same annotations, but the requirement for that load is to keep the provided_by exactly as it came in via Protein to GO, so the merged file ends up with both copies.
Here are the two RGD annotations in the goa_mouse file:
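For anyone who wants to check the merged file for more cases like this one, here is a rough sketch (not code from gopreprocess) that groups GAF rows by gene, qualifier, GO term, reference, evidence code, and with/from, then reports any group with more than one provided_by value. The function name `find_provider_conflicts` and the choice of key columns are my own assumptions. It expects the real tab-separated GAF columns, not the space-flattened lines pasted above.

```python
from collections import defaultdict


def find_provider_conflicts(gaf_lines):
    """Return {annotation key: sorted providers} for keys seen with >1 provider.

    Hypothetical helper: the key is (DB object ID, qualifier, GO ID,
    reference, evidence code, with/from), i.e. GAF columns 2, 4, 5, 6, 7, 8.
    The provider is GAF column 15 (Assigned_By / provided_by).
    """
    providers_by_key = defaultdict(set)
    for line in gaf_lines:
        # Skip blank lines and GAF header/comment lines.
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Ignore rows too short to carry an Assigned_By column.
        if len(cols) < 15:
            continue
        key = (cols[1], cols[3], cols[4], cols[5], cols[6], cols[7])
        providers_by_key[key].add(cols[14])
    return {
        key: sorted(providers)
        for key, providers in providers_by_key.items()
        if len(providers) > 1
    }
```

Running it over the lines of mgi-merged.gaf should flag the two Fcgr4 annotations shown above, since each appears once with MGI and once with GO_Central.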
UniProtKB A0A0B4J1G0 Fcgr4 involved_in GO:0071222 GO_REF:0000096 ISO RGD:1303067 P Low affinity immunoglobulin gamma Fc region receptor III-A Fcgr4|Fcgr3a protein taxon:10090 20111011 MGI
UniProtKB A0A0B4J1G0 Fcgr4 located_in GO:0009986 GO_REF:0000096 ISO RGD:1303067 C Low affinity immunoglobulin gamma Fc region receptor III-A Fcgr4|Fcgr3a protein taxon:10090 20111011 MGI
I convert the date and swap the identifiers per the requirements for the GOA load, but I don't change the provided_by.
Should these two Protein to GO annotations be removed via some constraint from the final GAF file? @LiNiMGI
I imagine that these are coming in now, with the loosened constraint on the Protein to GO load, where we wanted things annotated by MGI, GO_Central, or GOC as long as they didn't have this reference: "GO_REF:0000033"
Did I misinterpret that new constraint? Should I always exclude MGI provided annotations in the protein to GO conversion and only keep those provided by GO_Central or GOC when they don't have this reference: "GO_REF:0000033" ?
https://github.com/geneontology/gopreprocess/pull/60 <-- I tightened the constraint to ignore any annotation from protein to GO provided_by MGI in the import/conversion, but if the annotation is from GO_Central or GOC, then check for the "GO_REF:0000033" and only bring in those from GOC or GO_Central that do not have this reference.
@sierra-moxon Only bring in those assigned by GO_Central that do not have the "GO_REF:0000033" reference. MGI should still be excluded, as before. Thanks!
Li has confirmed that these are fixed in the latest round of tests.
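To make the agreed rule concrete, here is a minimal sketch of the filter as I understand it from this thread. It is not the code from the PR. The function name `keep_protein2go_annotation` and the `allowed_providers` parameter are made up for illustration. The default set includes GOC because the PR description mentions it, while the reply above names only GO_Central, so pass a narrower set if that is the intended reading. Input is assumed to be a tab-separated GAF line.

```python
EXCLUDED_REFERENCE = "GO_REF:0000033"
# Assumption: GO_Central and GOC are both allowed, per the PR description.
ALLOWED_PROVIDERS = frozenset({"GO_Central", "GOC"})


def keep_protein2go_annotation(gaf_line, allowed_providers=ALLOWED_PROVIDERS):
    """Return True if a Protein to GO GAF line should be imported.

    Hypothetical helper: a line is kept only when its provided_by
    (GAF column 15) is in allowed_providers and none of its references
    (GAF column 6, pipe-separated) is GO_REF:0000033. Anything provided
    by MGI is therefore always dropped.
    """
    # Header/comment lines and blank lines are never annotations.
    if not gaf_line.strip() or gaf_line.startswith("!"):
        return False
    cols = gaf_line.rstrip("\n").split("\t")
    if len(cols) < 15:
        return False
    references = cols[5].split("|")
    provided_by = cols[14]
    if provided_by not in allowed_providers:
        return False
    return EXCLUDED_REFERENCE not in references
```

With this rule, the two goa_mouse Fcgr4 lines quoted earlier (provided_by MGI) are rejected, so only the GO_Central copies from the rat ortho load remain in the merged file. To follow the stricter reading, call it with `allowed_providers={"GO_Central"}`.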
Thanks!!! @sierra-moxon I did a quick check:
GO_REF:0000096, J:155856, rat to mouse: date fixed; assigned_by still not fixed (change MGI to GO_Central). GO_REF:0000119.