geneontology / minerva

BSD 3-Clause "New" or "Revised" License

Change how interacting taxon annotations in Noctua are exported to GPAD #545

Open vanaukenk opened 2 weeks ago

vanaukenk commented 2 weeks ago

This issue stems from development of the new Noctua standard annotation form. https://github.com/geneontology/noctua-standard-annotations/issues/24

In GO-CAMs and Noctua, interacting taxa are captured as 'has input' annotation extensions to the appropriate GO BP term. However, this is inconsistent with how the same information is captured in other standard annotations, where the interacting taxon goes in the dedicated taxon column rather than in an extension.

To fix this in the Noctua GPAD output, we need to populate Column 8 (interacting taxon ID) of the GPAD 1.1 file with the numerical value of the interacting taxon ID, instead of putting it in the annotation extension.
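The column 8 change can be sketched as a post-processing step on a single GPAD 1.1 row. This is a minimal, hypothetical sketch (`move_interacting_taxon` is not a minerva or ontobio function), assuming tab-split rows of 12 columns, comma-separated extension conjunctions, and at most one interacting taxon per annotation. Pipe-separated extension alternatives are not handled.

```python
import re

# Matches an interacting-taxon extension such as "has_input(taxon:282459)".
TAXON_EXT = re.compile(r"^has_input\(taxon:(\d+)\)$")


def move_interacting_taxon(row):
    """Return a copy of a 12-column GPAD 1.1 row with an interacting-taxon
    extension moved from column 11 (extension) into column 8 (numeric ID only)."""
    if len(row) != 12:
        raise ValueError(f"expected 12 GPAD 1.1 columns, got {len(row)}")
    row = list(row)  # do not mutate the caller's row
    kept, taxa = [], []
    for ext in filter(None, row[10].split(",")):
        match = TAXON_EXT.match(ext.strip())
        if match:
            taxa.append(match.group(1))  # keep only the numeric taxon ID
        else:
            kept.append(ext)  # any other extension stays where it was
    if len(taxa) > 1:
        raise ValueError("column 8 holds at most one interacting taxon")
    if taxa:
        row[7] = taxa[0]
    row[10] = ",".join(kept)
    return row


if __name__ == "__main__":
    # The current MGI output line from this issue, with column 8 empty.
    current = ["MGI", "MGI:2429397", "acts_upstream_of_or_within", "GO:0050830",
               "PMID:19139201", "ECO:0000315", "MGI:MGI:3529594", "", "20210413",
               "MGI", "has_input(taxon:282459)",
               "noctua-model-id=gomodel:MGI_MGI_2429397"]
    print(move_interacting_taxon(current)[7])  # prints 282459
```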

Then, during the conversion to GAF 2.2, the interacting taxon will need to be added to the taxon field (column 13), separated from the annotated organism's taxon by a pipe.
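On the GAF side, the conversion only needs to append the interacting taxon to the annotated organism's taxon. This is a minimal sketch, assuming the annotated organism's taxon (e.g. `taxon:10090`) is supplied separately, for example from the gene product's GPI entry, because GPAD 1.1 rows do not carry it. `gaf_taxon_field` is a hypothetical helper and not part of ontobio's API.

```python
def gaf_taxon_field(object_taxon, interacting_taxon=""):
    """Build GAF 2.2 column 13 from the annotated organism's taxon and the
    optional GPAD 1.1 column 8 value."""
    if not interacting_taxon:
        return object_taxon  # no interacting taxon: single entry only
    # Accept a bare number ("282459", as in the proposed Noctua output) or a
    # prefixed ID, and normalize to the "taxon:" prefix used in GAF 2.2.
    taxon_id = interacting_taxon.rsplit(":", 1)[-1]
    return f"{object_taxon}|taxon:{taxon_id}"


if __name__ == "__main__":
    print(gaf_taxon_field("taxon:10090", "282459"))  # prints taxon:10090|taxon:282459
```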

http://noctua.geneontology.org/editor/graph/gomodel:MGI_MGI_2429397 is an example of a gene-centric GO-CAM model with interacting taxon annotations.

An example of a current output annotation in the MGI GPAD source file for this model is:

MGI MGI:2429397 acts_upstream_of_or_within GO:0050830 PMID:19139201 ECO:0000315 MGI:MGI:3529594 20210413 MGI has_input(taxon:282459) noctua-model-id=gomodel:MGI_MGI_2429397|model-state=production|contributor=https://orcid.org/0000-0001-9990-8331|contributor=https://orcid.org/0000-0002-9796-7693

And this makes its way to the current MGI GAF file as:

MGI MGI:1346060 Tlr2 acts_upstream_of_or_within GO:0050830 PMID:19139201 IMP MGI:MGI:2674036 P toll-like receptor 2 Ly105 protein_coding_gene taxon:10090 20210413 MGI has_input(taxon:282459)

The proposed, updated GPAD1.1 output from Noctua would instead be:

MGI MGI:2429397 acts_upstream_of_or_within GO:0050830 PMID:19139201 ECO:0000315 MGI:MGI:3529594 282459 20210413 MGI noctua-model-id=gomodel:MGI_MGI_2429397|model-state=production|contributor=https://orcid.org/0000-0001-9990-8331|contributor=https://orcid.org/0000-0002-9796-7693

And the resulting GAF2.2 file would be:

MGI MGI:1346060 Tlr2 acts_upstream_of_or_within GO:0050830 PMID:19139201 IMP MGI:MGI:2674036 P toll-like receptor 2 Ly105 protein_coding_gene taxon:10090|taxon:282459 20210413 MGI
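Once converted output is available, column 13 of a GAF 2.2 line can be split apart again to check that the interacting taxon landed in the right place. These are hypothetical helpers, assuming tab-separated lines with all 17 GAF 2.2 columns present.

```python
def split_gaf_taxon(field):
    """Split GAF 2.2 column 13 into (annotated taxon, interacting taxon or None)."""
    parts = field.split("|")
    if len(parts) > 2:
        raise ValueError(f"unexpected taxon field: {field!r}")
    return parts[0], (parts[1] if len(parts) == 2 else None)


def interacting_taxon_of(gaf_line):
    """Return the interacting taxon of one tab-separated GAF 2.2 line, or None."""
    cols = gaf_line.rstrip("\n").split("\t")  # keep trailing empty columns
    if len(cols) != 17:
        raise ValueError(f"expected 17 GAF 2.2 columns, got {len(cols)}")
    return split_gaf_taxon(cols[12])[1]
```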

The file specs I'm looking at for this are: https://geneontology.org/docs/gene-product-association-data-gpad-format/ https://geneontology.org/docs/go-annotation-file-gaf-format-2.2/

@kltm @balhoff - please let me know if this looks okay to you or if anything needs clarification/correction. Thanks.

kltm commented 2 weeks ago

@vanaukenk This looks about right to me. One small comment: I don't think there is a formal GPAD 1.1 standard that minerva follows. The step that @balhoff is concerned with is making sure that the taxon info goes into the interacting taxon column (column 8), not the extension. After that, it will be up to ontobio (more the wheelhouse of @dustine32 and @mugitty these days) to make any changes needed so that the GPAD to GAF 2.2 conversion handles it correctly. It may work out of the box, which we can check once the changes in minerva are released in a snapshot.