FlyBase / GO-curation

For projects related to GO curation in FlyBase
MIT License

curation via Noctua -> import into P2GO #76

Closed hattrill closed 12 months ago

hattrill commented 1 year ago

Ticket for the changes needed to curate in Noctua and have the resulting annotations imported via P2GO.

hattrill commented 1 year ago

The gp_information.fb file that we use for P2GO lookup is OK for Noctua entities. Make it available to the GOC pipeline via the FTP site: https://flybase.atlassian.net/browse/WEB-2138

hattrill commented 1 year ago

Alex has done the first import. Updated records for:

P08646 Q24326 Q7KV88 Q8SZT4 Q95V09 A0A1Z1CN92 O96660 P07713 Q7YU60 Q9TZQ2 M9PBJ7 Q5U110 Q7JQ37 Q7KV89 Q9XYQ9 A8DZ02 M9PH19 P42003 Q86NL2 Q94902 Q961B0 Q9UAC4 P54367 Q23975 Q24080 Q24229 Q7KTP1 Q8IPK9 Q01083 Q95SI0 Q9NHE9 A0A0B4KG59 A1Z7L9 A8DZ03 Q6AWJ8 Q7KV90 A0A6H2EEY2 O15968 P51023 Q24468 Q27933 Q59DZ3 Q95T01 Q9VVX3

These have the source NFB. Although the pipeline screens out duplicated annotations, it propagates to all mapped UniProtKB IDs, and as we’ve been annotating to the GCRP in P2GO, these other ‘isoforms’ are populated as well.

[Screenshot 2023-05-26 at 09 06 49]

Two ways of addressing this (not mutually exclusive):

  1. Only map to the GCRP set (but this may be difficult or diverge from others).
  2. Rank FB over NFB annotations.

As we deal with this issue of multiple mappings from the original import, this might be best left as is.
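For illustration only, option 2 (ranking FB over NFB) could look something like the sketch below. The record layout (`gene`, `go_id`, `reference`, `evidence`, `source`) and the choice of what counts as "the same annotation" are assumptions made for this example, not how the GOC pipeline or P2GO actually represent annotations.

```python
# Hypothetical sketch: collapse duplicate annotations, preferring source FB over NFB.
# Field names and the duplicate key are assumptions, not the real pipeline schema.
SOURCE_RANK = {"FB": 0, "NFB": 1}  # lower rank wins; unknown sources rank last


def prefer_fb(annotations):
    """Keep one annotation per (gene, GO term, reference, evidence) key."""
    best = {}
    for ann in annotations:
        key = (ann["gene"], ann["go_id"], ann["reference"], ann["evidence"])
        current = best.get(key)
        if current is None:
            best[key] = ann
        elif SOURCE_RANK.get(ann["source"], 2) < SOURCE_RANK.get(current["source"], 2):
            best[key] = ann
    return list(best.values())


# Placeholder records (not real annotations).
records = [
    {"gene": "GENE_A", "go_id": "GO:1", "reference": "REF_1", "evidence": "ECO:1", "source": "NFB"},
    {"gene": "GENE_A", "go_id": "GO:1", "reference": "REF_1", "evidence": "ECO:1", "source": "FB"},
    {"gene": "GENE_B", "go_id": "GO:2", "reference": "REF_2", "evidence": "ECO:1", "source": "NFB"},
]
deduplicated = prefer_fb(records)
```

An NFB annotation with no FB counterpart is kept, so this only changes the outcome where both sources supply the same annotation.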

hattrill commented 1 year ago

Error file: I have fixed the issues in Noctua that came from just an FBrf being used or an out-of-date FBgn.

Line 18: ERROR Unsupported / missing reference [FB:FBrf0226456] 18> FB FBgn0003371 is_active_in GO:0005829 FB:FBrf0226456 ECO:0007005 20210615 FlyBase noctua-model-id=gomodel:6086f4f200000223|model-state=production|contributor=https://orcid.org/0000-0003-3212-6364
Line 30: ERROR Unsupported / missing reference [FB:FBrf0174958] 30> FB FBgn0003371 involved_in GO:0090090 FB:FBrf0174958 ECO:0000315 20210615 FlyBase noctua-model-id=gomodel:6086f4f200000223|model-state=production|contributor=https://orcid.org/0000-0003-3212-6364
Line 36: ERROR Unsupported / missing reference [FB:FBrf0230336] 36> FB FBgn0026598 involved_in GO:0090090 FB:FBrf0230336 ECO:0000315 20210615 FlyBase noctua-model-id=gomodel:6086f4f200000223|model-state=production|contributor=https://orcid.org/0000-0003-3212-6364
Line 38: ERROR Unsupported / missing reference [FB:FBrf0246970] 38> FB FBgn0016917 is_active_in GO:0005634 FB:FBrf0246970 ECO:0000314 20210527 FlyBase model-state=production|contributor=https://orcid.org/0000-0003-3212-6364|noctua-model-id=gomodel:60ad85f700000110
Line 39: ERROR Unsupported / missing reference [FB:FBrf0204740] 39> FB FBgn0043903 is_active_in GO:0016324 FB:FBrf0204740 ECO:0000314 20210527 FlyBase model-state=production|contributor=https://orcid.org/0000-0003-3212-6364|noctua-model-id=gomodel:60ad85f700000110
Line 47: ERROR Unsupported / missing reference [FB:FBrf0151938] 47> FB FBgn0043903 is_active_in GO:0016324 FB:FBrf0151938 ECO:0000314 20210527 FlyBase model-state=production|contributor=https://orcid.org/0000-0003-3212-6364|noctua-model-id=gomodel:60ad85f700000110
Line 48: ERROR Unsupported / missing reference [FB:FBrf0246970] 48> FB FBgn0016917 involved_in GO:0007259 FB:FBrf0246970 ECO:0000314 20210527 FlyBase model-state=production|contributor=https://orcid.org/0000-0003-3212-6364|noctua-model-id=gomodel:60ad85f700000110
Line 49: ERROR Unsupported / missing reference [FB:FBrf0204740] 49> FB FBgn0004864 is_active_in GO:0009898 FB:FBrf0204740 ECO:0000314 20210527 FlyBase model-state=production|contributor=https://orcid.org/0000-0003-3212-6364|noctua-model-id=gomodel:60ad85f700000110
Line 51: ERROR Unsupported / missing reference [FB:FBrf0162076] 51> FB FBgn0000490 involved_in GO:0030509 FB:FBrf0162076 ECO:0000314 20210527 FlyBase model-state=production|contributor=https://orcid.org/0000-0003-3212-6364|noctua-model-id=gomodel:60ad85f700000189
Line 55: ERROR Unsupported / unmapped identifier [FB:FBgn0011655] 55> FB FBgn0011655 involved_in GO:0030509 PMID:20010841 ECO:0000314 20210615 FlyBase model-state=production|contributor=https://orcid.org/0000-0003-3212-6364|noctua-model-id=gomodel:60ad85f700000189
Line 56: ERROR Unsupported / missing reference [FB:FBrf0064404] 56> FB FBgn0003169 is_active_in GO:0005886 FB:FBrf0064404 ECO:0000255 20210527 FlyBase model-state=production|contributor=https://orcid.org/0000-0003-3212-6364|noctua-model-id=gomodel:60ad85f700000189
Line 60: ERROR Unsupported / missing reference [FB:FBrf0074149] 60> FB FBgn0003716 is_active_in GO:0005886 FB:FBrf0074149 ECO:0000314 20210527 FlyBase model-state=production|contributor=https://orcid.org/0000-0003-3212-6364|noctua-model-id=gomodel:60ad85f700000189
Line 64: ERROR Unsupported / unmapped identifier [FB:FBgn0011655] 64> FB FBgn0011655 enables GO:0001228 PMID:20010841 ECO:0000314 20210615 FlyBase part_of(GO:0030509) model-state=production|contributor=https://orcid.org/0000-0003-3212-6364|noctua-model-id=gomodel:60ad85f700000189
Line 66: ERROR Unsupported / unmapped identifier [FB:FBgn0011655] 66> FB FBgn0011655 is_active_in GO:0005634 PMID:9502724 ECO:0000353 FB:FBgn0011648 20210615 FlyBase model-state=production|contributor=https://orcid.org/0000-0003-3212-6364|noctua-model-id=gomodel:60ad85f700000189
Line 69: ERROR Unsupported / missing reference [FB:FBrf0051545] 69> FB FBgn0000490 is_active_in GO:0005615 FB:FBrf0051545 ECO:0000314 20210527 FlyBase model-state=production|contributor=https://orcid.org/0000-0003-3212-6364|noctua-model-id=gomodel:60ad85f700000189
Line 73: ERROR Unsupported / unmapped identifier [FB:FBgn0011655] 73> FB FBgn0011655 is_active_in GO:0005634 PMID:9502733 ECO:0000314 20210528 FlyBase noctua-model-id=gomodel:60ad85f700000259|model-state=production|contributor=https://orcid.org/0000-0003-3212-6364
Line 79: ERROR Unsupported / unmapped identifier [FB:FBgn0011655] 79> FB FBgn0011655 involved_in GO:0032924 PMID:10320478 ECO:0000314 20210528 FlyBase noctua-model-id=gomodel:60ad85f700000259|model-state=production|contributor=https://orcid.org/0000-0003-3212-6364
Line 81: ERROR Unsupported / unmapped identifier [FB:FBgn0011655] 81> FB FBgn0011655 enables GO:0001228 PMID:20010841 ECO:0000314 20210528 FlyBase noctua-model-id=gomodel:60ad85f700000259|model-state=production|contributor=https://orcid.org/0000-0003-3212-6364
Line 87: ERROR Unsupported / missing reference [FB:FBrf0184061] 87> FB FBgn0284421 involved_in GO:0007219 FB:FBrf0184061 ECO:0000314 20210528 FlyBase model-state=production|noctua-model-id=gomodel:60ad85f700000309|contributor=https://orcid.org/0000-0003-3212-6364
Line 106: ERROR Unsupported / missing reference [FB:FBrf0082580] 106> FB FBgn0005672 is_active_in GO:0005615 FB:FBrf0082580 ECO:0000314 20210611 FlyBase model-state=production|contributor=https://orcid.org/0000-0003-3212-6364|noctua-model-id=gomodel:60ad85f700001873
Line 108: ERROR Unsupported / missing reference [FB:FBrf0194920] 108> FB FBgn0001965 is_active_in GO:0031234 FB:FBrf0194920 ECO:0000314 20210612 FlyBase model-state=production|contributor=https://orcid.org/0000-0003-3212-6364|noctua-model-id=gomodel:60ad85f700001873
Line 158: ERROR Unsupported / missing reference [FB:FBrf0151872] 158> FB FBgn0010909 enables GO:0042656 FB:FBrf0151872 ECO:0000304 20210628 FlyBase model-state=production|contributor=https://orcid.org/0000-0003-3212-6364|noctua-model-id=gomodel:60d5209a00000233

SUMMARY

Number of lines processed: 239754
Total number of annotations: 239744
Number of annotations assigned by FlyBase: 149
Number of annotations assigned by other sources: 239595
Total number of annotations excluded: 22
Number of annotations with error "Unsupported / missing reference": 16
Number of annotations with error "Unsupported / unmapped identifier": 6
Total number of warnings: 0
Number of annotations with no errors: 127
Number of annotations output: 209
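To cross-check the per-error counts in the summary against the error entries, a small tally along the following lines may help. This is a minimal sketch that assumes only the entry shape visible in the report above (`Line N: ERROR <type> [<identifier>]`); the function name is made up for the example, and the sample text is abbreviated rather than the full report.

```python
import re
from collections import Counter

# Matches the start of each entry, e.g.
#   Line 18: ERROR Unsupported / missing reference [FB:FBrf0226456]
# The error type may be wrapped across lines, so whitespace is normalised below.
ERROR_RE = re.compile(r"Line (\d+): ERROR ([^\[]+?)\s*\[([^\]]+)\]")


def tally_errors(report_text):
    """Return (count per error type, set of offending identifiers per type)."""
    counts = Counter()
    offenders = {}
    for _line_no, error_type, identifier in ERROR_RE.findall(report_text):
        error_type = " ".join(error_type.split())
        counts[error_type] += 1
        offenders.setdefault(error_type, set()).add(identifier)
    return counts, offenders


# Abbreviated sample: three entries with the annotation columns cut short.
sample = (
    "Line 18: ERROR Unsupported / missing reference [FB:FBrf0226456] 18> FB FBgn0003371 "
    "Line 55: ERROR Unsupported / unmapped identifier [FB:FBgn0011655] 55> FB FBgn0011655 "
    "Line 64: ERROR Unsupported / unmapped\nidentifier [FB:FBgn0011655] 64> FB FBgn0011655"
)
counts, offenders = tally_errors(sample)
```

Grouping the offending identifiers per error type also shows quickly when several errors trace back to one identifier, as with FB:FBgn0011655 in the report.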

hattrill commented 1 year ago

New list with post-load non-GCRP clean up:

From | To
-- | --
P08646 | FBgn0003205
Q7KV88 | FBgn0010909
A0A1Z1CN92 | FBgn0026598
O96660 | FBgn0025800
P07713 | FBgn0000490
Q7YU60 | FBgn0011300
M9PBJ7 | FBgn0010909
Q5U110 | FBgn0020493
Q7KV89 | FBgn0010909
A8DZ02 | FBgn0259984
M9PH19 | FBgn0010909
P42003 | FBgn0011648
P54367 | FBgn0015024
Q7KTP1 | FBgn0003716
Q8IPK9 | FBgn0003716
Q01083 | FBgn0005672
Q95SI0 | FBgn0003716
A0A0B4KG59 | FBgn0003169
A1Z7L9 | FBgn0011300
A8DZ03 | FBgn0259984
Q7KV90 | FBgn0010909
A0A6H2EEY2 | FBgn0020493
P51023 | FBgn0003118
Q24468 | FBgn0003169
Q59DZ3 | FBgn0259984
Q9VVX3 | FBgn0016797

hattrill commented 1 year ago

Issue with mapping to GCRP. Alex to look into it.

hattrill commented 1 year ago

From Alex: "Using what you suggested and after discussion with the UniProt production team and Pascale, I've adjusted the filter to use the following logic:

  1. Keep annotations to a GCRP canonical entry, regardless of whether it's part of the Swiss-Prot or TrEMBL set.

  2. Keep annotations to a GCRP isoform only if it is part of the Swiss-Prot set.

Now I'm getting a consistent, but much smaller, set of annotations regardless of which mapping file I use. See the list of UniProt accessions below.

A0A0B4KG59 A0A6H2EEY2 O96660 P07713 P08646 P42003 P51023 P54367 Q01083 Q9VVX3"
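The two filter rules Alex describes can be written as a single predicate. The sketch below is only an illustration of that logic: the `Protein` record, its field names, and the placeholder accessions are assumptions for the example and do not reflect the actual GOC pipeline code or any real UniProt entry's status.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    """Hypothetical record for one UniProtKB accession."""
    accession: str
    in_gcrp: bool       # member of the GCRP set
    is_canonical: bool  # canonical entry rather than an isoform
    reviewed: bool      # True = Swiss-Prot, False = TrEMBL


def keep_annotation(protein):
    """Apply the two rules quoted above."""
    if not protein.in_gcrp:
        return False          # outside the GCRP set: never kept
    if protein.is_canonical:
        return True           # rule 1: canonical, Swiss-Prot or TrEMBL
    return protein.reviewed   # rule 2: isoform only if Swiss-Prot


# Placeholder accessions with made-up flags, one per case.
candidates = [
    Protein("CANONICAL_TREMBL", in_gcrp=True, is_canonical=True, reviewed=False),
    Protein("ISOFORM_SWISSPROT", in_gcrp=True, is_canonical=False, reviewed=True),
    Protein("ISOFORM_TREMBL", in_gcrp=True, is_canonical=False, reviewed=False),
    Protein("NOT_IN_GCRP", in_gcrp=False, is_canonical=True, reviewed=True),
]
kept = [p.accession for p in candidates if keep_annotation(p)]
```

Because the predicate depends only on properties of the accession itself, the result should not vary with the mapping file used, which would be consistent with the behaviour Alex reports.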

Checked all and all are part of GCRP:

A0A0B4KG59 FBgn0003169
A0A6H2EEY2 FBgn0020493
O96660 FBgn0025800
P07713 FBgn0000490
P08646 FBgn0003205
P42003 FBgn0011648
P51023 FBgn0003118
P54367 FBgn0015024
Q01083 FBgn0005672
Q9VVX3 FBgn0016797

hattrill commented 12 months ago

The annotations from Noctua are now in P2GO and in the GPAD (with just FlyBase as the source).

hattrill commented 12 months ago

Example of a paper curated only in Noctua, for checking: PMID:37086384