geneontology / gocamgen

Base repo for constructing GO-CAM model RDF

Filtering MGI: Allow ISO if Assigned_by=MGI #37

Closed dustine32 closed 2 months ago

dustine32 commented 5 years ago

Currently for MGI, we're filtering out all ISO annotation lines from GPAD. We now want to allow an ISO line to be translated into a model if that line has Assigned_by=MGI.

This shouldn't affect other MOD filters. Yet.

From David's email:

The ones to be filtered would be ones with the references:

GO_REF:0000096, MGI:4834177
GO_REF:0000096, MGI:4417868

vanaukenk commented 5 years ago

Information about filtering rules is available in yaml files here: https://github.com/geneontology/gocamgen/tree/master/metadata/filter_rules

dustine32 commented 4 years ago

@ukemi Going through my GO meeting slide on filtering, I realized this issue can be solved by simply removing ISO from the list of unwanted_evidence_codes in filter_rules/mgi.yaml.

unwanted_evidence_codes:
 - IEA
 - IBA
 - ISO  # Remove

This will allow all ISO lines into the filter but, since we're also filtering for Assigned_by=MGI, all ISO lines translated will be from MGI. No special logic required.
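
A minimal sketch of the rule described above, assuming GPAD 1.1 tab-separated columns (evidence ECO ID in column 6, Assigned_by in column 10). The function name `keep_line`, the partial `ECO_TO_CODE` map, and the way the rules are passed in are illustrative assumptions, not gocamgen's actual code:

```python
# Hypothetical sketch; not gocamgen's actual implementation.
# Assumes GPAD 1.1 columns: evidence (ECO ID) is column 6, Assigned_by is column 10.

# Partial ECO-to-evidence-code map covering only the codes discussed here.
ECO_TO_CODE = {
    "ECO:0000501": "IEA",
    "ECO:0000318": "IBA",
    "ECO:0000266": "ISO",
}


def keep_line(gpad_line, unwanted_evidence_codes, required_assigned_by="MGI"):
    """Return True if the line passes the evidence check and the Assigned_by check."""
    cols = gpad_line.rstrip("\n").split("\t")
    evidence_code = ECO_TO_CODE.get(cols[5], cols[5])
    if evidence_code in unwanted_evidence_codes:
        return False
    return cols[9] == required_assigned_by


# Two example lines rebuilt with explicit tabs (empty columns included).
iso_mgi = "\t".join([
    "MGI", "MGI:1920081", "enables", "GO:0003725",
    "MGI:MGI:4947923|PMID:21266579", "ECO:0000266",
    "UniProtKB:Q7L2E3", "", "20131008", "MGI", "", "",
])
iea_uniprot = "\t".join([
    "MGI", "MGI:1920081", "enables", "GO:0000166",
    "MGI:MGI:1354194|GO_REF:0000004", "ECO:0000501",
    "UniProtKB-KW:KW-0547", "", "20191103", "UniProt", "", "",
])

rules_before = ["IEA", "IBA", "ISO"]  # original list
rules_after = ["IEA", "IBA"]          # ISO removed

print(keep_line(iso_mgi, rules_before))   # ISO still filtered
print(keep_line(iso_mgi, rules_after))    # ISO from MGI now kept
print(keep_line(iea_uniprot, rules_after))
```

With ISO removed from the list, the only remaining gate on ISO lines is the Assigned_by check, which is why no special-case logic is needed under this approach.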

dustine32 commented 4 years ago

@ukemi I have some before and after models to inspect the change in PR #67 using MGI:MGI:1920971. "Before" has 4 annotations, "after" has 6 (2 ISOs from MGI).

Let me know if you want me to load any other models to check this out. This change should still filter out ISOs from GPAD lines where Assigned_by is not MGI, since the Assigned_by==MOD requirement is still enforced.

ukemi commented 4 years ago

Hi @dustine32. I don't think this is working correctly. The annotation that is added says its source is UniProtKB in our editorial interface. This one should be filtered out. As a test, let's try MGI:1920081. It should have one annotation made by Dmitry to GO:0003725 that comes through. All the other ones should be filtered out because they come from UniProtKB.

ukemi commented 4 years ago

Below is a snippet of our GPAD file. It looks like this might be on our end. Note that all of the ECO:0000266 annotations say they come from MGI, but in our interface only the one that is bolded is sourced at MGI. Let's try a different approach: filter out only the ones that have a reference of MGI:MGI:4834177|GO_REF:0000096. All the ones generated by MGI curators should have a different reference. Pinging @hdrabkin to have a look at the MGI GPAD file and correct the source.

MGI MGI:1920081 enables GO:0000166 MGI:MGI:1354194|GO_REF:0000004 ECO:0000501 UniProtKB-KW:KW-0547 20191103 UniProt
MGI MGI:1920081 enables GO:0003676 MGI:MGI:2152098|GO_REF:0000002 ECO:0000501 InterPro:IPR011545 20191103 UniProt
MGI MGI:1920081 enables GO:0003682 MGI:MGI:4834177|GO_REF:0000096 ECO:0000266 UniProtKB:Q7L2E3 20110720 MGI
MGI MGI:1920081 enables GO:0003723 MGI:MGI:4834177|GO_REF:0000096 ECO:0000266 UniProtKB:Q7L2E3 20180615 MGI
MGI MGI:1920081 enables GO:0003723 MGI:MGI:6201960|PMID:21873635 ECO:0000318 PANTHER:PTN000433338 20170228 GO_Central
MGI MGI:1920081 enables GO:0003724 MGI:MGI:5616495|PMID:25219788 ECO:0000314 20180731 UniProt
MGI MGI:1920081 enables GO:0003724 MGI:MGI:4834177|GO_REF:0000096 ECO:0000266 UniProtKB:Q7L2E3 20180615 MGI
MGI MGI:1920081 enables GO:0003725 MGI:MGI:4947923|PMID:21266579 ECO:0000266 UniProtKB:Q7L2E3 20131008 MGI
MGI MGI:1920081 enables GO:0004386 MGI:MGI:2152098|GO_REF:0000002 ECO:0000501 InterPro:IPR007502 20191103 UniProt
MGI MGI:1920081 enables GO:0004386 MGI:MGI:1354194|GO_REF:0000004 ECO:0000501 UniProtKB-KW:KW-0347 20191103 UniProt
MGI MGI:1920081 enables GO:0005524 MGI:MGI:1354194|GO_REF:0000004 ECO:0000501 UniProtKB-KW:KW-0067 20191103 UniProt
MGI MGI:1920081 enables GO:0005524 MGI:MGI:2152098|GO_REF:0000002 ECO:0000501 InterPro:IPR011545 20191103 UniProt
MGI MGI:1920081 part_of GO:0005622 MGI:MGI:6201960|PMID:21873635 ECO:0000318 PANTHER:PTN001449167 20180328 GO_Central
MGI MGI:1920081 part_of GO:0005737 MGI:MGI:5616495|PMID:25219788 ECO:0000314 20180620 UniProt
MGI MGI:1920081 part_of GO:0005737 MGI:MGI:4834177|GO_REF:0000096 ECO:0000266 UniProtKB:Q7L2E3 20180615 MGI
MGI MGI:1920081 part_of GO:0005739 MGI:MGI:4834177|GO_REF:0000096 ECO:0000266 UniProtKB:Q7L2E3 20101115 MGI
MGI MGI:1920081 part_of GO:0005829 MGI:MGI:4834177|GO_REF:0000096 ECO:0000266 UniProtKB:Q7L2E3 20101115 MGI
MGI MGI:1920081 acts_upstream_of_or_within GO:0007417 MGI:MGI:5616495|PMID:25219788 ECO:0000314 20180620 UniProt
MGI MGI:1920081 enables GO:0016787 MGI:MGI:1354194|GO_REF:0000004 ECO:0000501 UniProtKB-KW:KW-0378 20191103 UniProt
MGI MGI:1920081 part_of GO:0035770 MGI:MGI:4834177|GO_REF:0000096 ECO:0000266 UniProtKB:Q7L2E3 20150718 MGI
MGI MGI:1920081 acts_upstream_of_or_within GO:0042254 MGI:MGI:1354194|GO_REF:0000004 ECO:0000501 UniProtKB-KW:KW-0690 20191103 UniProt
MGI MGI:1920081 part_of GO:0042645 MGI:MGI:4834177|GO_REF:0000096 ECO:0000266 UniProtKB:Q7L2E3 20090218 MGI
MGI MGI:1920081 acts_upstream_of_or_within GO:1902775 MGI:MGI:4834177|GO_REF:0000096 ECO:0000266 UniProtKB:Q7L2E3 20150718

ukemi commented 4 years ago

Note we should also filter out the rat load. MGI:MGI:4417868|GO_REF:0000096

@hdrabkin, are there any others I am forgetting?

ukemi commented 4 years ago

Actually as I think of it, the MGI assignment of these might be correct since we are the keepers of the orthology assignments. @hdrabkin, is this correct?

dustine32 commented 4 years ago

@ukemi OK, I can pretty easily code this type of filtering. But let me know after talking to @hdrabkin whether these should actually be filtered out.

If yes, I'll make a configurable parameter in the mgi.yaml file that filters out any lines containing MGI:MGI:4834177 and GO_REF:0000096 or MGI:MGI:4417868 and GO_REF:0000096.

dustine32 commented 4 years ago

@ukemi Actually, I might be able to get away with using the existing unwanted_evi_code_ref_combos parameter if I just select for ISO and GO_REF:0000096. Checking all GO_REF:0000096 lines in the latest mgi.gpa file, this reference is only paired with either MGI:MGI:4417868 or MGI:MGI:4834177:

$ grep GO_REF:0000096 mgi.gpa | cut -f5 | cut -d "|" -f1 | sort | uniq -c
35836 MGI:MGI:4417868
81154 MGI:MGI:4834177

I'll update the PR and load the MGI:MGI:1920081 model for testing.
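
A minimal sketch of filtering on an evidence-code and reference pair, as described above. The parameter name `unwanted_evi_code_ref_combos` comes from the thread, but the `(code, reference)` tuple shape, the function name, and the ECO map are assumptions made for illustration:

```python
# Hypothetical sketch; the real unwanted_evi_code_ref_combos handling may differ.
# Assumes GPAD 1.1 columns: references in column 5 ("|"-separated), ECO ID in column 6.

ECO_TO_CODE = {"ECO:0000266": "ISO"}  # only the code needed here

# Assumed shape: a list of (evidence code, reference) pairs.
UNWANTED_COMBOS = [("ISO", "GO_REF:0000096")]


def has_unwanted_combo(gpad_line, combos=UNWANTED_COMBOS):
    """Return True if the line pairs an unwanted evidence code with its reference."""
    cols = gpad_line.rstrip("\n").split("\t")
    refs = cols[4].split("|")
    code = ECO_TO_CODE.get(cols[5])
    return any(code == c and ref in refs for c, ref in combos)


def make_line(refs, eco):
    # Helper for building example rows; other column values are fixed.
    return "\t".join([
        "MGI", "MGI:1920081", "enables", "GO:0003723", refs, eco,
        "UniProtKB:Q7L2E3", "", "20180615", "MGI", "", "",
    ])


load_iso = make_line("MGI:MGI:4834177|GO_REF:0000096", "ECO:0000266")
rat_load_iso = make_line("MGI:MGI:4417868|GO_REF:0000096", "ECO:0000266")
curator_iso = make_line("MGI:MGI:4947923|PMID:21266579", "ECO:0000266")

print(has_unwanted_combo(load_iso))      # filtered out
print(has_unwanted_combo(rat_load_iso))  # filtered out
print(has_unwanted_combo(curator_iso))   # kept
```

Matching on GO_REF:0000096 alone covers both the human and the rat load, since the grep above shows that reference is only ever paired with those two MGI IDs.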

dustine32 commented 4 years ago

@ukemi Now it looks better! It only translated the one bolded line for the MGI:MGI:1920081 model.

The PR is updated and ready for re-review.

ukemi commented 4 years ago

Very weird, I can't seem to get to the model. I am probably blocked from the server. But from the result above this looks like it is working. Do you want a few more examples?

hdrabkin commented 4 years ago

Our ISO load makes MGI the assignee since WE assign the ISO evidence code. This holds for both human and rat under GO_REF:0000096 (same GO_REF but different J#). Note: in our EI, the created_by/modified_by is identified by the database originating the manual experimental annotation (either UniProtKB (GOA) or RGD).

hdrabkin commented 4 years ago

In the GAF output, the assigned_by becomes MGI. I thought this also happens for the GPAD; I will need to check.

ukemi commented 4 years ago

So then I think the strategy using the refs is the correct strategy.

ukemi commented 4 years ago

It does happen in the GPAD. That is the file we are using. No need to check.

hdrabkin commented 4 years ago

ok; great; I was trying to find the TR to make sure I also said to do it in GPAD as well as GAF

dustine32 commented 4 years ago

@ukemi Dang, my server at USC restarted last night for some reason. You should be able to get to the model now.

dustine32 commented 4 years ago

@ukemi If you wanna give me some more example genes I'll load em into my server. Better to be thorough!

ukemi commented 4 years ago

OK. Here are some more examples. MGI:108449: only the bold ISOs should get loaded.

MGI MGI:108449 enables GO:0004888 MGI:MGI:2152098|GO_REF:0000002 ECO:0000501 InterPro:IPR017981 20191110 UniProt
MGI MGI:108449 enables GO:0004930 MGI:MGI:1277003|PMID:9729410 ECO:0000304 20011204 MGI
MGI MGI:108449 enables GO:0004999 MGI:MGI:1277003|PMID:9729410 ECO:0000304 20000119 MGI
MGI MGI:108449 enables GO:0005515 MGI:MGI:5306201|PMID:22084075 ECO:0000353 UniProtKB:Q6NSW3 20120903 MGI
MGI MGI:108449 part_of GO:0005737 MGI:MGI:4417868|GO_REF:0000096 ECO:0000266 RGD:2038 20100119 MGI
MGI MGI:108449 part_of GO:0005768 MGI:MGI:4417868|GO_REF:0000096 ECO:0000266 RGD:2038 20100119 MGI
MGI MGI:108449 part_of GO:0005791 MGI:MGI:4417868|GO_REF:0000096 ECO:0000266 RGD:2038 20100120 MGI
MGI MGI:108449 part_of GO:0005886 MGI:MGI:1354194|GO_REF:0000004 ECO:0000501 UniProtKB-KW:KW-1003 20191110 UniProt
MGI MGI:108449 part_of GO:0005901 MGI:MGI:4417868|GO_REF:0000096 ECO:0000266 RGD:2038 20100119 MGI
MGI MGI:108449 part_of GO:0005923 MGI:MGI:4417868|GO_REF:0000096 ECO:0000266 RGD:2038 20100120 MGI
MGI MGI:108449 acts_upstream_of_or_within GO:0007165 MGI:MGI:1354194|GO_REF:0000004 ECO:0000501 UniProtKB-KW:KW-0807 20191110 UniProt
MGI MGI:108449 involved_in GO:0007166 MGI:MGI:2152098|GO_REF:0000002 ECO:0000501 InterPro:IPR017981 20191110 UniProt
MGI MGI:108449 acts_upstream_of_or_within GO:0007186 MGI:MGI:1277003|PMID:9729410 ECO:0000304 20011204 MGI
MGI MGI:108449 acts_upstream_of_or_within GO:0007202 MGI:MGI:4417868|GO_REF:0000096 ECO:0000266 RGD:2038 20050217 MGI
MGI MGI:108449 acts_upstream_of_or_within GO:0007275 MGI:MGI:1354194|GO_REF:0000004 ECO:0000501 UniProtKB-KW:KW-0217 20191110 UniProt
MGI MGI:108449 acts_upstream_of_or_within GO:0007283 MGI:MGI:1354194|GO_REF:0000004 ECO:0000501 UniProtKB-KW:KW-0744 20191110 UniProt
MGI MGI:108449 enables GO:0008179 MGI:MGI:4417868|GO_REF:0000096 ECO:0000266 RGD:2038 20100119 MGI
MGI MGI:108449 enables GO:0008528 MGI:MGI:6201960|PMID:21873635 ECO:0000318 PANTHER:PTN000244369 20171006 GO_Central
MGI MGI:108449 part_of GO:0009986 MGI:MGI:4417868|GO_REF:0000096 ECO:0000266 RGD:2038 20100120 MGI
MGI MGI:108449 acts_upstream_of_or_within GO:0010524 MGI:MGI:4417868|GO_REF:0000096 ECO:0000266 RGD:2038 20100120 MGI
MGI MGI:108449 part_of GO:0016020 MGI:MGI:1354194|GO_REF:0000004 ECO:0000501 UniProtKB-KW:KW-0472 20191110 UniProt
MGI MGI:108449 part_of GO:0016020 MGI:MGI:2152098|GO_REF:0000002 ECO:0000501 InterPro:IPR036445|InterPro:IPR001879|InterPro:IPR002285 20191110 UniProt
MGI MGI:108449 part_of GO:0016021 MGI:MGI:1354194|GO_REF:0000004 ECO:0000501 UniProtKB-KW:KW-0812 20191110 UniProt
MGI MGI:108449 part_of GO:0016021 MGI:MGI:2152098|GO_REF:0000002 ECO:0000501 InterPro:IPR017981|InterPro:IPR000832 20191110 UniProt
MGI MGI:108449 enables GO:0017046 MGI:MGI:6201960|PMID:21873635 ECO:0000318 PANTHER:PTN000244369 20171006 GO_Central
MGI MGI:108449 acts_upstream_of_or_within GO:0019933 MGI:MGI:4417868|GO_REF:0000096 ECO:0000266 RGD:2038 20050217 MGI
MGI MGI:108449 acts_upstream_of_or_within GO:0030154 MGI:MGI:1354194|GO_REF:0000004 ECO:0000501 UniProtKB-KW:KW-0221 20191110 UniProt
MGI MGI:108449 enables GO:0030306 MGI:MGI:4417868|GO_REF:0000096 ECO:0000266 RGD:2038 20100120 MGI
MGI MGI:108449 enables GO:0042923 MGI:MGI:4417868|GO_REF:0000096 ECO:0000266 RGD:2038 20100120 MGI
MGI MGI:108449 part_of GO:0043005 MGI:MGI:6201960|PMID:21873635 ECO:0000318 PANTHER:PTN000244628 20171006 GO_Central
MGI MGI:108449 part_of GO:0043005 MGI:MGI:4417868|GO_REF:0000096 ECO:0000266 RGD:2038 20100119 MGI
MGI MGI:108449 part_of GO:0043231 MGI:MGI:4834177|GO_REF:0000096 ECO:0000266 UniProtKB:P41586 20171201 MGI
MGI MGI:108449 part_of GO:0043235 MGI:MGI:5474145|PMID:23382219 ECO:0000266 UniProtKB:P41586 20131224 MGI
MGI MGI:108449 acts_upstream_of_or_within GO:0043950 MGI:MGI:4417868|GO_REF:0000096 ECO:0000266 RGD:2038 20100120 MGI
MGI MGI:108449 acts_upstream_of_or_within GO:0051057 MGI:MGI:4417868|GO_REF:0000096 ECO:0000266 RGD:2038 20160709 MGI
MGI MGI:108449 acts_upstream_of_or_within GO:0060548 MGI:MGI:4417868|GO_REF:0000096 ECO:0000266 RGD:2038 20100120 MGI
MGI MGI:108449 acts_upstream_of_or_within GO:0060732 MGI:MGI:4417868|GO_REF:0000096 ECO:0000266 RGD:2038 20100120 MGI

ukemi commented 4 years ago

MGI:2442833

MGI MGI:2442833 part_of GO:0000242 MGI:MGI:2154458|GO_REF:0000008 ECO:0000266 UniProtKB:Q3SYG4 20140911 MGI
MGI MGI:2442833 enables GO:0005515 MGI:MGI:5828020|PMID:27979967 ECO:0000353 UniProtKB:Q8BMD2 20180322 MGI
MGI MGI:2442833 enables GO:0005515 MGI:MGI:5296985|PMID:22072986 ECO:0000353 UniProtKB:Q9JHQ5 20130322 MGI
MGI MGI:2442833 part_of GO:0005737 MGI:MGI:1354194|GO_REF:0000004 ECO:0000501 UniProtKB-KW:KW-0963 20191110 UniProt
MGI MGI:2442833 part_of GO:0005856 MGI:MGI:1354194|GO_REF:0000004 ECO:0000501 UniProtKB-KW:KW-0206 20191110 UniProt
MGI MGI:2442833 part_of GO:0005886 MGI:MGI:1354194|GO_REF:0000004 ECO:0000501 UniProtKB-KW:KW-1003 20191110 UniProt
MGI MGI:2442833 part_of GO:0005929 MGI:MGI:6201960|PMID:21873635 ECO:0000318 PANTHER:PTN000470005 20170228 GO_Central
MGI MGI:2442833 part_of GO:0005929 MGI:MGI:2154458|GO_REF:0000008 ECO:0000266 UniProtKB:Q3SYG4 20140911 MGI
MGI MGI:2442833 acts_upstream_of_or_within GO:0015031 MGI:MGI:1354194|GO_REF:0000004 ECO:0000501 UniProtKB-KW:KW-0653 20191110 UniProt
MGI MGI:2442833 part_of GO:0016020 MGI:MGI:6201960|PMID:21873635 ECO:0000318 PANTHER:PTN000470005 20170228 GO_Central
MGI MGI:2442833 part_of GO:0016020 MGI:MGI:5306530|PMID:22139371 ECO:0000314 20140911 MGI part_of(EMAPA:17972)
MGI MGI:2442833 acts_upstream_of_or_within GO:0030030 MGI:MGI:1354194|GO_REF:0000004 ECO:0000501 UniProtKB-KW:KW-0970 20191110 UniProt
MGI MGI:2442833 part_of GO:0034451 MGI:MGI:4834177|GO_REF:0000096 ECO:0000266 UniProtKB:Q3SYG4 20150130 MGI
MGI MGI:2442833 part_of GO:0034464 MGI:MGI:5296985|PMID:22072986 ECO:0000314 20141024 MGI part_of(EMAPA:17972)
MGI MGI:2442833 part_of GO:0034464 MGI:MGI:6201960|PMID:21873635 ECO:0000318 PANTHER:PTN000470005 20170228 GO_Central
MGI MGI:2442833 part_of GO:0034464 MGI:MGI:5467205|PMID:22500027 ECO:0000314 20130322 MGI part_of(EMAPA:17972)
MGI MGI:2442833 part_of GO:0034464 MGI:MGI:2154458|GO_REF:0000008 ECO:0000266 UniProtKB:Q3SYG4 20111228 MGI
MGI MGI:2442833 part_of GO:0034464 MGI:MGI:5560383|PMID:24550735 ECO:0000314 20140404 UniProt
MGI MGI:2442833 part_of GO:0035869 MGI:MGI:4834177|GO_REF:0000096 ECO:0000266 UniProtKB:Q3SYG4 20150130 MGI
MGI MGI:2442833 part_of GO:0036064 MGI:MGI:5469957|PMID:22922713 ECO:0000266 UniProtKB:O01514 20130711 MGI
MGI MGI:2442833 part_of GO:0042995 MGI:MGI:1354194|GO_REF:0000004 ECO:0000501 UniProtKB-KW:KW-0966 20191110 UniProt
MGI MGI:2442833 acts_upstream_of_or_within GO:0060271 MGI:MGI:5526920|PMID:22479622 ECO:0000315 20131114 FlyBase
MGI MGI:2442833 acts_upstream_of_or_within GO:0060271 MGI:MGI:6201960|PMID:21873635 ECO:0000318 PANTHER:PTN000470005 20190322 GO_Central
MGI MGI:2442833 acts_upstream_of_or_within GO:0061512 MGI:MGI:4834177|GO_REF:0000096 ECO:0000266 UniProtKB:Q3SYG4 20150903 MGI

ukemi commented 4 years ago

MGI:1923036

MGI MGI:1923036 part_of GO:0000775 MGI:MGI:1354194|GO_REF:0000004 ECO:0000501 UniProtKB-KW:KW-0137 20191110 UniProt
MGI MGI:1923036 part_of GO:0000776 MGI:MGI:6201960|PMID:21873635 ECO:0000318 PANTHER:PTN000288762 20170228 GO_Central
MGI MGI:1923036 part_of GO:0000776 MGI:MGI:4834177|GO_REF:0000096 ECO:0000266 UniProtKB:Q14008 20160721 MGI
MGI MGI:1923036 part_of GO:0000922 MGI:MGI:6201960|PMID:21873635 ECO:0000318 PANTHER:PTN000288762 20170228 GO_Central
MGI MGI:1923036 part_of GO:0000922 MGI:MGI:4834177|GO_REF:0000096 ECO:0000266 UniProtKB:Q14008 20120110 MGI
MGI MGI:1923036 colocalizes_with GO:0000930 MGI:MGI:4834177|GO_REF:0000096 ECO:0000266 UniProtKB:Q14008 20060811 MGI
MGI MGI:1923036 part_of GO:0005694 MGI:MGI:1354194|GO_REF:0000004 ECO:0000501 UniProtKB-KW:KW-0158 20191110 UniProt
MGI MGI:1923036 part_of GO:0005730 MGI:MGI:4834177|GO_REF:0000096 ECO:0000266 UniProtKB:Q14008 20190905 MGI
MGI MGI:1923036 part_of GO:0005737 MGI:MGI:2154458|GO_REF:0000008 ECO:0000266 EMBL:X92474 20060111 MGI
MGI MGI:1923036 part_of GO:0005813 MGI:MGI:6201960|PMID:21873635 ECO:0000318 PANTHER:PTN000288762 20170228 GO_Central
MGI MGI:1923036 part_of GO:0005813 MGI:MGI:4834177|GO_REF:0000096 ECO:0000266 UniProtKB:Q14008 20190905 MGI
MGI MGI:1923036 part_of GO:0005856 MGI:MGI:1354194|GO_REF:0000004 ECO:0000501 UniProtKB-KW:KW-0206 20191110 UniProt
MGI MGI:1923036 part_of GO:0005886 MGI:MGI:4834177|GO_REF:0000096 ECO:0000266 UniProtKB:Q14008 20190905 MGI
MGI MGI:1923036 acts_upstream_of_or_within GO:0007049 MGI:MGI:1354194|GO_REF:0000004 ECO:0000501 UniProtKB-KW:KW-0131 20191110 UniProt
MGI MGI:1923036 acts_upstream_of_or_within GO:0007051 MGI:MGI:4834177|GO_REF:0000096 ECO:0000266 UniProtKB:Q14008 20100127 MGI
MGI MGI:1923036 acts_upstream_of_or_within GO:0007052 MGI:MGI:6201960|PMID:21873635 ECO:0000318 PANTHER:PTN000288762 20190819 GO_Central
MGI MGI:1923036 acts_upstream_of_or_within GO:0007098 MGI:MGI:2154458|GO_REF:0000008 ECO:0000266 EMBL:X92474 20100210 MGI
MGI MGI:1923036 acts_upstream_of_or_within GO:0007098 MGI:MGI:4834177|GO_REF:0000096 ECO:0000266 UniProtKB:Q14008 20060811 MGI
MGI MGI:1923036 enables GO:0008017 MGI:MGI:2154458|GO_REF:0000008 ECO:0000266 EMBL:X92474 20060111 MGI
MGI MGI:1923036 enables GO:0008017 MGI:MGI:6201960|PMID:21873635 ECO:0000318 PANTHER:PTN000288762 20190819 GO_Central
MGI MGI:1923036 acts_upstream_of_or_within GO:0030951 MGI:MGI:4834177|GO_REF:0000096 ECO:0000266 UniProtKB:Q14008 20060811 MGI
MGI MGI:1923036 acts_upstream_of_or_within GO:0030951 MGI:MGI:6201960|PMID:21873635 ECO:0000318 PANTHER:PTN000288762 20170228 GO_Central
MGI MGI:1923036 part_of GO:0032991 MGI:MGI:3804325|PMID:18468998 ECO:0000266 UniProtKB:Q14008 20180219 MGI
MGI MGI:1923036 colocalizes_with GO:0035371 MGI:MGI:4834177|GO_REF:0000096 ECO:0000266 UniProtKB:Q14008 20120110 MGI
MGI MGI:1923036 colocalizes_with GO:0035371 MGI:MGI:6201960|PMID:21873635 ECO:0000318 PANTHER:PTN000288762 20170228 GO_Central
MGI MGI:1923036 enables GO:0043021 MGI:MGI:2154458|GO_REF:0000008 ECO:0000266 EMBL:X92474 20060111 MGI
MGI MGI:1923036 acts_upstream_of_or_within GO:0046785 MGI:MGI:6201960|PMID:21873635 ECO:0000318 PANTHER:PTN000288762 20170228 GO_Central
MGI MGI:1923036 acts_upstream_of_or_within GO:0051298 MGI:MGI:6201960|PMID:21873635 ECO:0000318 PANTHER:PTN000288762 20170228 GO_Central
MGI MGI:1923036 acts_upstream_of_or_within GO:0051301 MGI:MGI:1354194|GO_REF:0000004 ECO:0000501 UniProtKB-KW:KW-0132 20191110 UniProt
MGI MGI:1923036 enables GO:0061863 MGI:MGI:6201960|PMID:21873635 ECO:0000318 PANTHER:PTN000288762 20190819 GO_Central

hdrabkin commented 4 years ago

GO_REF:0000008 is our J:73065 manual ISO assertion

ukemi commented 4 years ago

PS. Just to be as clear as can be when I say only the bold ones should get through, I mean only the bold ISO annotations. The other manual annotations made by MGI should all get through. I noticed that in the RNA binding annotation in the example that you made above the input is a human gene. This practice needs to be stopped. I don't think the mouse gene product binds the human gene in vivo.

ukemi commented 4 years ago

Thanks @hdrabkin! I think we do want to import those (J:73065). We want to be able to edit them in Noctua if we need to change them, right? Bottom line is we want to import all the annotations that are made by MGI curators that would include these.

ukemi commented 4 years ago

Actually @dustine32 the issue with the annotation example above isn't a curation error. It's a bug in the conversion of the binding annotations. We should only have the 'with' field be converted to input for annotations with an IPI evidence code. In this case, the 'with' field of the ISO annotation indicates the human ortholog of the mouse gene. This should be retained in the 'with' field. Ping @vanaukenk
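
A minimal sketch of the IPI-only policy described above, assuming the evidence arrives as an ECO ID and the 'with' field is "|"-separated. The function name and the returned tuple shape are hypothetical. Whether IPI lines also keep their 'with' value, and whether the rule is further limited to binding terms, is not settled in the thread, so this sketch gates on the evidence code alone:

```python
# Hypothetical sketch; not gocamgen's actual conversion code.

IPI_ECO = "ECO:0000353"  # IPI (physical interaction evidence, manual assertion)


def split_with_field(evidence_eco, with_from):
    """Return (has_input_ids, with_ids) for one annotation.

    Only IPI annotations turn their 'with' values into has_input;
    every other evidence code keeps them in the 'with' field.
    """
    ids = [value for value in with_from.split("|") if value]
    if evidence_eco == IPI_ECO:
        return ids, []
    return [], ids


# ISO annotation: the 'with' value is the human ortholog and stays in 'with'.
print(split_with_field("ECO:0000266", "UniProtKB:Q7L2E3"))
# IPI annotation: the 'with' value is the binding partner and becomes has_input.
print(split_with_field("ECO:0000353", "UniProtKB:Q6NSW3"))
```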

hdrabkin commented 4 years ago

The annotation is supposed to say that, for example, the gene product of Ckap5 (MGI:1923036) has RNP binding activity based on a human gene product, EMBL:X92474 (ISO), and that the gene product of Ckap5 has microtubule binding activity based on EMBL:X92474 (ISO). We are not saying the mouse gene product binds the human gene product. Or am I reading what you said incorrectly? I suppose EMBL:X92474 can be changed to the UniProt equivalent for human Ckap5, which had no ID when the annotation was made (2005).


ukemi commented 4 years ago

@hdrabkin, you are correct. This is a result of the other ticket above.

hdrabkin commented 4 years ago

Since I'm looking at it in EI now, I am replacing the EMBL ID (pointing to the mRNA that codes for the protein) with the UniProt ID so it is consistent with our use of ISO.

hdrabkin commented 4 years ago

So since I see it in front of me, I'll replace the EMBL ID encoding the mRNA for the human protein with the UniProtKB ID, to be consistent with most of the J:73065s.

dustine32 commented 4 years ago

@ukemi Makes total sense with the IPI-only with->has_input policy. This should be an easy thing to adjust.

ukemi commented 4 years ago

Because I worry, we should also check to see if there are ISO annotations that have a 'has input' extension. This would influence the round-trip strategy. We wouldn't want to convert those back to an IPI and an incorrect with field when we generate the GPAD. I hope there aren't any, but you never know.

dustine32 commented 4 years ago

@ukemi I'm guessing the answer is 'no' since I recall checking whether translated MGI ISOs have extensions and finding that none of them do:

$ grep ECO:0000266 mgi.gpa | cut -f10,11 | sort | uniq -c
  90 GOC
124130 MGI
   1 MGI        has_input(PR:000049786),has_output(PR:000049785),occurs_in(CL:0000287),occurs_in(EMAPA:17168),occurs_in(GO:0005794),part_of(GO:0018345),part_of(GO:1903546)
   1 MGI        has_participant(PR:000049786),transports_or_maintains_localization_of(PR:000049785)
   1 MGI        negatively_regulates(GO:0001649)
   1 MGI        negatively_regulates(GO:0043932)
   1 MGI        occurs_in(CL:0000235),occurs_in(EMAPA:16728),occurs_in(EMAPA:18334),part_of(GO:0048246)
   1 MGI        occurs_in(CL:0000704),occurs_in(GO:0009897),part_of(GO:0002042)
   1 MGI        occurs_in(EMAPA:32874),results_in_movement_of(CL:0002607)
   1 MGI        part_of(CL:0000287),part_of(EMAPA:17168)
   1 MGI        part_of(CL:0000704)
   1 MGI        part_of(GO:0006422)
   2 MGI        part_of(GO:0007204),part_of(GO:0048246)
   1 MGI        part_of(GO:0007204),part_of(GO:0070098)
   1 MGI        positively_regulates(GO:0045453)
   1 MGI        regulates(GO:0046849)

So some do, but these are exported from Noctua (as you mentioned). Excluding Noctua lines:

$ grep ECO:0000266 mgi.gpa | grep -v noctua-model-id | cut -f10,11 | sort | uniq -c
  90 GOC
124116 MGI

All remaining ISO lines have an empty extension column.
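
The same check can be sketched in Python for cases where the counts need to feed into further logic. This mirrors the grep/cut/sort/uniq pipeline above under the same column assumptions (Assigned_by in column 10, extensions in column 11); the function name and the example rows are made up for illustration:

```python
from collections import Counter

ISO_ECO = "ECO:0000266"


def count_iso_extensions(lines, skip_noctua=True):
    """Count (assigned_by, extension) pairs over ISO lines.

    Like the grep commands above, matching is done on the whole line.
    """
    counts = Counter()
    for line in lines:
        if ISO_ECO not in line:
            continue
        if skip_noctua and "noctua-model-id" in line:
            continue
        cols = line.rstrip("\n").split("\t")
        cols += [""] * (11 - len(cols))  # pad short lines so the indexes exist
        counts[(cols[9], cols[10])] += 1
    return counts


# Made-up rows for illustration; only the column layout matters.
plain_iso = "\t".join([
    "MGI", "MGI:1920081", "enables", "GO:0003725",
    "MGI:MGI:4947923|PMID:21266579", ISO_ECO,
    "UniProtKB:Q7L2E3", "", "20131008", "MGI", "", "",
])
noctua_iso = "\t".join([
    "MGI", "MGI:0000000", "involved_in", "GO:0000000",
    "MGI:MGI:0000000|PMID:0000000", ISO_ECO,
    "UniProtKB:EXAMPLE", "", "20190101", "MGI", "part_of(GO:0006422)",
    "noctua-model-id=EXAMPLE",
])

print(count_iso_extensions([plain_iso, noctua_iso]))
print(count_iso_extensions([plain_iso, noctua_iso], skip_noctua=False))
```

To run it against the real file, pass an open file handle, for example `count_iso_extensions(open("mgi.gpa"))`.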

ukemi commented 4 years ago

One tick off my worry list. Thanks @dustine32

dustine32 commented 4 years ago

@ukemi Here are the models for MGI:MGI:108449, MGI:MGI:2442833, and MGI:MGI:1923036.

And the reloaded model for MGI:MGI:1920081 that should have the with column correctly parsed into the with field (no has_input).

ukemi commented 4 years ago

MGI:108449: perfect.
MGI:2442833: looks good, but why do the BBSome annotations get a part_of relationship with the CC and the others get a located_in? I originally thought it was for protein-containing complexes, but the next model indicates that is not true.
MGI:1923036: looks good, apart from the question about the located_in for protein-containing complex in this model.
MGI:1920081: perfect.

dustine32 commented 4 years ago

@ukemi Oh wow, thanks for finding this bug! My code for finding the ProteinContainingComplex shape from a term wasn't working if the term was the protein-containing complex root itself, GO:0032991.

I just fixed the code and pushed new models for MGI:2442833 and MGI:1923036, though only the MGI:1923036 model should've changed (i.e. protein-containing complex predicate changed from located_in to part_of).
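
The thread does not show the actual fix. As an illustration only, one common way a root-term bug like this arises is testing a term's strict ancestors, which misses the root itself; the sketch below includes the explicit self-check. The toy parent map, function names, and the part_of/located_in choice are assumptions based on the discussion above, and the single is_a edge is a simplified stand-in for the real ontology:

```python
# Hypothetical sketch; not the actual gocamgen fix.

COMPLEX_ROOT = "GO:0032991"  # protein-containing complex

# Toy is_a edges for illustration; a real run would load these from the ontology.
TOY_PARENTS = {
    "GO:0034464": ["GO:0032991"],  # BBSome, treated here as a direct child
}


def ancestors(term, parents):
    """Return all strict ancestors of a term (the term itself is excluded)."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def is_protein_containing_complex(term, parents):
    # The explicit equality check is what covers the root term.
    return term == COMPLEX_ROOT or COMPLEX_ROOT in ancestors(term, parents)


def cc_relation(term, parents):
    """Pick the gene-product-to-CC predicate used in the model."""
    if is_protein_containing_complex(term, parents):
        return "part_of"
    return "located_in"


print(cc_relation("GO:0034464", TOY_PARENTS))  # descendant of the root
print(cc_relation("GO:0032991", TOY_PARENTS))  # the root itself
print(cc_relation("GO:0005737", TOY_PARENTS))  # not a complex in this toy map
```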

ukemi commented 4 years ago

We can now elevate those two models to perfect.

pgaudet commented 2 months ago

I think this is addressed differently now that the mouse isoform pipeline is managed at GOC.