PHI-base / curation

Unused genes in curation sessions #114

Open jseager7 opened 1 year ago

jseager7 commented 1 year ago

The following table lists all the genes in curation sessions that aren't used in any annotations. The table was generated by a script that searches the PHI-Canto JSON export. It's up to date as of 1 December 2022.

The script also checks annotation extensions, so it should exclude genes that are used in an annotation extension directly or indirectly (e.g. contained in a metagenotype that's used as a control). The genes in the table might still exist in an allele, genotype or metagenotype, but a gene should still appear here if none of the features that contain it have been annotated either.

I excluded genes that are used in a plain text extension (when an admin user uses 'Edit as text'). These probably need to be replaced with the actual gene in the curation session. There's a table at the bottom of this issue listing those cases. That table might not be comprehensive, as I was only doing this check to avoid false positives for unused genes.
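The "directly or indirectly" check described above can be pictured as a reachability search over what contains what. The following is a minimal sketch only, not the script that generated these tables: the function name `unused_genes`, the `contains` mapping and all IDs are hypothetical, and it assumes the containment relationships have already been extracted from the export.

```python
def unused_genes(genes, contains, annotated):
    """Return genes that are neither annotated nor inside an annotated feature.

    genes:     set of gene IDs added to the session (hypothetical IDs here)
    contains:  dict mapping a feature ID (allele, genotype, metagenotype)
               to the IDs it directly contains
    annotated: set of feature IDs referenced by an annotation or by an
               annotation extension
    """
    used = set()
    stack = list(annotated)
    # Walk down from every annotated feature to everything it contains.
    while stack:
        feature = stack.pop()
        if feature in used:
            continue
        used.add(feature)
        stack.extend(contains.get(feature, ()))
    return sorted(genes - used)


# Hypothetical session: geneA is reachable from an annotated metagenotype,
# geneB sits in a genotype that was never annotated, geneC is in nothing.
genes = {"geneA", "geneB", "geneC"}
contains = {
    "metagenotype-1": ["genotype-1"],
    "genotype-1": ["allele-1"],
    "allele-1": ["geneA"],
    "genotype-2": ["allele-2"],
    "allele-2": ["geneB"],
}
annotated = {"metagenotype-1"}

result = unused_genes(genes, contains, annotated)
print(result)
```

In this sketch geneB is reported even though it belongs to an allele and a genotype, because nothing that contains it has been annotated.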

Approved sessions

| Publication | Species | UniProtKB | Gene name | AC Comment |
| --- | --- | --- | --- | --- |
| PMID:29567712 | Magnaporthe oryzae | G4N374 | MPS1 | Deleted MPS1 from session |
| PMID:30044782 | Magnaporthe oryzae | Q51WZ9 | ATG9 | Complicated! MoAtg9 transformed into FgAtg9 delta. See comments in session. UNRESOLVED. |
| PMID:33510733 | Mus musculus | P43431 | Il12a | Il12a and Il12b were added as text in Admin mode. AE assayed_using BOTH Il12a and Il12b |
| PMID:33510733 | Mus musculus | P43432 | Il12b | Il12a and Il12b were added as text in Admin mode. AE assayed_using BOTH Il12a and Il12b |
| PMID:33510733 | Mus musculus | P10148 | Ccl2 | Deleted from session |

Unapproved sessions

| Publication | Species | UniProtKB | Gene name | AC Comment |
| --- | --- | --- | --- | --- |
| PMID:10537193 | Fusarium graminearum | Q00909 | TRI5 | Resolved - removed gene. This session had not been approved?? |
| PMID:20624958 | Parastagonospora nodorum | A9JX75 | Tox1 | Curation in progress |
| PMID:26163574 | Phytophthora sojae | E0W4N2 | Avh | Curation in progress |
| PMID:26163574 | Phytophthora sojae | E0W4Q5 | Avh | |
| PMID:26163574 | Phytophthora sojae | E0W504 | Avh | |
| PMID:26163574 | Phytophthora sojae | E0W523 | Avh | |
| PMID:26163574 | Phytophthora sojae | E0W524 | Avh | |
| PMID:26163574 | Phytophthora sojae | E0W537 | Avh | |
| PMID:26163574 | Phytophthora sojae | E0W545 | Avh | |
| PMID:26163574 | Phytophthora sojae | E0W560 | Avh | |
| PMID:26163574 | Phytophthora sojae | E0W563 | Avh | |
| PMID:26163574 | Phytophthora sojae | E0W566 | Avh | |
| PMID:26163574 | Phytophthora sojae | E0W574 | Avh | |
| PMID:26163574 | Phytophthora sojae | E0W588 | Avh | |
| PMID:26163574 | Phytophthora sojae | E0W5G0 | Avh | |
| PMID:26163574 | Phytophthora sojae | E0W5G3 | Avh | |
| PMID:26163574 | Phytophthora sojae | E0W5I9 | Avh | |
| PMID:26163574 | Phytophthora sojae | E0W5M1 | Avh | |
| PMID:26163574 | Phytophthora sojae | G1FSH9 | Avh | |
| PMID:26163574 | Phytophthora sojae | E0W544 | Avh23 | |
| PMID:26163574 | Phytophthora sojae | G5A8M1 | Avh240 | |
| PMID:26163574 | Phytophthora sojae | G4ZRQ7 | Avh331 | |
| PMID:26163574 | Phytophthora sojae | D7PC71 | Avh432 | |
| PMID:26163574 | Phytophthora sojae | E0W547 | Avh5 | |
| PMID:26822079 | Solanum tuberosum | M1C5B8 | 102587690 | |
| PMID:26822079 | Solanum tuberosum | M1A8S7 | 102590757 | |
| PMID:26822079 | Solanum tuberosum | M1CZL3 | 102602508 | |
| PMID:30584105 | Nicotiana benthamiana | E3VXE7 | Serk3B | |
| PMID:31012804 | Nicotiana tabacum | A0A1S4AT15 | LOC107800898 | |
| PMID:32487759 | Aspergillus fumigatus | B9UNL5 | hacAi | |
| PMID:34399627 | Aspergillus fumigatus | Q4WE17 | AFUA_5G01440 | |
| PMID:34399627 | Aspergillus fumigatus | Q4WLS4 | AFUA_6G12500 | |
| PMID:34399627 | Aspergillus fumigatus | Q4WV91 | sreA | |
| PMID:35468894 | Arabidopsis lyrata subsp. lyrata | D7KYE1 | ARALYDRAFT_894894 | |
| PMID:35468894 | Sclerotinia sclerotiorum | Q7Z8Q7 | PG3 | |

Genes used as plain text extensions

| Publication | Species | UniProtKB | Gene name | AC Comment |
| --- | --- | --- | --- | --- |
| PMID:28720735 | Homo sapiens | P01584 | IL1B | Removed and readded AE assayed_using IL1B |
| PMID:32323095 | Homo sapiens | P05231 | IL6 | Removed and readded AE assayed_using IL6, but unable to save/finish as seemed to already be correct |
| PMID:30610168 | Oryza sativa | Q336X9 | MPK6 | needs AE assayed_using for PHIPO term https://github.com/PHI-base/config/issues/63 |
| PMID:30610168 | Oryza sativa | Q6Z437 | MPK3 | needs AE assayed_using for PHIPO term |

jseager7 commented 1 year ago

If you notice any mistakes in any of the tables, please let me know, because it probably indicates a problem with the script that I didn't catch.

jseager7 commented 1 year ago

I'd expect all these genes to have been added to PHI-base 5 already. Fortunately, the website copes quite well: it shows a gene page without any annotation sections, but the other sections still seem correct. Shown below is an example for MPS1 of M. oryzae.

Ironically, removing these genes might cause more problems than it solves, since then we have to decide what to do about the problems raised in issue https://github.com/PHI-base/PHI5_web_display/issues/75.

[Image: gene page for MPS1 of M. oryzae on the PHI-base 5 website]

CuzickA commented 1 year ago

Thanks @jseager7, I'll look into the above.

CuzickA commented 1 year ago

Hi @jseager7, The first table of data looks like it has not come from the 'approved' sessions.

CuzickA commented 1 year ago

PMID:32323095 | Homo sapiens | P05231 | IL6
PMID:30610168 | Oryza sativa | Q336X9 | MPK6
PMID:30610168 | Oryza sativa | Q6Z437 | MPK3

Note to self - still to check these entries.

Now checked 05_11_22, some issues noted in https://github.com/PHI-base/curation/issues/42

CuzickA commented 1 year ago

Looks like we need a checking mechanism to make sure AE 'added as text' get updated and not just lost in the annotations.

jseager7 commented 1 year ago

The first table of data looks like it has not come from the 'approved' sessions.

You're right, not sure why that happened. It's probably better that the table lists unused genes in all sessions though. I've updated the original comment.

CuzickA commented 1 year ago

The first table of data looks like it has not come from the 'approved' sessions.

You're right, not sure why that happened. It's probably better that the table lists unused genes in all sessions though. I've updated the original comment.

Is it possible to sort them into approved and unapproved please?

jseager7 commented 1 year ago

Looks like we need a checking mechanism to make sure AE 'added as text' get updated and not just lost in the annotations.

I think this is fairly simple to do in the JSON export, because the annotation extensions that have been added as text have no rangeType property. Here's an example of annotation extensions added as text:

"extension" : [
    {
        "rangeDisplayName" : "MPK3",
        "rangeValue" : "Q6Z437",
        "relation" : "assayed_using"
    },
    {
        "rangeDisplayName" : "MPK6",
        "rangeValue" : "Q336X9",
        "relation" : "assayed_using"
    }
]

Compared to those that have been added correctly, which have a rangeType of Gene:

"extension" : [
    {
        "rangeDisplayName" : "EIL1A",
        "rangeType" : "Gene",
        "rangeValue" : "Q10M41",
        "relation" : "assayed_using"
    }
]

So I can just search for all annotation extensions with no rangeType.
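As a rough illustration of that search, here is a minimal Python sketch. It assumes only what the two examples above show: that extensions live in a list under an "extension" key and that text-entered parts lack "rangeType". To avoid depending on the exact layout of the export, it walks the whole JSON tree; the function name `find_text_extensions` and the sample session key are hypothetical.

```python
def find_text_extensions(node, path=()):
    """Yield (path, part) for each extension part with no 'rangeType'."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "extension" and isinstance(value, list):
                for part in value:
                    if isinstance(part, dict) and "rangeType" not in part:
                        yield path, part
            else:
                yield from find_text_extensions(value, path + (key,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_text_extensions(item, path + (index,))


# Tiny stand-in for an export; the nesting above "extension" is an
# assumption for illustration, not the confirmed export schema.
sample = {
    "curation_sessions": {
        "hypothetical_session": {
            "annotations": [
                {"extension": [
                    {"rangeDisplayName": "MPK3", "rangeValue": "Q6Z437",
                     "relation": "assayed_using"},
                    {"rangeDisplayName": "MPK6", "rangeValue": "Q336X9",
                     "relation": "assayed_using"},
                ]},
                {"extension": [
                    {"rangeDisplayName": "EIL1A", "rangeType": "Gene",
                     "rangeValue": "Q10M41", "relation": "assayed_using"},
                ]},
            ]
        }
    }
}

hits = list(find_text_extensions(sample))
for path, part in hits:
    print(path, part["relation"], part["rangeDisplayName"])
```

For a real export, the same function could be run on the result of `json.load` for the exported file; the reported path shows which session and annotation each text-entered extension belongs to.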

jseager7 commented 1 year ago

Is it possible to sort them into approved and unapproved please?

Yes, but it will take a bit of work. I'll amend the original comment once I'm done.

CuzickA commented 1 year ago

Sounds good thanks

jseager7 commented 1 year ago

Is it possible to sort them into approved and unapproved please?

This is done now.

CuzickA commented 3 weeks ago

Hi @jseager7, shall we close this ticket now?

jseager7 commented 3 weeks ago

@CuzickA The underlying problem hasn't been fixed as far as I know, so it's best to leave this issue open. Feel free to un-assign yourself.