Open jseager7 opened 1 year ago
If you notice any mistakes in either table, please let me know because it probably indicates a problem with the script that I didn't catch.
I'd expect all these genes to have been added to PHI-base 5 already. Fortunately, the website copes quite well: it shows a gene page without any annotation sections, but the other sections still seem correct. Shown below is an example for MPS1 of M. oryzae.
Ironically, removing these genes might cause more problems than it solves, since then we have to decide what to do about the problems raised in issue https://github.com/PHI-base/PHI5_web_display/issues/75.
Thanks @jseager7, I'll look into the above.
Hi @jseager7, the first table of data looks like it has not come from the 'approved' sessions.
PMID:32323095 | Homo sapiens | P05231 | IL6
PMID:30610168 | Oryza sativa | Q336X9 | MPK6
PMID:30610168 | Oryza sativa | Q6Z437 | MPK3
Note to self - still to check these entries.
Now checked 05_11_22, some issues noted in https://github.com/PHI-base/curation/issues/42
Looks like we need a checking mechanism to make sure AE 'added as text' get updated and not just lost in the annotations.
> The first table of data looks like it has not come from the 'approved' sessions.
You're right, not sure why that happened. It's probably better that the table lists unused genes in all sessions though. I've updated the original comment.
Is it possible to sort them into approved and unapproved please?
> Looks like we need a checking mechanism to make sure AE 'added as text' get updated and not just lost in the annotations.
I think this is fairly simple to do in the JSON export, because the annotation extensions that have been added as text have no `rangeType` property. Here's an example of annotation extensions added as text:
    "extension" : [
        {
            "rangeDisplayName" : "MPK3",
            "rangeValue" : "Q6Z437",
            "relation" : "assayed_using"
        },
        {
            "rangeDisplayName" : "MPK6",
            "rangeValue" : "Q336X9",
            "relation" : "assayed_using"
        }
    ]
Compare those with extensions that have been added correctly, which have a `rangeType` of `Gene`:
    "extension" : [
        {
            "rangeDisplayName" : "EIL1A",
            "rangeType" : "Gene",
            "rangeValue" : "Q10M41",
            "relation" : "assayed_using"
        }
    ]
So I can just search for all annotation extensions with no `rangeType`.
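A minimal sketch of that search, assuming only what the snippets above show: extension parts live in a list under an `extension` key, and parts added as text lack a `rangeType` key. The function name `iter_text_extensions` and the `annotations` wrapper key in the sample are hypothetical; the walker is recursive so that it does not depend on where `extension` sits in the real export.

```python
import json


def iter_text_extensions(node):
    """Yield every extension part that has no 'rangeType' key.

    Walks arbitrarily nested dicts and lists, so the position of the
    'extension' lists in the export does not need to be known.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "extension" and isinstance(value, list):
                for part in value:
                    if isinstance(part, dict) and "rangeType" not in part:
                        yield part
            else:
                yield from iter_text_extensions(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_text_extensions(item)


# Sample built from the two snippets in this thread.
# The "annotations" wrapper is an assumption made for illustration only.
sample = json.loads("""
{
  "annotations": [
    {"extension": [
      {"rangeDisplayName": "MPK3", "rangeValue": "Q6Z437", "relation": "assayed_using"},
      {"rangeDisplayName": "MPK6", "rangeValue": "Q336X9", "relation": "assayed_using"}
    ]},
    {"extension": [
      {"rangeDisplayName": "EIL1A", "rangeType": "Gene",
       "rangeValue": "Q10M41", "relation": "assayed_using"}
    ]}
  ]
}
""")

found = list(iter_text_extensions(sample))
for part in found:
    print(part["relation"], part["rangeValue"], part["rangeDisplayName"])
```

To run this against a real export, replace `sample` with the result of `json.load` on the exported file. Any extension part that is reported is a candidate for replacement with the actual gene from the curation session.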
> Is it possible to sort them into approved and unapproved please?
Yes, but it will take a bit of work. I'll amend the original comment once I'm done.
Sounds good thanks
> Is it possible to sort them into approved and unapproved please?
This is done now.
Hi @jseager7, shall we close this ticket now?
@CuzickA The underlying problem hasn't been fixed as far as I know, so it's best to leave this issue open. Feel free to un-assign yourself.
The following table lists all the genes in curation sessions that aren't used in any annotations. The table was generated by a script that searches the PHI-Canto JSON export. It's up to date as of 1 December 2022.
The script also checks annotation extensions, so it should exclude genes that are used in an annotation extension directly or indirectly (e.g. contained in a metagenotype that's used as a control). The genes in the table might still exist in an allele, genotype or metagenotype, but a gene still appears here if none of the things that contain it have been annotated either.
I excluded genes that are used in a plain text extension (when an admin user uses 'Edit as text'), and these probably need to be replaced with the actual gene in the curation session. There's a table at the bottom of this issue listing those cases. That table might not be comprehensive, as I was only doing this check to avoid false positives for unused genes.
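The "directly or indirectly" rule above can be illustrated with a small sketch. This is not the script that generated the tables: the function name `find_unused_genes`, the `containers` mapping and the placeholder IDs are all hypothetical, and a real script would have to build these inputs from the PHI-Canto JSON export.

```python
def find_unused_genes(session_genes, containers, annotated_ids):
    """Return the session genes that no annotation reaches.

    session_genes: iterable of gene IDs in a curation session.
    containers:    dict mapping an allele, genotype or metagenotype ID
                   to the IDs it contains (genes or other containers).
    annotated_ids: IDs referenced by annotations or their extensions.
    """
    used = set()
    stack = list(annotated_ids)
    while stack:
        item = stack.pop()
        if item in used:
            continue
        used.add(item)
        # Anything inside an annotated container counts as used too.
        stack.extend(containers.get(item, ()))
    return sorted(g for g in session_genes if g not in used)


# Placeholder data (hypothetical IDs, not taken from PHI-Canto).
containers = {
    "alleleA": {"geneA"},
    "genotype1": {"alleleA"},
    "metagenotype1": {"genotype1"},
    "alleleB": {"geneB"},  # exists, but is never annotated
}
unused = find_unused_genes(
    session_genes=["geneA", "geneB", "geneC"],
    containers=containers,
    annotated_ids={"metagenotype1"},
)
print(unused)
```

Here `geneA` counts as used because the annotated metagenotype contains it through a genotype and an allele, while `geneB` is still reported because the only allele that contains it has no annotation.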
Approved sessions
Unapproved sessions
Genes used as plain text extensions