geneontology / gopreprocess

Ensure 1:1 correspondence of the UniProt identifier to an MGI marker when the GO term is a biological process #6

Closed sierra-moxon closed 9 months ago

sierra-moxon commented 1 year ago

First round of QC on the new GOC-generated human->mouse GAF file (annotations transferred via orthology): https://drive.google.com/drive/folders/1uICd7pxqre6hwtKNnMV5NKTPfcgR9xGy

There are 96K matches between the new GOC file and the MGI-generated file straight away. Some annotations reported in the new GOC file were rejected from the MGI file because the UniProt identifier does not have a 1:1 correspondence to an MGI marker and the annotation is a biological process annotation.

For example:

89569   NON_1TO1_P      44596110        UniProtKB       P01911  HLA-DRB1        involved_in     GO:0032831      PMID:28467828   IDA             P       HLA class II histocompatibility antigen, DRB1 beta chain        HLA-DRB1        protein taxon:9606      20201030        UniProt

The UniProt identifier is associated with two markers in MGI, H2-Eb1 (MGI:95901) and H2-Eb2 (MGI:95902). Since paralogs in an organism are often involved in different processes, MGI conservatively makes no biological process association for any gene that maps to more than one mouse gene.

We need to add this logic to the code at GOC.
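
A minimal sketch of that rule, for illustration only. It is not the code from gopreprocess; the names `Annotation`, `build_marker_index`, and `split_by_1to1` are hypothetical, and it assumes the UniProt-to-MGI mapping is available as simple (UniProt ID, MGI ID) pairs.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Annotation(NamedTuple):
    """Hypothetical, trimmed-down view of one GAF line."""
    uniprot_id: str  # e.g. "P01911"
    go_term: str     # e.g. "GO:0032831"
    aspect: str      # GAF aspect column: "P", "F", or "C"


def build_marker_index(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group MGI marker IDs by the UniProt ID they are associated with."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for uniprot_id, mgi_id in pairs:
        index[uniprot_id].add(mgi_id)
    return dict(index)


def split_by_1to1(
    annotations: Iterable[Annotation],
    marker_index: Dict[str, Set[str]],
) -> Tuple[List[Annotation], List[Annotation]]:
    """Return (kept, rejected).

    A biological process ("P") annotation is rejected when its UniProt ID
    maps to more than one MGI marker. Other aspects pass through unchanged.
    """
    kept: List[Annotation] = []
    rejected: List[Annotation] = []
    for ann in annotations:
        markers = marker_index.get(ann.uniprot_id, set())
        if ann.aspect == "P" and len(markers) > 1:
            rejected.append(ann)
        else:
            kept.append(ann)
    return kept, rejected
```

With the example above, P01911 maps to both MGI:95901 and MGI:95902, so its GO:0032831 process annotation would land in the rejected list, while a function or component annotation on the same identifier would be kept.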

sierra-moxon commented 1 year ago

added this check via PR: https://github.com/geneontology/gopreprocess/pull/7

sierra-moxon commented 1 year ago

next iteration: https://drive.google.com/drive/folders/1D3r6M5iBH5QhQumuKj329CHn-4ttYTzL

ukemi commented 1 year ago

Hi @sierra-moxon

My recent reviews lead me to believe that you have done this?

kltm commented 9 months ago

@LiNiMGI I believe this is ready for the MGI side?

LiNiMGI commented 9 months ago

if @sierra-moxon has done this, this one can be closed.

sierra-moxon commented 9 months ago

thanks @LiNiMGI!