genome-nexus / genome-nexus-importer

Import data into MongoDB for use by https://github.com/genome-nexus/genome-nexus/
MIT License

Transcript analysis (GRCh37/38) - Log #31

Open zhx828 opened 4 years ago

zhx828 commented 4 years ago

BioMart mapping file: genes that do not have an Entrez ID

Genes not in cBioPortal

Hugo symbol does not match cBioPortal

Problem (mismatch) transcripts

problem_transcripts.txt

Gene protein length check (genes without a protein length also lack Pfam data, and vice versa)

OncoKB issues

The good news is that they use the same transcripts for both GRCh37 and GRCh38. But there are still two issues.

inodb commented 4 years ago

This is great! Thank you so much!

inodb commented 4 years ago

Thanks again @zhx828!

A few questions

zhx828 commented 4 years ago
  • What do you mean by "ones without protein length do not have pfam, vice versa"?

If you look at the *_info.txt files, the genes that do not have a protein length do not have Pfam data either, so we only need to check one of the two fields.
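That observation can be verified mechanically before relying on it. Below is a minimal sketch that assumes a tab-separated layout with hypothetical `protein_length` and `pfam_domains` columns (the real `*_info.txt` columns are not shown in this thread). It lists genes where exactly one of the two fields is missing; an empty result would confirm that checking one field is enough.

```python
import csv
import io
from typing import Dict, List, Optional

# Hypothetical column names: the real *_info.txt layout is not shown in the thread.
LENGTH_COL = "protein_length"
PFAM_COL = "pfam_domains"


def is_missing(value: Optional[str]) -> bool:
    """Treat None, empty strings, and 'NA' as missing values."""
    return value is None or value.strip() in ("", "NA")


def inconsistent_rows(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return rows where exactly one of protein length / Pfam data is missing."""
    return [
        row
        for row in rows
        if is_missing(row.get(LENGTH_COL)) != is_missing(row.get(PFAM_COL))
    ]


# Illustrative sample data, not taken from the real files.
SAMPLE = (
    "hugo_symbol\tprotein_length\tpfam_domains\n"
    "GENE_A\t393\tPF00001\n"
    "GENE_B\t\t\n"
    "GENE_C\t120\t\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
print([row["hugo_symbol"] for row in inconsistent_rows(rows)])  # ['GENE_C']
```

Here GENE_B (both fields empty) is consistent with the observation, while GENE_C (length present, no Pfam) would break it.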

  • Do we currently have GRCh38 loaded in the main cBioPortal? I think there is a seed database, but I'm not sure if it's actually loaded. What exactly does it mean when a GRCh38 gene is not in cBioPortal?

I just pulled the genes from the portal and ran them through both versions to see whether these Entrez genes are in cBioPortal. I didn't compare between 37 and 38, though; they might be identical. I think this is mainly for Ramya to finalize the portal gene table.
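The check described here is a set difference per build, and the comparison between 37 and 38 that was not done yet is a symmetric difference of the two results. A minimal sketch, assuming both the portal gene table and the per-build mappings have already been reduced to lists of Entrez IDs as strings; the function name and the IDs are hypothetical.

```python
from typing import Iterable, List


def genes_missing_from_portal(
    mapping_ids: Iterable[str], portal_ids: Iterable[str]
) -> List[str]:
    """Entrez IDs present in a BioMart-derived mapping but absent from the portal gene table."""
    return sorted(set(mapping_ids) - set(portal_ids), key=int)


# Hypothetical Entrez IDs, for illustration only.
portal_ids = ["100", "200", "300"]
grch37_ids = ["100", "200", "400"]
grch38_ids = ["100", "200", "400", "500"]

missing37 = genes_missing_from_portal(grch37_ids, portal_ids)
missing38 = genes_missing_from_portal(grch38_ids, portal_ids)
print(missing37, missing38)  # ['400'] ['400', '500']

# Genes missing from the portal in only one of the two builds:
print(sorted(set(missing37) ^ set(missing38), key=int))  # ['500']
```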

  • What are the problem (mismatch) transcripts? Do they mismatch between GRCh37 and GRCh38? Is that based on just the first part of the ID? E.g. for ENSTx.y, is only x the same between GRCh37 and GRCh38, or does it include the .y version part? Note that the protein length might change when .y changes between GRCh37 and GRCh38 (in most cases it does not, but occasionally it does).

I did not check y, only x, to see whether they are the same.

@inodb
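Checking only x amounts to comparing Ensembl stable IDs with the `.y` version suffix stripped. A minimal sketch that reports both levels, so pairs whose version changed between builds can be reviewed separately; the helper names and the placeholder IDs are hypothetical, not taken from the importer.

```python
from typing import Optional, Tuple


def split_ensembl_id(stable_id: str) -> Tuple[str, Optional[int]]:
    """Split 'ENST00000000001.4' into ('ENST00000000001', 4); unversioned IDs get None."""
    base, sep, version = stable_id.partition(".")
    return base, (int(version) if sep else None)


def compare_transcripts(tx37: str, tx38: str) -> str:
    """Classify a GRCh37/GRCh38 transcript pair by base ID and, when present, version."""
    base37, v37 = split_ensembl_id(tx37)
    base38, v38 = split_ensembl_id(tx38)
    if base37 != base38:
        return "different transcript"
    if v37 is None or v38 is None:
        return "same transcript, version not compared"
    if v37 == v38:
        return "same transcript, same version"
    return "same transcript, different version"


# Placeholder IDs, not real transcripts.
print(compare_transcripts("ENST00000000001.4", "ENST00000000001.4"))  # same transcript, same version
print(compare_transcripts("ENST00000000001.4", "ENST00000000001.8"))  # same transcript, different version
print(compare_transcripts("ENST00000000001.4", "ENST00000000002.4"))  # different transcript
```

The "different version" bucket is where a protein length change between builds is most likely to show up.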

inodb commented 4 years ago

Thanks so much @zhx828!

I see. There are a few corner cases where the length might differ even though the ID matches. Yeah, it's weird 🙂. So it's good to check whether the length matches as well, at least for OncoKB-annotated genes to start with.
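The suggested length check could be layered on top of the ID comparison. A minimal sketch, assuming each build has been loaded into a dict mapping a gene to its (versioned transcript ID, protein length) and that a list of OncoKB genes is available; the data shapes, function name, and values are all hypothetical. It flags genes whose base transcript ID is the same in both builds but whose protein length differs.

```python
from typing import Dict, Iterable, List, Tuple

# (versioned Ensembl transcript ID, protein length in amino acids)
Record = Tuple[str, int]


def length_mismatches(
    grch37: Dict[str, Record], grch38: Dict[str, Record], genes: Iterable[str]
) -> List[str]:
    """Genes whose base transcript ID matches across builds but whose protein length differs."""
    flagged = []
    for gene in genes:
        if gene not in grch37 or gene not in grch38:
            continue  # missing from one build: a separate check, not a length mismatch
        (tx37, len37), (tx38, len38) = grch37[gene], grch38[gene]
        same_base = tx37.split(".")[0] == tx38.split(".")[0]
        if same_base and len37 != len38:
            flagged.append(gene)
    return sorted(flagged)


# Hypothetical data for illustration only.
grch37 = {
    "GENE_A": ("ENST00000000001.4", 393),
    "GENE_B": ("ENST00000000002.2", 500),
    "GENE_C": ("ENST00000000003.1", 200),
}
grch38 = {
    "GENE_A": ("ENST00000000001.4", 393),
    "GENE_B": ("ENST00000000002.3", 512),
    "GENE_C": ("ENST00000000009.1", 210),
}
oncokb_genes = ["GENE_A", "GENE_B", "GENE_C"]
print(length_mismatches(grch37, grch38, oncokb_genes))  # ['GENE_B']
```

GENE_C is not flagged because its base ID already differs, so it would appear among the problem (mismatch) transcripts instead.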

zhx828 commented 4 years ago

Cool, will do. Thanks!

zhx828 commented 4 years ago

This is related to https://github.com/genome-nexus/genome-nexus/issues/306