d3b-center / ticket-tracker-OPC

A repo to generate and track tickets for ped OT

Data mapped to invalid/outdated ENSGs #265

Closed zdorman closed 2 years ago

zdorman commented 2 years ago

What data file(s) does this issue pertain to?

JSONL files for MTP:
gene-level-snv-consensus-annotated-mut-freq.jsonl
variant-level-snv-consensus-annotated-mut-freq.jsonl
gene-level-cnv-consensus-annotated-mut-freq.jsonl
putative-oncogene-fused-gene-freq.jsonl
putative-oncogene-fusion-freq.jsonl
long_n_tpm_mean_sd_quantile_gene_wise_zscore.jsonl
long_n_tpm_mean_sd_quantile_group_wise_zscore.jsonl

What release are you using?

v10

Put your question or report your issue here.

258 Ensembl IDs (ENSGs) within the Somatic Alterations and Gene Expression data files are not found within the Open Targets target database (and therefore not in the MTP target database). The problem seems to be due to some of the CHoP gene symbols being mapped to older Gencode versions no longer in use by Ensembl/Open Targets.

Within MTP, the target and evidence pages for these Ensembl IDs do not exist, and the data associated with these genes cannot be accessed (even when navigating directly by URL).

The current build of MTP is based on the 21.06 build of Open Targets, which uses Ensembl release 104 and Gencode v38. My understanding of the gene_match analysis is that it merges Gencode v28 and Gencode v38 to find ENSGs for all gene symbols in the source datasets. My suspicion is that this method is also pulling in ENSGs that are valid in GRCh37.p13 but have been deprecated in GRCh38.p13.
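
To illustrate the suspected mechanism, here is a minimal sketch (not the actual gene_match code) of how taking the union of two symbol-to-ENSG maps keeps IDs that exist only in the older annotation. The function names and the `GENE_A`/`ENSG_TOY_A` entries are hypothetical; the KIAA1107 entry is the example from this issue.

```python
def merge_symbol_maps(older_map, newer_map):
    """Union of symbol -> {ENSG IDs} across two annotation releases."""
    merged = {}
    for source in (older_map, newer_map):
        for symbol, ensg in source.items():
            merged.setdefault(symbol, set()).add(ensg)
    return merged


def ids_missing_from_release(merged, current_ids):
    """Return, per symbol, the merged IDs that are absent from current_ids."""
    missing = {}
    for symbol, ids in merged.items():
        stale = sorted(ids - current_ids)
        if stale:
            missing[symbol] = stale
    return missing


# Toy inputs: only the KIAA1107 pair comes from this issue.
older = {"KIAA1107": "ENSG00000069712", "GENE_A": "ENSG_TOY_A"}
newer = {"GENE_A": "ENSG_TOY_A"}

merged = merge_symbol_maps(older, newer)
stale = ids_missing_from_release(merged, set(newer.values()))
print(stale)
```

With these toy inputs the older-only ID survives the merge, which is the behavior I suspect is producing the invalid ENSGs.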

Example gene with invalid ENSG:
Gene Symbol: KIAA1107
Ensembl ID: ENSG00000069712
GRCh37.p13: http://grch37.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000069712;r=1:92632542-92650280
GRCh38.p13: https://useast.ensembl.org/Homo_sapiens/Gene/Idhistory?g=ENSG00000069712

The full list of problematic ENSGs and a notebook export of the analysis have been uploaded to NIH Box.

Do you have any ideas for ways of catching and updating the invalid/outdated ENSGs for v11?
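
One possible check, as a rough sketch only: scan each JSONL file before release and count any ENSG that is not in a reference set of valid target IDs. The field name `Gene_Ensembl_ID` and the helper name are assumptions for illustration, and the reference set would have to be exported separately from the Open Targets target list for the matching build.

```python
import json


def find_unknown_ensgs(jsonl_path, valid_ids, id_field="Gene_Ensembl_ID"):
    """Count records per ENSG whose ID is not in valid_ids.

    id_field is an assumed column name; adjust it to the real schema.
    """
    unknown = {}
    with open(jsonl_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            ensg = record.get(id_field)
            if ensg and ensg not in valid_ids:
                unknown[ensg] = unknown.get(ensg, 0) + 1
    return unknown
```

Running this over each of the files listed above and failing the build when the result is non-empty would flag the problem IDs before they reach MTP.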

jharenza commented 2 years ago

closed with #162