What data file(s) does this issue pertain to?
JSONL files for MTP:
gene-level-snv-consensus-annotated-mut-freq.jsonl
variant-level-snv-consensus-annotated-mut-freq.jsonl
gene-level-cnv-consensus-annotated-mut-freq.jsonl
putative-oncogene-fused-gene-freq.jsonl
putative-oncogene-fusion-freq.jsonl
long_n_tpm_mean_sd_quantile_gene_wise_zscore.jsonl
long_n_tpm_mean_sd_quantile_group_wise_zscore.jsonl
What release are you using?
v10
Put your question or report your issue here.
258 Ensembl IDs (ENSGs) in the Somatic Alterations and Gene Expression data files are not found in the Open Targets target database (and are therefore not in the MTP target database). The problem appears to be that some CHoP gene symbols are mapped to ENSGs from older GENCODE versions that are no longer in use by Ensembl/Open Targets.
Within MTP, the target and evidence pages for these Ensembl IDs do not exist, and the data associated with these genes cannot be accessed (even by navigating directly to the URL).
The current build of MTP is based on the 21.06 build of Open Targets, which uses Ensembl release 104 and GENCODE v38. My understanding of the gene_match analysis is that it merges GENCODE v28 and GENCODE v38 to find ENSGs for all gene symbols in the source datasets. I suspect this method is also pulling in ENSGs that are valid in GRCh37.p13 but have been deprecated in GRCh38.p13.
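To illustrate how a two-release merge could leak retired IDs, here is a minimal sketch. It assumes (not confirmed from the gene_match code) that each release is reduced to a symbol-to-ENSG dictionary and that the newer release wins when both define a symbol. A symbol that exists only in the older release then keeps its older ENSG, which may not exist in the newer gene set at all. Note that the check compares IDs, not symbols, so a gene that was merely renamed but kept its ENSG is not flagged. `merge_symbol_maps` and `ids_missing_from` are hypothetical helpers.

```python
from typing import Dict, Set, Tuple


def merge_symbol_maps(
    newer: Dict[str, str], older: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Map gene symbol -> (ENSG, release label), letting the newer release win.

    Assumption: this mirrors a "prefer v38, fall back to v28" merge; the real
    gene_match logic may differ.
    """
    merged: Dict[str, Tuple[str, str]] = {s: (e, "older") for s, e in older.items()}
    # Entries from the newer release overwrite older ones for the same symbol.
    merged.update({s: (e, "newer") for s, e in newer.items()})
    return merged


def ids_missing_from(
    merged: Dict[str, Tuple[str, str]], reference_ids: Set[str]
) -> Dict[str, str]:
    """Return symbols whose chosen ENSG is absent from the reference gene-ID set.

    reference_ids should be every gene ID in the target release (e.g. all
    gene_id values in GENCODE v38), not just the IDs that carry a symbol.
    """
    return {s: e for s, (e, _) in merged.items() if e not in reference_ids}
```

With toy data where `GENE_B` exists only in the older map and points to an ID missing from the newer gene set, `ids_missing_from` returns just `{"GENE_B": ...}`; a renamed gene whose ENSG is still current is not reported.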
Example gene with an invalid ENSG:
Gene symbol: KIAA1107
Ensembl ID: ENSG00000069712
GRCh37.p13: http://grch37.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000069712;r=1:92632542-92650280
GRCh38.p13 (ID history): https://useast.ensembl.org/Homo_sapiens/Gene/Idhistory?g=ENSG00000069712
The full list of problematic ENSGs and a notebook export of the analysis have been uploaded to NIH Box.
Do you have any ideas for catching and updating the invalid or outdated ENSGs for v11?
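One possible way to catch these before a release is to check every ENSG in the JSONL files against the gene IDs in the GENCODE release that Open Targets uses (v38 for the 21.06 build). The sketch below is a minimal, hypothetical check: it assumes the ENSG sits in a top-level JSON field whose name you pass in (the field name used for testing, `targetFromSourceId`, is an assumption and may not match these files), and it compares unversioned IDs so that `ENSG00000069712.14` and `ENSG00000069712` match. `current_ids_from_gtf` and `unknown_ids` are illustrative helpers, not part of any existing pipeline.

```python
import json
import re
from typing import Iterable, Set

# Matches the gene_id attribute in column 9 of a GENCODE GTF record.
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def strip_version(ensg: str) -> str:
    """Drop a trailing '.N' version suffix, e.g. ENSG00000069712.14 -> ENSG00000069712."""
    return ensg.split(".", 1)[0]


def current_ids_from_gtf(gtf_lines: Iterable[str]) -> Set[str]:
    """Collect unversioned gene IDs from the 'gene' records of a GTF file."""
    ids: Set[str] = set()
    for line in gtf_lines:
        if line.startswith("#"):  # header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = GENE_ID_RE.search(fields[8])
        if match:
            ids.add(strip_version(match.group(1)))
    return ids


def unknown_ids(jsonl_lines: Iterable[str], id_field: str, current: Set[str]) -> Set[str]:
    """Return unversioned IDs found in a JSONL file that are absent from `current`."""
    missing: Set[str] = set()
    for line in jsonl_lines:
        if not line.strip():  # skip blank lines
            continue
        record = json.loads(line)
        ensg = record.get(id_field)
        if ensg and strip_version(ensg) not in current:
            missing.add(strip_version(ensg))
    return missing
```

Both functions accept any iterable of lines, so an open file handle works directly (for a compressed GTF such as `gencode.v38.annotation.gtf.gz`, open it with `gzip.open(path, "rt")`). This only detects stale IDs; each flagged ID would still need to be resolved, for example by checking its Ensembl ID history as in the KIAA1107 example above.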