d3b-center / ticket-tracker-OPC

A repo to generate and track tickets for ped OT

Fusion data missing ENSGs #264

Closed zdorman closed 2 years ago

zdorman commented 2 years ago

What data file(s) does this issue pertain to?

putative-oncogene-fused-gene-freq.jsonl, putative-oncogene-fusion-freq.jsonl

What release are you using?

v10

Put your question or report your issue here.

There are 17 records across 4 Gene_symbols within the v10 Fusion JSONL files that do not have Ensembl Gene ID (targetFromSourceId) values.

| FusionName | Gene_symbol | Gene_Position | targetFromSourceId | Disease | Dataset |
|---|---|---|---|---|---|
| IGH-@--MAGED1 | IGH-@ | Gene1A | | Neuroblastoma | All Cohorts |
| IGH-@--MAGED1 | IGH-@ | Gene1A | | Neuroblastoma | TARGET |
| EPOR--IGH-@ | IGH-@ | Gene1B | | Acute Lymphoblastic Leukemia | TARGET |
| IGH@--PNPLA7 | IGH@ | Gene1A | | Teratoma | PBTA |
| IGH@--CLU | IGH@ | Gene1A | | Teratoma | PBTA |
| IGH@--PTBP1 | IGH@ | Gene1A | | Osteosarcoma | TARGET |
| IGH@--RMRP | IGH@ | Gene1A | | Teratoma | PBTA |
| B2M--IGH@ | IGH@ | Gene1B | | Acute Lymphoblastic Leukemia | TARGET |
| IGH@--PTMA | IGH@ | Gene1A | | Acute Myeloid Leukemia | TARGET |
| DDX5--IGH@ | IGH@ | Gene1B | | Acute Lymphoblastic Leukemia | TARGET |
| MOB3A--IGH@ | IGH@ | Gene1B | | Acute Lymphoblastic Leukemia | TARGET |
| GNA15--IGH@ | IGH@ | Gene1B | | Acute Lymphoblastic Leukemia | TARGET |
| DDX39A--IGH@ | IGH@ | Gene1B | | Acute Lymphoblastic Leukemia | TARGET |
| BCL2L13--IGL-@ | IGL-@ | Gene1B | | High-grade glioma/astrocytoma | PBTA |
| RBFOX2--IGL-@ | IGL-@ | Gene1B | | CNS Embryonal tumor | PBTA |
| BCL2L13--IGL@ | IGL@ | Gene1B | | High-grade glioma/astrocytoma | PBTA |
| RBFOX2--IGL@ | IGL@ | Gene1B | | CNS Embryonal tumor | PBTA |

These seem like edge cases. Are there relevant ENSGs that can be mapped to these Gene_symbols for v11? If not, these particular records will remain inaccessible within MTP.
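
For anyone who wants to re-run this kind of check on a later release, a minimal sketch follows. It assumes each JSONL line is one JSON object carrying the field names shown above, and that a missing ID may appear as an absent key, a null, or a blank string; the actual encoding in the release files has not been confirmed here. The function name `records_missing_ensg`, the two sample lines, and the `ENSG_PLACEHOLDER` value are illustrative only and are not taken from the real data.

```python
import json
from collections import Counter


def records_missing_ensg(lines):
    """Return parsed records whose targetFromSourceId is absent, null, or blank.

    `lines` is any iterable of JSONL strings, for example an open file handle.
    """
    missing = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        ensg = record.get("targetFromSourceId")
        # Assumption: "missing" covers an absent key, null, or whitespace-only string.
        if ensg is None or not str(ensg).strip():
            missing.append(record)
    return missing


# Illustrative sample lines; ENSG_PLACEHOLDER is not a real Ensembl ID.
sample = [
    '{"FusionName": "IGH-@--MAGED1", "Gene_symbol": "IGH-@", '
    '"targetFromSourceId": "", "Dataset": "TARGET"}',
    '{"FusionName": "IGH-@--MAGED1", "Gene_symbol": "MAGED1", '
    '"targetFromSourceId": "ENSG_PLACEHOLDER", "Dataset": "TARGET"}',
]

missing = records_missing_ensg(sample)
symbol_counts = Counter(r["Gene_symbol"] for r in missing)
print(len(missing), dict(symbol_counts))
```

To scan a real file, pass an open handle instead of the sample list, e.g. `with open("putative-oncogene-fusion-freq.jsonl") as fh: records_missing_ensg(fh)`.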

Note that even if IDs are not assigned for the problematic genes, a user can still reach the fusion frequencies when viewing data for the other, non-problematic fused gene. For instance, the fusion IGH-@--MAGED1 appears in the fusion data once with the problematic Gene_symbol = IGH-@ and again with the usable Gene_symbol = MAGED1, so a user can access the frequency data for the IGH-@--MAGED1 fusion from the MAGED1 target or evidence pages.
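
One way to confirm which of the affected fusions remain reachable through the partner gene is sketched below. It assumes records are already parsed into dicts with the field names used in this issue, and it treats records as describing the same fusion frequency when FusionName, Disease, and Dataset all match; that grouping key is an assumption, not something the data files are confirmed to guarantee. The function name, the sample records, and `ENSG_PLACEHOLDER` are hypothetical, and the sample is deliberately truncated so that one fusion has no partner record.

```python
def reachable_via_partner(records):
    """For each fusion that has an ID-less record, report whether another
    record of the same (FusionName, Disease, Dataset) carries an ID."""
    with_id = set()
    without_id = set()
    for record in records:
        key = (record["FusionName"], record["Disease"], record["Dataset"])
        ensg = record.get("targetFromSourceId")
        # Assumption: a missing ID is an absent key, null, or blank string.
        if ensg is None or not str(ensg).strip():
            without_id.add(key)
        else:
            with_id.add(key)
    return {key: key in with_id for key in sorted(without_id)}


# Hypothetical sample; the partner record for IGH@--CLU is left out on purpose.
sample_records = [
    {"FusionName": "IGH-@--MAGED1", "Gene_symbol": "IGH-@",
     "targetFromSourceId": "", "Disease": "Neuroblastoma", "Dataset": "TARGET"},
    {"FusionName": "IGH-@--MAGED1", "Gene_symbol": "MAGED1",
     "targetFromSourceId": "ENSG_PLACEHOLDER", "Disease": "Neuroblastoma",
     "Dataset": "TARGET"},
    {"FusionName": "IGH@--CLU", "Gene_symbol": "IGH@",
     "targetFromSourceId": None, "Disease": "Teratoma", "Dataset": "PBTA"},
]

reachability = reachable_via_partner(sample_records)
for key, reachable in reachability.items():
    print(key, "reachable via partner" if reachable else "no partner record with an ID")
```

Any fusion reported as having no partner record with an ID would be the one to prioritize if ENSGs cannot be mapped for the IGH/IGL symbols.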

logstar commented 2 years ago

This issue seems to be related to https://github.com/PediatricOpenTargets/ticket-tracker/issues/153, which was originally discussed in the fusion frequency table module PR at https://github.com/PediatricOpenTargets/OpenPedCan-analysis/pull/49#discussion_r680118271. The related issue and PR comment have some relevant results and comments, which might be helpful to resolve this issue.

jharenza commented 2 years ago

closed with #162