cBioPortal / datahub

A centralized location for storing curated data from cBioPortal

TERT gene panel issues in GENIE #1314

Open tmazor opened 3 years ago

tmazor commented 3 years ago

User reported: https://groups.google.com/u/1/g/cbioportal/c/T-QlXI66WRU

Querying for TERT:DRIVER in GENIE can result in an alteration frequency >100% (see Desmoplastic Melanoma here: https://genie.cbioportal.org/results/cancerTypesSummary?Action=Submit&RPPA_SCORE_THRESHOLD=2.0&Z_SCORE_THRESHOLD=2.0&cancer_study_list=5fa03921e4b0242bd5d29486%2C5fa036d9e4b015b63e9c7076&case_set_id=all&data_priority=0&gene_list=TERT%253A%2520DRIVER&geneset_list=%20&profileFilter=0&tab_index=tab_visualize)

The reason is that TERT mutations are called in samples where TERT is not profiled (see the screenshot in the original issue).

Can we figure out if the gene panel is wrong? Or is this another example of calling real but off-target mutations?
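
To illustrate how the frequency can exceed 100%, here is a minimal Python sketch. It assumes the frequency is computed as altered samples divided by samples profiled for the gene. The function name and sample IDs are hypothetical and not taken from cBioPortal code.

```python
def alteration_frequency(altered_samples, profiled_samples):
    """Percent of profiled samples that are altered (assumed definition)."""
    return 100.0 * len(altered_samples) / len(profiled_samples)


# Hypothetical sample IDs: only S1 and S2 are on a panel that lists TERT.
profiled = {"S1", "S2"}
# S3 carries a TERT call even though its panel does not list TERT.
altered = {"S1", "S2", "S3"}

frequency = alteration_frequency(altered, profiled)
off_panel = altered - profiled  # samples altered but not profiled

print(frequency)   # 150.0
print(off_panel)   # {'S3'}
```

Any sample in `off_panel` inflates the numerator without contributing to the denominator, which is how the percentage ends up above 100.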

ritikakundra commented 3 years ago

Working with Tom on this

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

AnikaBongaarts commented 2 months ago

This issue is still relevant today. See https://bit.ly/4cMqKg5 and the attached screenshot (Schermafbeelding 2024-07-05 171725). Mutations are found in TERT even though the samples were not marked as profiled for TERT mutations. The same happens for NFIB when looking at structural variants.

It would be important to check for this when loading the data. In addition, a decision should be made on whether TERT should then be added to the panel. It could be that TERT is not on the gene panel at all or, since it’s TERT, that the TERT promoter is covered but not the rest of the gene.

Additionally, with multiple panels the Oncoprint can also be misleading, because the “profiled in mutations” track is cumulative across the queried genes. For example, when I select only TERT, 1 sample is shown as profiled for mutations, whereas if I add KMT2C, more samples are shown as profiled, since KMT2C is covered by other panels. The user must then remember that TERT was only profiled in one sample. It would be more user-friendly if this track could also be shown per panel, making this clearer to the user.

Below are some suggestions for these issues:

  1. Create a subtype of case lists for gene panels. We would then define one case list for each panel and remove the data_gene_matrix file. This will allow us to assign samples to multiple panels.
  2. The _sequenced, _cna and _sv case lists will indicate WES, and therefore become optional once any case list of subtype gene panel exists. This will fix the current hack where you can define the WES panel in the data_gene_matrix file even though it doesn’t exist as a panel.
  3. The validator / loader will check mutations / CNAs / SVs against the defined panels and will raise an error (loading not allowed) if any alteration is found in a gene that does not belong to those panels. This will prevent altered percentages from going over 100.
  4. The “profiled in mutations / copy number / structural variants” tracks in the Oncoprint should be split so that each one corresponds to a specific panel (to avoid confusion about what exactly is profiled). We can keep the “profiled in” track as is, but we should rethink the name so it is not confusing.
  5. Having the samples belonging to gene panels as case lists will allow us to filter by them (not possible now with the data_gene_matrix).
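
The check proposed in suggestion 3 could look roughly like the Python sketch below. It is not the actual cBioPortal validator: the function name, the data structures, and the panel and sample IDs are all hypothetical. It also assumes that a sample may belong to several panels (as in suggestion 1) and does not handle WES samples, which would need to be exempted.

```python
def find_off_panel_alterations(alterations, sample_panels, panel_genes):
    """Return (sample_id, gene) pairs whose gene is on none of the sample's panels.

    alterations:   iterable of (sample_id, gene) pairs
    sample_panels: dict mapping sample_id -> set of panel IDs
    panel_genes:   dict mapping panel ID -> set of gene symbols
    """
    errors = []
    for sample_id, gene in alterations:
        panels = sample_panels.get(sample_id, set())
        covered = any(gene in panel_genes.get(panel, set()) for panel in panels)
        if not covered:
            errors.append((sample_id, gene))
    return errors


# Hypothetical panels and samples, for illustration only.
panel_genes = {
    "PANEL_A": {"TERT", "KMT2C"},
    "PANEL_B": {"KMT2C"},
}
sample_panels = {
    "S1": {"PANEL_A"},
    "S2": {"PANEL_B"},
    "S3": {"PANEL_B"},
}
alterations = [("S1", "TERT"), ("S2", "KMT2C"), ("S3", "TERT")]

errors = find_off_panel_alterations(alterations, sample_panels, panel_genes)
print(errors)  # [('S3', 'TERT')]
```

A loader could refuse to import the study whenever `errors` is non-empty, or report the offending pairs so that the panel definition can be corrected.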