BIMIB-DISCo / TRONCO

Repository of the TRanslational ONCOlogy library, which includes various algorithms (such as CAPRESE and CAPRI) and the Pipeline for Cancer Inference (PICNIC).
https://bimib-disco.github.io/TRONCO
GNU General Public License v3.0

Misuse of `events.selection`? #129

Closed marcoxa closed 3 years ago

marcoxa commented 3 years ago

Hi giovini!

One of my students got to this point after downloading the COAD and READ datasets using TCGAbiolinks.

> colo_data = events.selection(colo_data,  filter.freq = .05,
+                              filter.in.names = c("Wnt","RAS","PI3K","TGFb","P53"))
*** Events selection: #events =  19786 , #types =  1 Filters freq|in|out = { TRUE ,  TRUE ,  FALSE }
Minimum event frequency:  0.05  ( 20  alterations out of  399  samples).
...
Selected  2828  events.

Of course, 2828 "events" are too many. The student may have made a mistake; note that he passes in the pathway names (which are vectors of genes). The question is whether this is the most appropriate use of `events.selection`.

Any ideas?

MA

luca-dex commented 3 years ago

Can you attach an oncoprint of colo_data ?

marcoxa commented 3 years ago

Here it is

[Image: colo_data_oncoprint]

marcoxa commented 3 years ago

Sorry... but why "closed"?

luca-dex commented 3 years ago

Sorry, that was probably a mistake. However, the percentages cannot be read from the picture. In the meantime I had to move to Karachi...

Event selection only computes the number of mutations divided by the number of samples, so an error there is quite unlikely.
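The frequency check described here can be reproduced by hand. Below is a minimal base-R sketch, assuming a binary samples-by-events matrix; the toy `genotypes` matrix and its gene labels are made up for illustration and are not taken from the COAD/READ data. Whether the threshold comparison is strict or inclusive inside `events.selection` is not confirmed here; `>=` is an assumption.

```r
# Toy binary matrix: rows are samples, columns are events (1 = altered).
# Hypothetical data, used only to illustrate the computation.
genotypes <- matrix(
  c(1, 0, 0,
    1, 1, 0,
    0, 0, 0,
    1, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("s", 1:4), c("APC", "KRAS", "TP53"))
)

# Frequency of each event = number of altered samples / number of samples.
event_freq <- colSums(genotypes) / nrow(genotypes)

# Events at or above a 5% threshold (inclusive boundary is an assumption).
min_freq <- 0.05
passing <- names(event_freq)[event_freq >= min_freq]

print(event_freq)
print(passing)
```

Running the same two lines on the real genotype matrix gives the per-event counts to compare against the number reported by the filter.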

Did the student check the values manually, just to know whether the numbers are correct?

danro9685 commented 3 years ago

@marcoxa as a sanity check, try performing the two filtering steps separately (the 5% frequency step and the pathway names step). It is also quite possible that your data contain many variants with these characteristics (either >=5% or in these pathways).

@luca-dex please confirm, but I believe the filter is an "OR", meaning either of the 2 conditions, right?
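The sanity check can be sketched on toy data: apply each filter on its own, then compare the union with the intersection. This is base R only; the event names, frequencies and `in_names` vector are hypothetical, and treating the combined filter as an "OR" is the assumption under discussion, not something this snippet verifies against TRONCO.

```r
# Hypothetical events with made-up frequencies.
event_names <- c("APC", "KRAS", "TP53", "TTN", "MUC16")
event_freq  <- c(0.70, 0.40, 0.02, 0.45, 0.03)

# Hypothetical list of names to keep regardless of frequency.
in_names <- c("APC", "TP53")

# Each filter applied separately.
by_freq <- event_names[event_freq >= 0.05]
by_name <- event_names[event_names %in% in_names]

# "OR" keeps events passing either filter; "AND" keeps events passing both.
either <- union(by_freq, by_name)
both   <- intersect(by_freq, by_name)

print(either)
print(both)
```

If the number of selected events on the real data matches the size of the union rather than the intersection, the frequency filter alone is enough to explain a large selection.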

danro9685 commented 3 years ago

@marcoxa I have double-checked and can confirm that there is no error in the function; you simply selected too many genes.

Colorectal cancer has a high mutational burden, and here we have an average of approximately 80 variants per patient. With these filters, these numbers are expected. Either increase the filtering threshold or give a list of pre-defined genes as names.
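To illustrate what "a list of pre-defined genes as names" could look like, here is a minimal base-R sketch. It assumes names are matched against event (gene) names, as the argument name `filter.in.names` suggests; the `Wnt` and `RAS` vectors and their contents are illustrative placeholders, not the student's actual pathway definitions.

```r
# Hypothetical pathway definitions: each pathway is a vector of gene names.
Wnt <- c("APC", "CTNNB1")
RAS <- c("KRAS", "NRAS")

# Passing the pathway labels as strings gives two literal names...
as_strings <- c("Wnt", "RAS")

# ...whereas concatenating the vectors gives the genes themselves.
as_genes <- unique(c(Wnt, RAS))

# Hypothetical event names in a dataset.
event_names <- c("APC", "KRAS", "TP53")

# Name matching against the labels finds nothing; against the genes it works.
match_strings <- event_names %in% as_strings
match_genes   <- event_names %in% as_genes

print(match_strings)
print(match_genes)
```

Under that assumption, quoted pathway labels would contribute nothing to the selection, and all selected events would come from the frequency filter.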

I will close this as it is not a bug.