Low number of significant mutations

srithegreat commented 11 months ago

I have a WGS cohort of 36 patients. When I run MutSig2CV with the hg38 genome, I get only 5 genes with q<0.05; all the rest have a q-value of 1. Is this because of the sample size? Does MutSig2CV need a minimum number of samples to give reliable results, or are there parameters that could be tuned to get better results?

Any help appreciated.

julianhess commented 11 months ago

Yes, it is likely due to the size of your cohort. There is no single "minimum cohort size": it depends on the population frequency of the drivers you want to discover and on the background mutation rate. You need more patients to discover rare drivers, and fewer patients to identify common drivers.

See here for a discovery power calculator. For example, a 36-patient cohort with a background mutation rate of 1/Mb has 82% power to discover genes mutated in 20% of patients, 28% power for genes mutated in 10% of patients, 4% power for genes mutated in 5% of patients, and essentially no power to discover genes mutated in ≤3% of patients.

Given this, 5 significant genes is a completely reasonable number for a cohort of 36 patients.
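To make the trend concrete, here is a rough sketch of a binomial power calculation. It is not the linked calculator and will not reproduce the percentages quoted above; it only shows why power falls off quickly as driver frequency drops in a small cohort. Everything in it is an assumption for illustration: the function names, a 1,500 bp gene length, a Bonferroni-style threshold of 0.1 over 20,000 genes, and the simplification that a patient counts once if the gene carries any mutation.

```python
from math import comb


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_significant_count(n, p0, alpha):
    """Smallest number of mutated patients that background alone
    would produce with probability <= alpha."""
    for k in range(n + 1):
        if binom_tail(k, n, p0) <= alpha:
            return k
    return n + 1  # threshold cannot be reached with this cohort size


def discovery_power(n_patients, driver_freq,
                    bg_rate_per_bp=1e-6,    # 1 mutation per Mb
                    gene_length_bp=1500,    # assumed gene length
                    alpha=0.1 / 20000):     # assumed genome-wide threshold
    """Probability that a gene mutated in `driver_freq` of patients
    reaches the significance threshold (simplified model)."""
    # Chance a patient carries a background mutation in this gene.
    p0 = 1 - (1 - bg_rate_per_bp) ** gene_length_bp
    # Chance a patient carries a background or a driver mutation.
    p1 = 1 - (1 - p0) * (1 - driver_freq)
    k = min_significant_count(n_patients, p0, alpha)
    return binom_tail(k, n_patients, p1)


if __name__ == "__main__":
    for freq in (0.20, 0.10, 0.05, 0.03):
        power = discovery_power(36, freq)
        print(f"driver in {freq:.0%} of patients: power = {power:.2f}")
```

Under these assumptions, raising `n_patients` is the only lever that increases power for a given driver frequency, which matches the point that rare drivers need larger cohorts.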

MutSig's statistical model is designed to be as robust as possible, and thus does not have any parameters to adjust.

srithegreat commented 11 months ago

Thanks for the quick response. That explains the results. What would you suggest in such cases: use the results as they are, or use some alternative strategy? By the way, my cohort is cervical cancer.

getzlab / MutSig2CV

Low number of significant mutations #26