Report Sequenza work-flow

EmreKocakavuk commented 5 years ago

General information about Sequenza

Sequenza is a software package that uses paired tumor-normal DNA-seq data (WXS/WGS) for estimation of tumor cellularity (purity) and ploidy. Furthermore, it infers allele-specific copy number profiles and mutation profiles.

Briefly, Sequenza fits a probabilistic model to segmented data. Estimation follows a maximum a posteriori approach, and candidate solutions are ranked by their SLPP (scaled log posterior probability). The output includes the average depth ratio and BAF (B allele frequency), overall tumor ploidy and cellularity, and segment-specific copy number and minor allele copy number.
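To make the roles of cellularity and ploidy concrete, here is a minimal sketch of the kind of mixture model commonly used for paired tumor-normal data: the expected depth ratio and BAF of a segment, given purity, ploidy and the segment's copy numbers. The function names and the simplifications (diploid normal, germline-heterozygous site, no noise model) are assumptions for illustration, not Sequenza's actual code.

```python
def expected_depth_ratio(cellularity, ploidy, cn_tumor, cn_normal=2, avg_depth_ratio=1.0):
    """Expected tumor/normal depth ratio for a segment with tumor copy number cn_tumor."""
    # DNA contributed at this segment by normal cells plus tumor cells.
    segment_term = (1 - cellularity) + cellularity * cn_tumor / cn_normal
    # Genome-wide average, which depends on the overall tumor ploidy.
    genome_term = (1 - cellularity) + cellularity * ploidy / 2
    return avg_depth_ratio * segment_term / genome_term


def expected_baf(cellularity, cn_tumor, minor_cn, cn_normal=2):
    """Expected B allele frequency at a germline-heterozygous site (assumes cn_normal = 2)."""
    # Normal cells carry one B copy; tumor cells carry minor_cn B copies.
    b_copies = (1 - cellularity) * 1 + cellularity * minor_cn
    all_copies = (1 - cellularity) * cn_normal + cellularity * cn_tumor
    return b_copies / all_copies


if __name__ == "__main__":
    # A diploid segment in a ploidy-2 tumor and a tetraploid segment in a ploidy-4 tumor
    # give the same expected values under this simplified model.
    print(expected_depth_ratio(0.6, 2, 2), expected_baf(0.6, 2, 1))
    print(expected_depth_ratio(0.6, 4, 4), expected_baf(0.6, 4, 2))
```

Under this simplified model, a balanced genome at ploidy 2 and its doubled counterpart at ploidy 4 predict the same depth ratio and BAF, which is one reason several purity/ploidy solutions can have similar posterior support.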

Original Paper: Favero et al., Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data, Annals of Oncology 2015 https://academic.oup.com/annonc/article/26/1/64/2802634

Work-flow

Broadly, the work-flow consists of two major parts: sequenza-utils (Python) and Sequenza (R).

Inputs: BAM files (tumor and normal), FASTA reference file (human_g1k_v37), GC-content file (used 1 kb bin size)
Outputs (that are of importance for GLASS): Tumor cellularity, ploidy, segments
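As a rough illustration of the sequenza-utils half of the work-flow, the sketch below only assembles the command lines for the GC-content and seqz steps; it does not run them. The file names are hypothetical, the flags are given as recalled from the sequenza-utils documentation and should be checked against `sequenza-utils --help`, and mapping the "1 kb bin size" above to `-w 1000` is an assumption. The actual GLASS pipeline may differ. The R half (typically `sequenza.extract`, `sequenza.fit` and `sequenza.results`) is not shown.

```python
def build_sequenza_utils_commands(normal_bam, tumor_bam, fasta, gc_wig, out_seqz, window=1000):
    """Return argv lists for the GC-wiggle and bam2seqz steps; nothing is executed."""
    # Step 1: GC content of the reference in fixed-size windows (window size is an assumption).
    gc_cmd = ["sequenza-utils", "gc_wiggle", "-w", str(window),
              "--fasta", fasta, "-o", gc_wig]
    # Step 2: combine tumor and normal BAMs into a seqz file.
    seqz_cmd = ["sequenza-utils", "bam2seqz", "-n", normal_bam, "-t", tumor_bam,
                "--fasta", fasta, "-gc", gc_wig, "-o", out_seqz]
    return [gc_cmd, seqz_cmd]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for cmd in build_sequenza_utils_commands(
            "normal.bam", "tumor.bam", "human_g1k_v37.fasta",
            "human_g1k_v37.gc.wig.gz", "aliquot.seqz.gz"):
        print(" ".join(cmd))
```

Building the commands as lists keeps them ready for `subprocess.run` without shell quoting issues.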

Distribution of ploidy-values for the GLASS-dataset:

Distribution of cellularity-values for the GLASS-dataset:

-> For some aliquots, Sequenza reports alternative solutions in its output files. The default/best solution is always the one with the highest probability (SLPP score). However, we observe unexpectedly high ploidy values for some samples when compared with TITAN. Additionally, Sequenza purity (seqz-purity) is in general lower than TITAN purity.

In an attempt to address the issue of unexpectedly high Sequenza ploidy values, we applied the following correction methods:

Selected aliquots with alternative solutions whose SLPP values are comparably close (within a range of 95%) to that of the first, i.e. default/best, solution (n = 55 aliquots)
Manually compared ploidy values with GATK plots and selected the alternative solution if it was more likely to be true, i.e. closer to ploidy = 2 (n = 11 aliquots):

"GLSS-AT-00P6-R1-01D-WXS-JQX1ST", "GLSS-AT-GP01-TP-03D-WXS-VHOP3P", "GLSS-CU-R014-R1-01D-WXS-C9ODK4", "GLSS-DK-0006-TP-01D-WXS-47D949", "GLSS-MG-0008-R1-01D-WXS-KRZ231", "GLSS-SF-0011-TP-01D-WXS-YIYP4N", "GLSS-SF-0012-R1-01D-WXS-M12I6U", "GLSS-SM-R064-TP-01D-WXS-16OKWU", "GLSS-SU-0005-R2-01D-WXS-M1MPPT", "TCGA-14-1034-TP-01D-WXS-B0AXT9", "TCGA-14-1034-R1-01D-WGS-L9V6H0"

-> For these samples the mean ploidy value shifted from 4.67 to 2.27
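The first, automatic step of this correction could be sketched as follows. The "within a range of 95%" criterion is not defined precisely above, so the sketch assumes one possible reading: an alternative counts as close when its SLPP lies within 5% of the magnitude of the best SLPP. The function name, the dict layout (mirroring the cellularity, ploidy and SLPP columns of Sequenza's alternative-solutions file) and the example values are illustrative. The final choice in the correction described above was made manually against GATK plots, which this code does not attempt.

```python
def flag_close_alternatives(solutions, rel_tolerance=0.05):
    """Split solutions into the best one and alternatives with a comparably close SLPP.

    solutions: list of dicts with keys 'cellularity', 'ploidy' and 'SLPP'.
    Returns (best, close), with close ordered by distance of ploidy from 2.
    """
    if not solutions:
        raise ValueError("no solutions given")
    ranked = sorted(solutions, key=lambda s: s["SLPP"], reverse=True)
    best = ranked[0]
    # Assumed closeness rule: allowed SLPP drop is a fraction of |best SLPP|.
    margin = rel_tolerance * abs(best["SLPP"])
    close = [s for s in ranked[1:] if best["SLPP"] - s["SLPP"] <= margin]
    # Put near-diploid alternatives first to ease manual review.
    close.sort(key=lambda s: abs(s["ploidy"] - 2))
    return best, close


if __name__ == "__main__":
    # Made-up values, for illustration only.
    example = [
        {"cellularity": 0.30, "ploidy": 5.5, "SLPP": -100.0},
        {"cellularity": 0.60, "ploidy": 2.1, "SLPP": -103.0},
        {"cellularity": 0.50, "ploidy": 3.0, "SLPP": -120.0},
    ]
    best, close = flag_close_alternatives(example)
    print("best:", best)
    print("to review:", close)
```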

For one aliquot (R1) the 4th solution (ploidy = 1.9) was selected, because the default solution gave a ploidy-value of 5.5, whereas TP ploidy and R2 ploidy were 2 and 2.1, respectively (n = 1 aliquot):

"GLSS-SM-R064-R1-01D-WXS-GFA2BL"

In summary, for n = 12 aliquots an alternative solution was selected.

Final comparisons between Sequenza and TITAN after the above-mentioned correction:

Comparison of TITAN-ploidy (x-axis) vs. SEQZ-ploidy (y-axis): Pearson correlation, cor = 0.10

Comparison of TITAN-purity (x-axis) vs. SEQZ-purity (y-axis): Pearson correlation, cor = 0.32
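For reference, a comparison of this kind can be sketched as below: estimates from the two tools are matched on aliquot barcode, and the Pearson correlation is computed over the shared aliquots. The dict-based inputs, function names and example values are assumptions for illustration; the figures reported here were produced by the authors' own code.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def compare_tools(titan, seqz):
    """Correlate two {aliquot_barcode: estimate} dicts over their shared aliquots."""
    shared = sorted(set(titan) & set(seqz))
    r = pearson_r([titan[a] for a in shared], [seqz[a] for a in shared])
    return r, len(shared)


if __name__ == "__main__":
    # Made-up estimates keyed by hypothetical aliquot identifiers.
    titan_ploidy = {"A1": 2.0, "A2": 2.1, "A3": 3.4, "A4": 2.0}
    seqz_ploidy = {"A1": 2.1, "A2": 4.2, "A3": 3.0, "A5": 2.0}
    print(compare_tools(titan_ploidy, seqz_ploidy))
```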

[x] Could you comment on your correlation studies between seqz/TITAN purity and mutation frequency? @Kcjohnson

Kcjohnson commented 5 years ago

Here are some figures as well as tabular results showing the general relationship between estimates of purity and mutational frequency. Overall, the strongest association was between Sequenza purity and mutational frequency in recurrences with larger sample sizes. The TITAN results do not seem to be consistently associated with subtype or time point.

Sequenza

[Figure: sequenza-purity-mut-frequency]

Subtype	Time point	Correlation	P-value
IDHmut-codel	Primary	0.28	0.11
IDHmut-noncodel	Primary	0.001	0.95
IDHwt	Primary	0.18	0.01
IDHmut-codel	Recurrences	-0.02	0.92
IDHmut-noncodel	Recurrences	0.42	5.8E-06
IDHwt	Recurrences	0.30	6.6E-06

TITAN

[Figure: titan-purity-mut-frequency]

Subtype	Time point	Correlation	P-value
IDHmut-codel	Primary	0.21	0.25
IDHmut-noncodel	Primary	-0.06	0.55
IDHwt	Primary	-0.08	0.28
IDHmut-codel	Recurrences	0.02	0.87
IDHmut-noncodel	Recurrences	0.19	0.04
IDHwt	Recurrences	-0.15	0.02
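Tables like the two above could be produced along the following lines. The thread does not state which correlation coefficient or significance test was used, so this sketch assumes Pearson correlation and substitutes a simple two-sided permutation test for the p-value. The row layout, function names and example data are hypothetical.

```python
import math
import random
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation coefficient (inputs must not be constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def grouped_correlations(rows, n_perm=2000, seed=0):
    """rows: iterable of (subtype, time_point, purity, mutation_frequency).

    Returns {(subtype, time_point): (correlation, permutation_p_value)}.
    """
    groups = defaultdict(lambda: ([], []))
    for subtype, time_point, purity, mut_freq in rows:
        xs, ys = groups[(subtype, time_point)]
        xs.append(purity)
        ys.append(mut_freq)

    rng = random.Random(seed)  # fixed seed so results are reproducible
    results = {}
    for key, (xs, ys) in groups.items():
        observed = pearson_r(xs, ys)
        shuffled = list(ys)
        hits = 0
        for _ in range(n_perm):
            # Break any purity/mutation-frequency pairing and re-correlate.
            rng.shuffle(shuffled)
            if abs(pearson_r(xs, shuffled)) >= abs(observed) - 1e-12:
                hits += 1
        # Add-one correction keeps the estimated p-value above zero.
        results[key] = (observed, (hits + 1) / (n_perm + 1))
    return results


if __name__ == "__main__":
    # Made-up data for a single group.
    demo = [("IDHwt", "Primary", p, m) for p, m in
            [(0.3, 1.2), (0.5, 2.0), (0.6, 1.8), (0.8, 3.1), (0.9, 2.9), (0.4, 1.5)]]
    for key, (r, p) in grouped_correlations(demo).items():
        print(key, round(r, 2), round(p, 4))
```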

fpbarthel commented 5 years ago

Thanks for these great analyses guys. I am closing this issue for now but will keep a record of it for later reference.

fpbarthel / GLASS

Report Sequenza work-flow #136
