fpbarthel / GLASS

GLASS consortium
MIT License

Report Sequenza work-flow #136

Closed EmreKocakavuk closed 5 years ago

EmreKocakavuk commented 5 years ago

General information about Sequenza

Sequenza is a software package that uses paired tumor-normal DNA-seq data (WXS/WGS) for estimation of tumor cellularity (purity) and ploidy. Furthermore, it infers allele-specific copy number profiles and mutation profiles.

Briefly, Sequenza is based on a probabilistic model applied to segmented data. The estimation model uses a maximum a posteriori approach (SLPP = scaled rank log posterior probability). The output includes the average depth ratio and BAF (B allele frequency), the overall tumor ploidy and cellularity, and the segment-specific copy number and minor allele copy number.

Original Paper: Favero et al., Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data, Annals of Oncology 2015 https://academic.oup.com/annonc/article/26/1/64/2802634

Work-flow

Broadly, the work-flow consists of two major parts: Sequenza-utils (Python) and Sequenza (R).

Distribution of ploidy-values for the GLASS-dataset: image

Distribution of cellularity-values for the GLASS-dataset: image

-> For some aliquots, alternative solutions exist among the output files. The default (best) solution is always the one with the highest probability (SLPP score). However, for some samples we observe unexpectedly high ploidy values compared with TITAN. Additionally, Sequenza purity (seqz-purity) is generally lower than TITAN purity.
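The default-versus-alternative distinction can be sketched as a ranking by SLPP. This is only a minimal illustration: it assumes each aliquot has a tab-separated solutions table with `cellularity`, `ploidy` and `SLPP` columns, and the table contents, variable names and helper functions below are hypothetical, not taken from the GLASS pipeline.

```python
import csv
import io

# Hypothetical contents of one aliquot's alternative-solutions table.
# Column names and values are assumptions for illustration only.
ALTERNATIVE_SOLUTIONS = """cellularity\tploidy\tSLPP
0.42\t4.6\t0.51
0.71\t2.3\t0.38
0.25\t6.9\t0.11
"""


def read_solutions(text):
    """Parse a tab-separated solutions table into a list of float dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [{key: float(value) for key, value in row.items()} for row in reader]


def rank_solutions(solutions):
    """Return the solutions ordered from highest to lowest SLPP."""
    return sorted(solutions, key=lambda s: s["SLPP"], reverse=True)


solutions = rank_solutions(read_solutions(ALTERNATIVE_SOLUTIONS))
best, alternatives = solutions[0], solutions[1:]
print("default solution:", best)
print("alternatives:", alternatives)
```

In this made-up table the top-ranked solution has a ploidy of 4.6 while the runner-up sits near 2.3, which is the kind of situation where an alternative solution might be reviewed by hand.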

In an attempt to address the issue of unexpectedly high Sequenza ploidy values, we applied the following correction methods:

"GLSS-AT-00P6-R1-01D-WXS-JQX1ST", "GLSS-AT-GP01-TP-03D-WXS-VHOP3P", "GLSS-CU-R014-R1-01D-WXS-C9ODK4", "GLSS-DK-0006-TP-01D-WXS-47D949", "GLSS-MG-0008-R1-01D-WXS-KRZ231", "GLSS-SF-0011-TP-01D-WXS-YIYP4N", "GLSS-SF-0012-R1-01D-WXS-M12I6U", "GLSS-SM-R064-TP-01D-WXS-16OKWU", "GLSS-SU-0005-R2-01D-WXS-M1MPPT", "TCGA-14-1034-TP-01D-WXS-B0AXT9", "TCGA-14-1034-R1-01D-WGS-L9V6H0"

-> For these samples the mean ploidy value shifted from 4.67 to 2.27

"GLSS-SM-R064-R1-01D-WXS-GFA2BL"

In summary, an alternative solution was selected for n = 12 aliquots.

Final comparisons between Sequenza and TITAN after the above-mentioned correction:

Comparison of TITAN-ploidy (x-axis) vs. SEQZ-ploidy (y-axis): Pearson correlation, cor = 0.10 image

Comparison of TITAN-purity (x-axis) vs. SEQZ-purity (y-axis): Pearson correlation, cor = 0.32 image
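For reference, a Pearson correlation between paired per-aliquot estimates from the two callers could be computed roughly as follows. This is a sketch under stated assumptions: the purity values are invented placeholders, not GLASS data, and the `pearson` helper is written out by hand purely for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired purity estimates, one entry per aliquot.
titan_purity = [0.80, 0.65, 0.90, 0.55, 0.70]
seqz_purity = [0.60, 0.50, 0.62, 0.58, 0.45]

r = pearson(titan_purity, seqz_purity)
print(f"cor = {r:.2f}")
```

The same helper applies to the ploidy comparison by swapping in the paired ploidy estimates.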

Kcjohnson commented 5 years ago

Here are some figures and tabular results showing the general relationship between purity estimates and mutational frequency. Overall, the strongest associations were between Sequenza purity and mutational frequency in recurrences of the subtypes with larger sample sizes (IDHmut-noncodel and IDHwt). The TITAN purity estimates do not show a consistent association with mutational frequency across subtypes or time points.

Sequenza

sequenza-purity-mut-frequency

| Subtype | Time point | Correlation | P-value |
| --- | --- | --- | --- |
| IDHmut-codel | Primary | 0.28 | 0.11 |
| IDHmut-noncodel | Primary | 0.001 | 0.95 |
| IDHwt | Primary | 0.18 | 0.01 |
| IDHmut-codel | Recurrences | -0.02 | 0.92 |
| IDHmut-noncodel | Recurrences | 0.42 | 5.8E-06 |
| IDHwt | Recurrences | 0.30 | 6.6E-06 |

TITAN

titan-purity-mut-frequency

| Subtype | Time point | Correlation | P-value |
| --- | --- | --- | --- |
| IDHmut-codel | Primary | 0.21 | 0.25 |
| IDHmut-noncodel | Primary | -0.06 | 0.55 |
| IDHwt | Primary | -0.08 | 0.28 |
| IDHmut-codel | Recurrences | 0.02 | 0.87 |
| IDHmut-noncodel | Recurrences | 0.19 | 0.04 |
| IDHwt | Recurrences | -0.15 | 0.02 |
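
A table of this shape (one correlation and p-value per subtype and time point) could be produced along the following lines. This is a hedged sketch, not the analysis used above: the records are invented, the thread does not say which significance test was applied, and a simple permutation test is used here as a stand-in.

```python
import math
import random
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided p-value: share of shuffles with |r| >= the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical records: (subtype, time point, purity, mutational frequency).
RECORDS = [
    ("IDHwt", "Primary", 0.55, 2.1),
    ("IDHwt", "Primary", 0.62, 2.4),
    ("IDHwt", "Primary", 0.48, 2.6),
    ("IDHwt", "Primary", 0.70, 2.2),
    ("IDHwt", "Primary", 0.66, 3.0),
    ("IDHwt", "Recurrences", 0.40, 1.8),
    ("IDHwt", "Recurrences", 0.52, 2.9),
    ("IDHwt", "Recurrences", 0.61, 3.4),
    ("IDHwt", "Recurrences", 0.75, 3.1),
    ("IDHwt", "Recurrences", 0.58, 2.5),
]


def correlations_by_group(records):
    """Map (subtype, time point) to (Pearson r, permutation p-value)."""
    groups = defaultdict(lambda: ([], []))
    for subtype, time_point, purity, mut_freq in records:
        purities, frequencies = groups[(subtype, time_point)]
        purities.append(purity)
        frequencies.append(mut_freq)
    return {
        key: (pearson(xs, ys), permutation_p_value(xs, ys))
        for key, (xs, ys) in groups.items()
    }


results = correlations_by_group(RECORDS)
for (subtype, time_point), (r, p) in sorted(results.items()):
    print(f"{subtype}\t{time_point}\t{r:.2f}\t{p:.3g}")
```

With only a handful of samples per group, as in this toy input, the p-values are coarse; they are meant to show the mechanics rather than reproduce the figures above.
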
fpbarthel commented 5 years ago

Thanks for these great analyses guys. I am closing this issue for now but will keep a record of it for later reference.