ccagc / QDNAseq

QDNAseq package for Bioconductor

Expected & measured SD #35

Closed jiading closed 8 years ago

jiading commented 8 years ago

I have questions regarding the expected standard deviation (E σ) and the measured standard deviation.

[Image: screen shot 2016-06-02 at 12 48 51]

In this result plot, I am convinced by the CNV calls on Chr7 and Chr16, but the expected (E σ) and measured standard deviations are very different. How should I interpret these two values? And can I use them to choose the right bin size?

Additionally, I don't understand the value in the upper left corner. It probably means that I used 500 kbp bins and that 25 segments were found across all 24 chromosomes (sex chromosomes included). But what is this "5k"?

Thanks.

ilarischeinin commented 8 years ago

Please read the QDNAseq paper for a discussion of the standard deviations (i.e. noise). The expected one is the theoretical expectation for the amount of sequence data you have, and the measured one is what was actually observed. A big difference means that your data is noisier than expected, which could be e.g. due to poor DNA quality.
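To make the distinction concrete, here is a minimal Python sketch of the two kinds of estimate. It is not QDNAseq's code: it assumes the expected noise is pure counting noise (relative SD of 1/sqrt(mean reads per bin)) and that the measured noise is estimated from differences between neighbouring bins. The function names are hypothetical, and the package's actual estimator may differ, for example by trimming extreme differences.

```python
import math
import statistics


def expected_sd(total_reads, n_bins):
    """Counting-noise SD of a bin's relative read count.

    Assumption: reads land in bins like a Poisson process, so the
    relative SD is 1 / sqrt(mean reads per bin).
    """
    mean_reads_per_bin = total_reads / n_bins
    return 1.0 / math.sqrt(mean_reads_per_bin)


def measured_sd(values):
    """Noise estimate from differences between neighbouring bins.

    Differencing cancels the slowly varying copy-number signal. The
    variance of a difference of two independent bins is twice the
    per-bin variance, hence the division by sqrt(2).
    """
    diffs = [b - a for a, b in zip(values, values[1:])]
    return statistics.stdev(diffs) / math.sqrt(2)
```

Under these assumptions, more reads per bin (more sequencing or larger bins) lowers the expected value, while a measured value far above it points to noise that read depth alone does not explain.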

It's difficult to give any specific values for when your sample is good/bad, but the most straightforward way to use the values is to make a noisePlot() of all your samples to identify outliers, i.e. if some sample(s) are noisier (of poorer quality) than others. The paper also includes a plot with 1,000 samples, which you could use to check where your samples fall in that context (note that the noise plots use variances, not standard deviations; whereas the top right corner of profile plots has standard deviations). But just from the numbers in your plot I can say that that sample seems to be really, really noisy.

ilarischeinin commented 8 years ago

"5k" means that you have 5,000 bins (in this case of 500 kbp each).
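As a rough sanity check on that bin count, the arithmetic can be sketched in Python using the 500 kbp bin size from the question. The genome length is an assumed round figure of about 3.1 Gbp for human, and the helper name is hypothetical.

```python
def approx_bin_count(genome_length_bp, bin_size_bp):
    """Number of fixed-size bins needed to tile the genome (ceiling division)."""
    return -(-genome_length_bp // bin_size_bp)


# Assumption: approximate human genome length, not an exact assembly size.
GENOME_LENGTH_BP = 3_100_000_000

print(approx_bin_count(GENOME_LENGTH_BP, 500_000))  # 6200 bins at 500 kbp
print(approx_bin_count(GENOME_LENGTH_BP, 500))      # 6200000 bins at 500 bp
```

About 6,000 bins at 500 kbp is the same order of magnitude as "5k", which would fit a count that leaves out unusable bins (an assumption about how the plot counts them).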