Closed bioinfouser closed 4 years ago
One possible solution in DESeq2 could be to feed in the total mapped read counts as an extra column, keep it as a continuous variable, and incorporate that column into the design matrix, so that the raw counts would also carry the normalisation effect of the BAM library size. Does that sound logical?
I missed this issue. I think either using the total library size (from the BAM file) or using only the reads that fall into peaks (DESeq2) should be fine, though I have not tested that. If you do both and compare, do you see big differences? Especially for discrepant differential peaks, you may want to visualize them in IGV and judge for yourself.
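A rough way to find the discrepant peaks worth opening in IGV is to compute fold changes under both sets of size factors and list the peaks whose call flips. The Python sketch below is only an illustration of that comparison, not DiffBind or DESeq2 code: the function names, the pseudocount, the 1.0 log2 cutoff, and all the numbers are assumptions made up for the example, and it uses a single sample per condition with no statistical test.

```python
import math

def log2_fold_changes(counts_a, counts_b, sf_a, sf_b, pseudo=1.0):
    """log2(B / A) per peak after dividing raw counts by each sample's size factor.
    The pseudocount avoids division by zero for empty peaks."""
    return [
        math.log2((b / sf_b + pseudo) / (a / sf_a + pseudo))
        for a, b in zip(counts_a, counts_b)
    ]

def call(lfc, cutoff=1.0):
    """Crude call based on fold change alone (hypothetical cutoff)."""
    if lfc >= cutoff:
        return "up"
    if lfc <= -cutoff:
        return "down"
    return "ns"

def discrepant_peaks(peak_ids, lfc_1, lfc_2, cutoff=1.0):
    """Peaks whose call differs between the two normalisations."""
    return [
        p for p, x, y in zip(peak_ids, lfc_1, lfc_2)
        if call(x, cutoff) != call(y, cutoff)
    ]

# Made-up raw counts for three peaks in two samples.
peak_ids = ["peak1", "peak2", "peak3"]
counts_a = [10, 100, 50]
counts_b = [20, 400, 50]

# Size factors from reads in peaks vs. from total mapped reads (both made up).
lfc_in_peaks = log2_fold_changes(counts_a, counts_b, 0.7071, 1.4142)
lfc_library = log2_fold_changes(counts_a, counts_b, 0.5, 1.5)

to_inspect = discrepant_peaks(peak_ids, lfc_in_peaks, lfc_library)
print(to_inspect)
```

With one sample per condition, switching size factors shifts every log2 fold change by roughly the same amount, so the peaks that flip are the ones sitting close to the cutoff. Those are the ones to check by eye.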
Dear Tommy,
I am an avid follower of your ChIPseq tutorials and blogs. I am now looking at part 3 of your tutorial, which covers using DiffBind or DESeq2 for analysis. I have been using DiffBind until now but want to switch to DESeq2, as I want to control for multiple covariates, which the DiffBind package does not currently offer (only one blocking factor at a time). I have the raw counts generated and I can do the full analysis; however, I have a question regarding library size normalisation. DiffBind does it by default, and the DESeq2 vignette suggests that DESeq2 also does library size normalisation. The difference I find is that DiffBind takes the library size information from the BAM files and uses that, which is probably the total mapped reads. DESeq2, since it doesn't have the BAMs, probably takes the column-wise sum of each sample's read counts to get the library size. Fundamentally, will they be different or not? My main point is: can I trust the DESeq2 library size normalisation method as opposed to the DiffBind way of library size normalisation?
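For context on the question above: DESeq2's default size factor estimation uses the median-of-ratios method on the count matrix rather than a plain column sum, so it only sees reads that fall in peaks. The Python sketch below contrasts that with scaling by total mapped reads. It is a minimal illustration under stated assumptions, not code from either package: the function names are hypothetical, the library-size factors are simply scaled to a mean of 1, and the counts and read totals are invented.

```python
import math
from statistics import median

def median_of_ratios(counts):
    """Size factors in the style of the median-of-ratios method.
    counts: list of rows (peaks), each a list of raw counts per sample."""
    n_samples = len(counts[0])
    # Use only peaks with a nonzero count in every sample so the log is defined.
    rows = [r for r in counts if all(c > 0 for c in r)]
    # Log of the per-peak geometric mean across samples.
    log_geo_means = [sum(math.log(c) for c in r) / n_samples for r in rows]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(r[j]) - g for r, g in zip(rows, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

def library_size_factors(total_mapped):
    """Size factors proportional to total mapped reads, scaled to mean 1
    (the scaling is an assumption of this sketch)."""
    mean_total = sum(total_mapped) / len(total_mapped)
    return [t / mean_total for t in total_mapped]

# Made-up example: four peaks, two samples; sample 2 has twice the reads in peaks
# but three times the total mapped reads.
counts = [
    [10, 20],
    [100, 200],
    [50, 100],
    [5, 10],
]
total_mapped = [1_000_000, 3_000_000]

in_peak_factors = median_of_ratios(counts)
library_factors = library_size_factors(total_mapped)
print(in_peak_factors, library_factors)
```

The two sets of factors agree when the fraction of reads in peaks is similar across samples and diverge when it is not, as in this invented case. That is the situation in which the choice between the two normalisations could change the results.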