Closed PLStenger closed 4 years ago
The Y axis is whatever is in the bigWig files. So if you made bigWig files contain methylation information then that's what you're seeing.
BTW, you can omit the for loops in bash and instead pass in all of the BED and bigWig files in a single command (-S sample1.bw sample2.bw sample3.bw ... -R regions1.bed regions2.bed ...
).
Thanks a lot Devon for your answer and your advice!
Just to be sure, if I put all the genes (=whole genome) in the bed file and I obtain values between 0-100 assume this is a percentage and your algorithme did an average on the methylation for all genes. But if a subset of the data (and I put few genes in the bed file), and if the Kmeans technique shows peaks with a value more than 2000, that's mean the very few genes in the cluster 1 have 2000 times more methylation than the genes in the other clusters?
Thanks again in advance
What you see in heatmap/profile plot are the values in your bigwig file corresponding to the regions you have in your bed file. This is what Devon also said in his comment. If you have all the genes or some of them as long as there are some genes were you have a large coverage in your bigwig file you would still see a large number on the Y axis. Clustering groups your regions based on the similarity in their coverage, that is why you see the regions with higher coverage cluster together.
Thank you very much @LeilyR for these clarifications! This is more clear now.
I used the V. 3.3.0 version of deeptools (command line, installed from bioconda).
I have Whole Genome Bisulfite data. I run
computeMatrix scale-regions -S
for each file. I specify I didn't used the whole genome here, but only a selection of interesting genes (into filesselect_*.bed
).After I did a
computeMatrixOperations rbind -m
on my each replicate for binding them, and then I did acomputeMatrixOperations cbind -m
on my four sample point.Then I use the Kmeans (=4) technics :
And here I have a huge y axis (between 0 to 3000), because 2 genes drive the cluster 1.
When I delete them in the bed file, I redo analysis and all it's ok.
I specify also when I run analysis on the whole genome without gene selection (so the two genes are in the whole set) I obtain Y value between (0 and 40). Also, it's just an example, but with others kind of subset I also obtain sometimes (but not always) this strange huge y axis range, with others genes.
So, before that I assume that the y axis was a kind of methylation percentage (because I always obtain plots between 0 and 40 for y axis), but now I'm not sure about was really is y axis (if it's not a bug in these command lines). So my question is what is exactly the y axis? And/Or if it's well methylation percentage, is there any bug that could explain this pattern?
Thanks in advance!