deeptools / deepTools

Tools to process and analyze deep sequencing data.

Problem with inconsistent lines output from K-means clustering #1253

Status: open. magnusdottir opened this issue 1 year ago

magnusdottir commented 1 year ago

Hi, I'm having problems with plotHeatmap and k-means clustering. I cannot plot heatmaps reproducibly: the number of lines in them varies between runs, even when I give the exact same command. The initial runs seem to have the expected line density but subsequent runs don't, which is strange because I'm running the scripts as SLURM jobs. Unclustered heatmaps do show a high number of lines, and fewer clusters tend to perform "better" in terms of including the full number of lines, but this is still erratic.

For example, I computed a matrix for two data sets and plotted a heatmap. It looked good without clustering, and also with two and with three clusters, but four clusters gave what looks like a much less dense heatmap (in terms of the number of lines).

I was writing .pdf files and began to suspect that the problem had something to do with how the program writes/plots PDFs. I therefore ran the exact same script with .png output, and it gave the denser heatmap (i.e. what appears to have the same total line density as the original heatmap and the two-cluster heatmap). But increasing to 6 clusters gave a coarser heatmap again, and THEN going back to 4 clusters, still with .png output, gave the less dense heatmap again. The only things I changed in the script below between runs are the --kmeans cluster number and the output file name.


Python: Python/3.9.6-GCCcore-11.2.0
deepTools: deepTools/3.5.1-foss-2021b

This is my command (with the file names changed). Run twice on the same matrix, it gave different results:

plotHeatmap     -m $outPath/Matrix/Matrix_TSS_2Kb \
                -out $outPath/Plots/TSS_2Kb_4Clusters.png \
                --colorMap RdBu \
                --whatToShow 'heatmap and colorbar' \
                --zMin -3 --zMax 3 \
                --kmeans 4
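One way to tell whether the clustering itself changes between identical runs, or only the rendered image does, is to also write the clustered regions out with plotHeatmap's `--outFileSortedRegions` option and compare the cluster sizes. Below is a minimal sketch of a helper for that. The function name `cluster_sizes` is hypothetical, and it assumes that header lines in that BED file start with `#` and that the cluster label (e.g. `cluster_1`) is the last tab-separated column.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise a BED file written by
#   plotHeatmap ... --kmeans N --outFileSortedRegions regions.bed
# Assumption: header lines start with '#' and the cluster label
# (e.g. cluster_1) is the last tab-separated column.
cluster_sizes() {
    local bed=$1
    # One "label<TAB>count" line per cluster, sorted by label so two runs can be diffed.
    grep -v '^#' "$bed" \
        | awk -F'\t' '{ print $NF }' \
        | sort \
        | uniq -c \
        | awk '{ print $2 "\t" $1 }'
    # Total number of regions written out (all non-header lines).
    grep -vc '^#' "$bed"
}
```

For example, run the command above twice, adding `--outFileSortedRegions run1.bed` and then `--outFileSortedRegions run2.bed`, and compare with `diff <(cluster_sizes run1.bed) <(cluster_sizes run2.bed)`. If the totals agree but the per-cluster counts differ, the k-means result itself changed between runs (k-means usually starts from randomly chosen centroids). If everything agrees, the difference in line density is more likely introduced when the heatmap is drawn.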

This is the top of the clustered heatmap I get; the right-hand plot looks much sparser in terms of the number of data points:
[image: top of the clustered heatmap]

Has anyone had a similar problem?