deeptools / deepTools

Tools to process and analyze deep sequencing data.

develop: issue with --outFileNameMatrix and --outFileSortedRegions #364

Closed steffenheyne closed 8 years ago

steffenheyne commented 8 years ago

computeMatrix ... --outFileNameMatrix test_matrix.tab gives this

Traceback (most recent call last):
  File "/home/heyne/install/miniconda3/lib/python3.5/site-packages/numpy/lib/npyio.py", line 1158, in savetxt
    fh.write(asbytes(format % tuple(row) + newline))
TypeError: write() argument must be str, not bytes

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/home/heyne/install/miniconda3/bin/computeMatrix", line 7, in main() File "/home/heyne/install/miniconda3/lib/python3.5/site-packages/deeptools/computeMatrix.py", line 396, in main hm.save_matrix_values(args.outFileNameMatrix) File "/home/heyne/install/miniconda3/lib/python3.5/site-packages/deeptools/heatmapper.py", line 901, in save_matrix_values np.savetxt(fh, self.matrix.matrix, fmt="%.4g") File "/home/heyne/install/miniconda3/lib/python3.5/site-packages/numpy/lib/npyio.py", line 1162, in savetxt % (str(X.dtype), format)) TypeError: Mismatch between array dtype ('float64') and format specifier ('%.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g')

steffenheyne commented 8 years ago

plotHeatmap ... --outFileSortedRegions test_sorted.bed

$ head test_sorted.bed

chrom start end name score strand thickStart thickEnd itemRGB blockCount blockSizes blockStart deepTools_group

11 54971969 54998579 gene_r160 0.0 - 54971969 54971969 0 1 0 26610 DE_oldKO_down
6 149357407 149507379 gene_r800 0.0 + 149357407 149357407 0 1 0 149972 DE_oldKO_down
6 149357407 149507379 gene_r799 0.0 + 149357407 149357407 0 1 0 149972 DE_oldKO_down
3 159502658 159598139 gene_r329 0.0 + 159502658 159502658 0 1 0 95481 DE_oldKO_down

Column 5 is always 0.0 and the file is not sorted.

steffenheyne commented 8 years ago

computeMatrix ... --outFileSortedRegions test_sorted.bed

Traceback (most recent call last):
  File "/home/heyne/install/miniconda3/bin/computeMatrix", line 7, in <module>
    main()
  File "/home/heyne/install/miniconda3/lib/python3.5/site-packages/deeptools/computeMatrix.py", line 399, in main
    hm.save_BED(args.outFileSortedRegions)
  File "/home/heyne/install/miniconda3/lib/python3.5/site-packages/deeptools/heatmapper.py", line 916, in save_BED
    starts = region['start'].split(",")
TypeError: list indices must be integers or slices, not str

steffenheyne commented 8 years ago

plotHeatmap ... --outFileNameMatrix test_matrix.tab

Traceback (most recent call last):
  File "/home/heyne/install/miniconda3/lib/python3.5/site-packages/numpy/lib/npyio.py", line 1158, in savetxt
    fh.write(asbytes(format % tuple(row) + newline))
TypeError: write() argument must be str, not bytes

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/home/heyne/install/miniconda3/bin/plotHeatmap", line 7, in main() File "/home/heyne/install/miniconda3/lib/python3.5/site-packages/deeptools/plotHeatmap.py", line 588, in main hm.save_matrix_values(args.outFileNameMatrix) File "/home/heyne/install/miniconda3/lib/python3.5/site-packages/deeptools/heatmapper.py", line 901, in save_matrix_values np.savetxt(fh, self.matrix.matrix, fmt="%.4g") File "/home/heyne/install/miniconda3/lib/python3.5/site-packages/numpy/lib/npyio.py", line 1162, in savetxt % (str(X.dtype), format)) TypeError: Mismatch between array dtype ('float64') and format specifier ('%.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g ...

dpryan79 commented 8 years ago

I'll add that columns 11 and 12 in the --outFileSortedRegions output are swapped.
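The swap is visible in the rows quoted above. In BED12, column 11 holds the block sizes and column 12 the block starts relative to the region start, so the first block start must be 0 and the last block must end at `end - start`. The following is a minimal sketch of that consistency check, not deepTools code; the function name `blocks_fit_region` is hypothetical, and the sample row is the first record from the report.

```python
def blocks_fit_region(fields):
    """Return True if the BED12 block columns are consistent with start/end."""
    start, end = int(fields[1]), int(fields[2])
    block_count = int(fields[9])
    # Columns 11 and 12 are comma-separated lists; a trailing comma is allowed.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    if block_count < 1 or len(sizes) != block_count or len(starts) != block_count:
        return False
    # The first block begins at the region start and the last block ends at the region end.
    return starts[0] == 0 and starts[-1] + sizes[-1] == end - start


# First 12 columns of the first reported row (the group column is dropped).
reported = "11 54971969 54998579 gene_r160 0.0 - 54971969 54971969 0 1 0 26610".split()
# The same row with columns 11 and 12 exchanged.
swapped_back = reported[:10] + [reported[11], reported[10]]

print(blocks_fit_region(reported))      # False
print(blocks_fit_region(swapped_back))  # True
```

With the columns exchanged, the single block has size 26610, which equals 54998579 - 54971969, so the record becomes consistent.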

dpryan79 commented 8 years ago

Regarding column 5 in the output of plotHeatmap ... --outFileSortedRegions test_sorted.bed, it will always be 0.0 if either there were no values in the input file or they were all 0.0 to begin with. The value there has absolutely no relation to any computation done by deepTools.

dpryan79 commented 8 years ago

I can't reproduce the computeMatrix ... --outFileNameMatrix test_matrix.tab issue.

dpryan79 commented 8 years ago

I have a branch in which computeMatrix ... --outFileSortedRegions test_sorted.bed is fixed. It also fixes the issue that I noted, though I expect that change to break something in plotHeatmap/plotProfile (which should be simple to fix).

dpryan79 commented 8 years ago

I've now also fixed the plotHeatmap ... --outFileNameMatrix test_matrix.tab issue. You can test these in the feature/fix_364 branch.

dpryan79 commented 8 years ago

If you can get me a reproducible example of the computeMatrix ... --outFileNameMatrix issue on that branch, then I'll fix it. Otherwise I'll close this before 2.3.0 comes out.

steffenheyne commented 8 years ago

OK, I will test and also look for a reproducer. But could you reproduce that "--outFileSortedRegions" creates an unsorted BED file in both computeMatrix and plotHeatmap (while using "--sortRegions")?

dpryan79 commented 8 years ago

For plotHeatmap they appear to be sorted. For computeMatrix they appear to not be (I'm not sure why, I still need to fix that).

dpryan79 commented 8 years ago

The default for --sortRegions differs between computeMatrix (default: no) and plotHeatmap (default: ascend). If you specify what you want in computeMatrix then it'll do it. I don't know the rationale for the difference in defaults.
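To make the three modes named in this thread concrete, here is a minimal sketch of what "no", "ascend" and "descend" could mean for the rows of a matrix. It assumes that regions are ranked by the mean of their row, which the thread does not state, and `sort_order` is a hypothetical helper, not a deepTools function.

```python
from statistics import mean


def sort_order(rows, mode="no"):
    """Return row indices in the order implied by mode ('no', 'ascend', 'descend')."""
    if mode not in ("no", "ascend", "descend"):
        raise ValueError("mode must be 'no', 'ascend' or 'descend'")
    indices = list(range(len(rows)))
    if mode == "no":
        # Keep the input order untouched.
        return indices
    # Assumption: regions are ranked by the mean signal of their row.
    return sorted(indices, key=lambda i: mean(rows[i]), reverse=(mode == "descend"))


rows = [[3, 5], [1, 1], [2, 4]]    # row means: 4, 1, 3
print(sort_order(rows))             # [0, 1, 2]
print(sort_order(rows, "ascend"))   # [1, 2, 0]
print(sort_order(rows, "descend"))  # [0, 2, 1]
```

A BED file written after such a step only reflects the sort if the same index order is applied to the region list as to the matrix rows.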

steffenheyne commented 8 years ago

These defaults might be OK: avoiding a sort of the "raw matrix" while still creating a nicely sorted heatmap is reasonable up to a point. But I remember that even with "--sortRegions descend" I got a BED file that was not properly sorted (I don't know whether it was from computeMatrix or plotHeatmap). At least, when I used that BED file in plotHeatmap again, this time with "--sortRegions no", the heatmap did not look properly sorted.

steffenheyne commented 8 years ago

Fix #364 does not fix:

plotHeatmap -m test_sorted.matrix -out test_pHM.sorted.png --outFileSortedRegions test_sorted.pHM.bed  --sortRegions ascend

computeMatrix reference-point -R test_sorted.pHM.bed -S  /data/manke/repository/deepTools_testDataset/H3K27Me3.bigWig  /data/manke/repository/deepTools_testDataset/H3K4Me1.bigWig -out test_sorted.matrix -b 3000 -a 3000 -bs 100 -p 20 --sortRegions no

plotHeatmap -m test_sorted.matrix -out test_pHM.sorted.png --sortRegions no

sorted BED from plotHeatmap -> computeMatrix with the sorted BED and no sorting of the matrix -> plotHeatmap with that matrix gives an "unsorted" heatmap rather than an ascending sorted heatmap


And a reproducer for the computeMatrix issue (without local paths, but on the #364 branch):

computeMatrix reference-point -R genes19.bed -S  H3K27Me3.bigWig  H3K4Me1.bigWig -out test.matrix -b 3000 -a 3000 -bs 100 -p 20 --outFileSortedRegions test.bed

Traceback (most recent call last):
  File "/home/heyne/install/deepTools/bin/computeMatrix", line 7, in <module>
    main()
  File "/package/deeptools-develop/lib/python3.4/site-packages/deeptools/computeMatrix.py", line 399, in main
    hm.save_BED(args.outFileSortedRegions)
  File "/package/deeptools-develop/lib/python3.4/site-packages/deeptools/heatmapper.py", line 914, in save_BED
    starts = region['start'].split(",")
TypeError: list indices must be integers, not str

computeMatrix with --outFileNameMatrix also does not work on #364:

computeMatrix reference-point -R genes19.bed -S  H3K27Me3.bigWig  H3K4Me1.bigWig -out test.matrix -b 3000 -a 3000 -bs 100 -p 20 --outFileNameMatrix test.mat

Traceback (most recent call last):
  File "/package/deeptools-develop/lib/python3.4/site-packages/numpy/lib/npyio.py", line 1158, in savetxt
    fh.write(asbytes(format % tuple(row) + newline))
TypeError: must be str, not bytes

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/heyne/install/deepTools/bin/computeMatrix", line 7, in <module>
    main()
  File "/package/deeptools-develop/lib/python3.4/site-packages/deeptools/computeMatrix.py", line 396, in main
    hm.save_matrix_values(args.outFileNameMatrix)
  File "/package/deeptools-develop/lib/python3.4/site-packages/deeptools/heatmapper.py", line 899, in save_matrix_values
    np.savetxt(fh, self.matrix.matrix, fmt="%.4g")
  File "/package/deeptools-develop/lib/python3.4/site-packages/numpy/lib/npyio.py", line 1162, in savetxt
    % (str(X.dtype), format))
TypeError: Mismatch between array dtype ('float64') and format specifier ('%.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g %.4g')

Ah, don't be surprised by the "/package/deeptools-develop/..." paths: if I run "module unload deeptools/develop" and then rerun, it prints the same error, this time with paths to my installed #364 version.

dpryan79 commented 8 years ago

I was slightly annoyed after the game, so I fixed the --outFileNameMatrix and --outFileSortedRegions issues in python3 and tested the fixed versions in python2.
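For readers who hit the same --outFileNameMatrix traceback: the first TypeError shows the NumPy release in use encoding each row to bytes (`asbytes(...)`) before calling `fh.write`, which fails under Python 3 when the handle was opened in text mode. `savetxt` then re-raises it as the misleading dtype/format mismatch. The following is a minimal sketch of one workaround, handing `np.savetxt` a binary handle. How the fix was actually implemented in deepTools is not shown in this thread, and the small array and tab delimiter are illustrative assumptions.

```python
import io

import numpy as np

# Illustrative stand-in for the heatmap matrix (float64, as in the traceback).
matrix = np.array([[0.0, 1.23456, 2.5], [3.0, 4.0, 5.125]])

# A binary handle accepts the bytes that savetxt writes; on disk this would be
# open(path, "wb") instead of open(path, "w").
buf = io.BytesIO()
np.savetxt(buf, matrix, fmt="%.4g", delimiter="\t")

text = buf.getvalue().decode("ascii")
print(text)
```

A single `fmt` such as "%.4g" is expanded by `savetxt` to one specifier per column, which is why the error message in the reports repeats it once for every column of the matrix.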

Regarding the custom-sorted BED input to computeMatrix, that's no longer supported. You can sort a BED file (or GTF file, or both) however you want; the only thing that's honored is group boundaries. It worked previously only because computeMatrix had its own parallelization code. It made much more sense to unify behind the same code for everything in deepTools, so that it's easier to fix bugs and add features. If you need the output sorted in a particular way for some reason, it'd be simpler to make a separate script for that, as I've done in the riboseq branch.
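As a rough illustration of what such a separate script might do, the sketch below reorders BED records by a custom key while keeping each group contiguous, since group boundaries are the only ordering that is honored. This is not the script from the riboseq branch; `sort_within_groups`, the record layout and the sample values are all hypothetical.

```python
from itertools import groupby


def sort_within_groups(records, key):
    """Sort (group, fields) records by key inside each contiguous group."""
    out = []
    # groupby only merges adjacent records, so group boundaries stay where they are.
    for _, members in groupby(records, key=lambda rec: rec[0]):
        out.extend(sorted(members, key=key))
    return out


# Hypothetical records: (group label, BED fields up to the score column).
records = [
    ("up", ["chr1", "10", "20", "a", "5"]),
    ("up", ["chr1", "30", "40", "b", "1"]),
    ("down", ["chr2", "10", "20", "c", "9"]),
    ("down", ["chr2", "30", "40", "d", "2"]),
]

# Custom order: ascending by the score column (column 5).
ordered = sort_within_groups(records, key=lambda rec: float(rec[1][4]))
print([rec[1][3] for rec in ordered])  # ['b', 'a', 'd', 'c']
```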