XiaoTaoWang / HiCPeaks

A Python implementation for BH-FDR and HiCCUPS
GNU General Public License v3.0

toCooler error because of input txt file #5

Closed clementmo closed 4 years ago

clementmo commented 4 years ago

Hello Xiaotao, I am a postdoc at HZAU and am currently learning Hi-C data analysis. I have been using HiCPeaks to convert the raw matrix generated by HiC-Pro into a cool file, and I have run into a problem I cannot solve. Following your guidelines, I tried to extract the chr01 interactions from the raw matrix HPC9_150000.matrix. According to HPC9_150000_abs.bed, chr01 is divided into 754 bins, so I generated the input file with:

awk '$1<=754&&$2<=754{print}' HPC9_150000.matrix >1_1.txt

head -5 HPC9_150000.matrix
1 1 1599
1 2 577
1 3 117
1 4 103
1 5 68

head -5 HPC9_150000_abs.bed
Chr01 0 150000 1
Chr01 150000 300000 2
Chr01 300000 450000 3
Chr01 450000 600000 4
Chr01 600000 750000 5

Then I ran toCooler with:

toCooler -O HPC9_1.cool -d datasets --nproc 1 --chromsizes-file Ga_1.chromsizes &

It fails with the error "IndexError: index 754 is out of bounds for axis 0 with size 754":

File "/public/home/software/opt/bio/software/HiCPeaks/0.3.4/lib/python3.6/site-packages/hicpeaks-0.3.4-py3.6.egg/EGG-INFO/scripts/toCooler", line 128, in run
    balance(cooler_uri, nproc=args.nproc)
  File "/public/home/software/opt/bio/software/HiCPeaks/0.3.4/lib/python3.6/site-packages/hicpeaks-0.3.4-py3.6.egg/hicpeaks/utilities.py", line 417, in balance
    map=map_)
  File "/public/home/software/opt/bio/software/HiCPeaks/0.3.4/lib/python3.6/site-packages/cooler/balance.py", line 332, in balance_cooler
    .reduce(add, np.zeros(n_bins))
  File "/public/home/software/opt/bio/software/HiCPeaks/0.3.4/lib/python3.6/site-packages/cooler/tools.py", line 244, in reduce
    return reduce(binop, iter(self.run()), init)
  File "/public/home/software/opt/bio/software/HiCPeaks/0.3.4/lib/python3.6/site-packages/cooler/tools.py", line 54, in apply_pipeline
    data = func(chunk, data)
  File "/public/home/software/opt/bio/software/HiCPeaks/0.3.4/lib/python3.6/site-packages/cooler/balance.py", line 46, in _zero_trans
    mask = chrom_ids[pixels['bin1_id']] != chrom_ids[pixels['bin2_id']]
  File "/public/home/software/opt/bio/software/HiCPeaks/0.3.4/lib/python3.6/site-packages/pandas/core/arrays/categorical.py", line 2149, in __getitem__
    values=self._codes[key], dtype=self.dtype, fastpath=True
IndexError: index 754 is out of bounds for axis 0 with size 754

From this I gather that the bin numbers in the first two columns of the input file 1_1.txt must be smaller than the number of bins on the chromosome (754), not equal to or larger than it.

I also tried to analyze chr02, using:

awk '$1>=755&&$1<=1415&&$2>=755&&$2<=1415{print}' HPC9_150000.matrix >2_2.txt

I replaced 1_1.txt with 2_2.txt in the ./150K/ directory, and toCooler produced a similar error: "IndexError: index 755 is out of bounds for axis 0 with size 661". Here 661 is the number of bins on chr02. How should I prepare the input file correctly?
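Both error messages would be consistent with toCooler expecting zero-based bin indices counted from the start of each chromosome, whereas the HiC-Pro matrix numbers bins from 1 across the whole genome. This is only an inference from the two tracebacks above, not something confirmed by the HiCPeaks documentation here. Under that assumption, a minimal sketch of the conversion might look like the following. The function names chrom_bin_ranges and intra_chrom_pixels are hypothetical helpers written for this illustration and are not part of HiCPeaks or HiC-Pro.

```python
from collections import OrderedDict


def chrom_bin_ranges(bed_lines):
    """Map each chromosome to (first_bin, last_bin).

    Uses the 1-based genome-wide bin IDs found in column 4 of a
    HiC-Pro *_abs.bed file. Hypothetical helper for illustration.
    """
    ranges = OrderedDict()
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        chrom, bin_id = fields[0], int(fields[3])
        first, last = ranges.get(chrom, (bin_id, bin_id))
        ranges[chrom] = (min(first, bin_id), max(last, bin_id))
    return ranges


def intra_chrom_pixels(matrix_lines, first, last):
    """Yield (bin1, bin2, count) for contacts within one chromosome.

    Bin indices are shifted so that the chromosome's first bin is 0.
    ASSUMPTION: toCooler wants zero-based, per-chromosome indices.
    The count is passed through unchanged as a string.
    """
    for line in matrix_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        b1, b2 = int(fields[0]), int(fields[1])
        if first <= b1 <= last and first <= b2 <= last:
            yield b1 - first, b2 - first, fields[2]


def write_chrom_file(bed_path, matrix_path, chrom, out_path):
    """Write one chromosome's shifted contacts to out_path."""
    with open(bed_path) as bed:
        first, last = chrom_bin_ranges(bed)[chrom]
    with open(matrix_path) as matrix, open(out_path, "w") as out:
        for b1, b2, count in intra_chrom_pixels(matrix, first, last):
            out.write(f"{b1}\t{b2}\t{count}\n")
```

With the files above, a call such as write_chrom_file("HPC9_150000_abs.bed", "HPC9_150000.matrix", "Chr02", "2_2.txt") would map bins 755 to 1415 onto 0 to 660, which stays inside the 661 bins reported in the second error. Whether this is the layout toCooler actually requires still needs to be checked against the HiCPeaks documentation.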

By the way, should I prepare a chr_chr.txt file for each chromosome, one by one? Should all of these chr_chr.txt files go in the same ./150K/ directory?

I hope you can reply. Thank you so much! You can also reply by email at 1067648804@qq.com if that is more convenient.

Best wishes. Pengcheng