GreenleafLab / NucleoATAC

nucleosome calling using ATAC-seq
MIT License
106 stars 31 forks source link

IndexError: arrays used as indices must be of integer (or boolean) type #25

Closed ShaluJhanwar closed 9 years ago

ShaluJhanwar commented 9 years ago

Hi,

I'm trying to use NucleoATAC on our in-house generated mouse ATAC-seq data, but I found below error even with different runs on different files. I'm afraid is it because I am using default (Human) PWM for mouse as well? The fact is that chr1 3640097 3640219 entry (see below) doesn't even exist in bed peak file. Could you help solving this? Thanks! Error: Command run: /users/GD/tools/NucleoATAC/NucleoATAC/bin/nucleoatac run --bed /no_backup/so/sjhanwar/WGBS_mouse/NucleoATAC/WNN1/WNN1_6640_fseq_peaks.bed --bam /no_backup/so/sjhanwar/WGBS_mouse/NucleoATAC/WNN1/WNN1_6640_AGGCAGA.sort.bam --fasta /users/GD/resource/mouse/mm9/full/mm9_NCBI73.fasta --out WNN1 nucleoatac version 0.3.0 start run at: 2015-10-26 17:12 ---------Step1: Computing Occupancy and Nucleosomal Insert Distribution--------- /software/so/el6.3/PythonPackages-2.7.6/lib/python2.7/site-packages/numpy/core/fromnumeric.py:2641: VisibleDeprecationWarning: rank is deprecated; use the ndim attribute or function instead. To find the rank of a matrix see numpy.linalg.matrix_rank. VisibleDeprecationWarning) Caught exception when processing: chr1 3640097 3640219 None None +

Traceback (most recent call last): File "/software/so/el6.3/PythonPackages-2.7.6/lib/python2.7/site-packages/nucleoatac/run_occ.py", line 30, in _occHelper occ.process(params) File "/software/so/el6.3/PythonPackages-2.7.6/lib/python2.7/site-packages/nucleoatac/Occupancy.py", line 230, in process self.callPeaks() File "/software/so/el6.3/PythonPackages-2.7.6/lib/python2.7/site-packages/nucleoatac/Occupancy.py", line 209, in callPeaks peaks = call_peaks(self.occ.smoothed_vals, sep = self.params.sep, min_signal = self.params.min_occ) File "/software/so/el6.3/PythonPackages-2.7.6/lib/python2.7/site-packages/pyatac/utils.py", line 98, in call_peaks peaks = peaks[sigvals[peaks] >= min_signal ] IndexError: arrays used as indices must be of integer (or boolean) type () Traceback (most recent call last): File "/users/GD/tools/NucleoATAC/NucleoATAC/bin/nucleoatac", line 291, in main() File "/users/GD/tools/NucleoATAC/NucleoATAC/bin/nucleoatac", line 75, in main run_occ(occ_args) File "/software/so/el6.3/PythonPackages-2.7.6/lib/python2.7/site-packages/nucleoatac/run_occ.py", line 119, in run_occ tmp = pool1.map(_occHelper, zip(j,itertools.repeat(params))) File "/software/so/el6.3/PythonPackages-2.7.6/lib/python2.7/multiprocessing/pool.py", line 250, in map return self.map_async(func, iterable, chunksize).get() File "/software/so/el6.3/PythonPackages-2.7.6/lib/python2.7/multiprocessing/pool.py", line 554, in get raise self._value IndexError: arrays used as indices must be of integer (or boolean) type

ShaluJhanwar commented 9 years ago

NucleoATAC is working fine on example files. No idea what's wrong when I run it on my own data. BAM files are sorted. All the requirements are also satisfied. numpy.version 1.10.1 scipy.version 0.13.2 matplotlib.version 1.3.1 cython.version 0.23.4 pysam.version 0.8.3

AliciaSchep commented 9 years ago

Hi, the issue I think is with the scipy version-- can you try upgrading (to version 0.16.0 or greater) to see if it fixes issue? Someone else had this same error and updating scipy fixed the issue, and I meant to add the version of scipy to the setup.py script but apparently have not done so yet. It's definitely not an issue with the PWM. The entry won't exist as-is in your bedfile because there is some adjustment to peaks (extending slightly, merging overlapping peaks).

ShaluJhanwar commented 9 years ago

Hi Alicia, thanks for the reply. It's already approx. 5 hours on 5 processes, but one run of NucleoATAC for the mouse genome didn't yet finished. The output files are still growing. Could you please let me know how much time one run would take so that we can be sure if NucleoATAC is working. ii) There are different output generated e.g. occ* files, pos files etc. After reading Nucleosome paper, could you please confirm: a) I guess *nucpos.bed.gz is the file which will provide me nucleosome position? There are more than 3 columns, from where I can get their header information? b) Do I need to filter nucleosome positions data based on some cut-offs? What would you suggest?

Many thanks! Shalu

AliciaSchep commented 9 years ago

Yes *.nucpos.bed.gz is nucleosome positions. I would suggest to filter the positions based on the occupancy column. There are already cutoffs for z-score (can be changed by using the individual nucleoatac commands instead of 'run').

As for time, depends on how much of the genome your bed file covers...

ShaluJhanwar commented 9 years ago

Hi Alicia, After scipy version adding, one run of NucleoATAC finished with following: Command run: /users/GD/tools/NucleoATAC/NucleoATAC/bin/nucleoatac run --bed /no_backup/so/sjhanwar/WGBS_mouse/NucleoATAC/WNN1/WNN1_6640_fseq_peaks.bed --bam /no_backup/so/sjhanwar/WGBS_mouse/NucleoATAC/WNN1/WNN1_6640_AGGCAGA.sort.bam --fasta /users/GD/resource/mouse/mm9/full/mm9_NCBI73.fasta --out WNN1 --cores 8 nucleoatac version 0.3.0 start run at: 2015-10-28 08:50 ---------Step1: Computing Occupancy and Nucleosomal Insert Distribution--------- Making figure ---------Step2: Processing Vplot------------------------------------------------ /software/so/el6.3/PythonPackages-2.7.6/lib/python2.7/site-packages/matplotlib/collections.py:548: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison if self._edgecolors == 'face': ---------Step3: Obtaining nucleosome signal and calling positions--------------- ---------Step4: Making combined nucleosome position map ------------------------ ---------Step5: Calling NFR positions------------------------------------------- end run at: 2015-10-28 15:09

real 379m11.033s user 1013m29.183s sys 11m46.624s

i) Could we remove Step2 warning/error? ii) In the *.nucpos.bed.gz, there are 19390 nucleosomes with occupancy (5th column) ranged from 0 to 1. Which cutoff of occupancy would you suggest to filter nucleosome? iii) Z-score is ranging from 3 to 346. Should I consider all the nucleosomes with higher Z-score too? Thanks! Shalu

AliciaSchep commented 9 years ago

i) I need to figure out what that warning is, I think it started occurring with a recent version of matplotlib. ii) I would suggest a fairly liberal threshold around 0.1, but might be worth looking at histogram and/or relationship with other metrics iii) default Z-score cutoff is 3. depending on whether you want more confidence at the calls (at the expense of fewer calls) you can choose a higher threshold

ShaluJhanwar commented 9 years ago

Hi Alicia, Since NucleoATAC provided a single dyad position in the output. I want to plot the number of reads supported for each nucleosome position (Basically a bigwig file for visualisation purpose). We have only single bp of the dyad. Could you please suggest: i) Should I consider some flanking region of the nucleosome ii) If yes, how many bp flanking can provide an approximation of the actual nucleosome peak iii) Can we get actual peak size of the nucleosome? Thanks! Shalu