gersteinlab / starrpeaker

STARRPeaker: STARR-seq peak caller
GNU General Public License v3.0

Encountered some errors: RuntimeError, RuntimeWarning, IndexError #10

Closed · tyler-hansen closed 3 years ago

tyler-hansen commented 3 years ago

Hi all,

Cool package! I tried running it on my STARR-seq data and got a few errors that I thought would be helpful to report. My python knowledge is not so great, so it's been difficult to troubleshoot; in other words, it could be on my end. Any help on how to fix them would be greatly appreciated.

I'm running within a conda environment on a CentOS cluster with 1 core and 16 GB RAM (this can be dramatically expanded if needed). I tried it with and without the covariate files.

With covariate files, my command:

```shell
starrpeaker \
    --prefix ${RESULTS_DIR}/GM12878inGM12878_default_+cov \
    --chromsize $CHROM \
    --blacklist $BLACKLIST_FILE \
    -i ${DATA_DIR}/GM12878inGM12878_DNA_merged.unique.pos-sorted.bam \
    -o ${DATA_DIR}/GM12878inGM12878_RNA_merged.unique.pos-sorted.bam \
    --cov ${COV_DIR}/STARRPeaker_cov_GRCh38_gem-mappability-100mer.bw \
          ${COV_DIR}/STARRPeaker_cov_GRCh38_linearfold-folding-energy-100bp.bw \
          ${COV_DIR}/STARRPeaker_cov_GRCh38_ucsc-gc-5bp.bw
```

Error:

```
[2020-12-11 11:29:29] Making bins
[2020-12-11 11:31:01] Filtering blacklist region
[2020-12-11 11:36:48] Done
[2020-12-11 11:36:48] Averaging features per bin
[2020-12-11 11:36:52] Processing /data/hodges_lab/public_data/STARRPeaker_Covariates/STARRPeaker_cov_GRCh38_gem-mappability-100mer.bw
Traceback (most recent call last):
  File "/home/hansetj1/.conda/envs/STARRPeaker/bin/starrpeaker", line 8, in <module>
    sys.exit(main())
  File "/home/hansetj1/.conda/envs/STARRPeaker/lib/python2.7/site-packages/starrpeaker/starrpeaker.py", line 54, in main
    bwFiles=args.cov)
  File "/home/hansetj1/.conda/envs/STARRPeaker/lib/python2.7/site-packages/starrpeaker/core.py", line 109, in proc_cov
    val = b.stats(chr, int(start), int(end), type="mean")
RuntimeError: Invalid interval bounds!
```
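For context, pyBigWig's `stats()` raises "Invalid interval bounds!" when the queried interval extends past the chromosome length recorded in the bigWig header, which typically means the chrom.sizes file used to make the bins comes from a different assembly build than the covariate bigWigs. A minimal sketch of that check (the `out_of_bounds_bins` helper is hypothetical, not part of STARRPeaker; in practice the lengths would come from the bigWig header, e.g. pyBigWig's `bw.chroms()`):

```python
# Sketch: flag genomic bins that would trigger "Invalid interval bounds!"
# when queried against a bigWig whose chromosome lengths disagree with the
# chrom.sizes used for binning. Lengths are hard-coded for illustration.

def out_of_bounds_bins(bins, chrom_lengths):
    """Return bins that fall outside the lengths the bigWig knows about."""
    bad = []
    for chrom, start, end in bins:
        length = chrom_lengths.get(chrom)
        if length is None or start < 0 or end > length:
            bad.append((chrom, start, end))
    return bad

# chr1 in GRCh38 is 248,956,422 bp; a chrom.sizes from another assembly
# (hg19's chr1 is 249,250,621 bp) yields bins past the bigWig's end.
bigwig_lengths = {"chr1": 248956422}
bins = [("chr1", 248956000, 248956422),   # within bounds
        ("chr1", 249250000, 249250500),   # past chr1 end -> error
        ("chrUn_foo", 0, 500)]            # contig absent from bigWig
print(out_of_bounds_bins(bins, bigwig_lengths))
```

Comparing the output of such a check against the bins file is one way to confirm a chrom.sizes mismatch before a long run.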

Without covariate files, my command:

```shell
starrpeaker \
    --prefix ${RESULTS_DIR}/GM12878inGM12878_default_-cov \
    --chromsize $CHROM \
    --blacklist $BLACKLIST_FILE \
    -i ${DATA_DIR}/GM12878inGM12878_DNA_merged.unique.pos-sorted.bam \
    -o ${DATA_DIR}/GM12878inGM12878_RNA_merged.unique.pos-sorted.bam
```

Two errors:

```
[2020-12-11 11:46:24] Making bins
[2020-12-11 11:47:57] Filtering blacklist region
[2020-12-11 11:53:34] Done
[2020-12-11 11:53:34] Counting feature depth (template or read) per genomic bin /data/hodges_lab/ATAC-STARR_V2/data/STARRPeaker_results/GM12878inGM12878default-cov.bin.bed
[2020-12-11 11:53:34] Using fragment center
[2020-12-11 11:53:40] Processing input alignment file /data/hodges_lab/ATAC-STARR_V2/data/bams/merged_bams/GM12878inGM12878_DNA_merged.unique.pos-sorted.bam
[2020-12-11 11:53:40] Parallel processing using 32 cores
[2020-12-11 11:56:34] 80,062,748 total mapped reads
[2020-12-11 11:56:34] Counting fragments mapped to both forward (+) and reverse (-) strand
[2020-12-11 11:56:34] 40,031,374 templates extracted
[2020-12-11 11:56:34] 0 templates used for count
[2020-12-11 11:56:34] Merging fragment bed files
[2020-12-11 11:56:34] Making genome coverage bedGraph
[2020-12-11 11:56:58] Converting bedGraph to bigWig
[2020-12-11 11:56:58] Writing /data/hodges_lab/ATAC-STARR_V2/data/STARRPeaker_results/GM12878inGM12878default-cov.input.bw
[2020-12-11 11:56:58] Merging count bed files
[2020-12-11 11:56:59] Counting depth per bin
[2020-12-11 12:03:56] Processing output alignment file /data/hodges_lab/ATAC-STARR_V2/data/bams/merged_bams/GM12878inGM12878_RNA_merged.unique.pos-sorted.bam
[2020-12-11 12:03:56] Parallel processing using 32 cores
/home/hansetj1/.conda/envs/STARRPeaker/lib/python2.7/site-packages/starrpeaker/core.py:462: RuntimeWarning: invalid value encountered in long_scalars
  print("[%s] Normalizing factor for input: %f" % (timestamp(), (tct[1] / tct[0])))
/home/hansetj1/.conda/envs/STARRPeaker/lib/python2.7/site-packages/starrpeaker/core.py:463: RuntimeWarning: invalid value encountered in long_scalars
  normalized_input = mat[:, 0] * (tct[1] / tct[0])
[2020-12-11 12:05:32] 32,489,848 total mapped reads
[2020-12-11 12:05:32] Counting fragments mapped to both forward (+) and reverse (-) strand
[2020-12-11 12:05:32] 16,244,924 templates extracted
[2020-12-11 12:05:32] 0 templates used for count
[2020-12-11 12:05:32] Merging fragment bed files
[2020-12-11 12:05:32] Making genome coverage bedGraph
[2020-12-11 12:05:56] Converting bedGraph to bigWig
[2020-12-11 12:05:56] Writing /data/hodges_lab/ATAC-STARR_V2/data/STARRPeaker_results/GM12878inGM12878default-cov.output.bw
[2020-12-11 12:05:56] Merging count bed files
[2020-12-11 12:05:56] Counting depth per bin
[2020-12-11 12:13:05] Normalizing factor for input: nan
[2020-12-11 12:16:20] Done
[2020-12-11 12:16:20] Calling peaks
[2020-12-11 12:16:20] Loading fragment coverages
[2020-12-11 12:22:48] Running without covariates
Traceback (most recent call last):
  File "/home/hansetj1/.conda/envs/STARRPeaker/bin/starrpeaker", line 8, in <module>
    sys.exit(main())
  File "/home/hansetj1/.conda/envs/STARRPeaker/lib/python2.7/site-packages/starrpeaker/starrpeaker.py", line 89, in main
    extQuantile=args.eq)
  File "/home/hansetj1/.conda/envs/STARRPeaker/lib/python2.7/site-packages/starrpeaker/core.py", line 589, in call_peak
    maxInput = np.quantile(bct_i[bct_i > 0], (1 - extQuantile))
  File "/home/hansetj1/.conda/envs/STARRPeaker/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3818, in quantile
    a, q, axis, out, overwrite_input, interpolation, keepdims)
  File "/home/hansetj1/.conda/envs/STARRPeaker/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3826, in _quantile_unchecked
    interpolation=interpolation)
  File "/home/hansetj1/.conda/envs/STARRPeaker/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3405, in _ureduce
    r = func(a, **kwargs)
  File "/home/hansetj1/.conda/envs/STARRPeaker/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3941, in _quantile_ureduce_func
    x1 = take(ap, indices_below, axis=axis) * weights_below
  File "/home/hansetj1/.conda/envs/STARRPeaker/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 189, in take
    return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode)
  File "/home/hansetj1/.conda/envs/STARRPeaker/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 56, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
IndexError: cannot do a non-empty take from an empty axes.
[2020-12-11 12:03:56] Processing chr12_KI270904v1_alt
[2020-12-11 12:03:59] Finished processing chr12_KI270904v1_alt
[2020-12-11 12:03:59] Processing chr4_KI270925v1_alt
...
```

hoondy commented 3 years ago

Thanks for your interest. I noticed all fragments were filtered out in the earlier fragment-count step, leaving you with a zero-count matrix for the subsequent peak-calling step, which caused the error.
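Both reported symptoms follow mechanically from the zero counts. A small NumPy sketch of the chain (variable names `tct`/`bct_i` borrowed from the traceback; this is an illustration, not STARRPeaker's code):

```python
import numpy as np

# Total template counts: [input (DNA), output (RNA)]. With every fragment
# filtered out by the size thresholds, both totals are zero.
tct = np.array([0, 0], dtype=np.int64)

# 0/0 on numpy numbers does not raise; it warns ("invalid value
# encountered ...") and returns nan -- hence "Normalizing factor: nan".
with np.errstate(invalid="ignore"):
    factor = tct[1] / tct[0]
print(np.isnan(factor))  # -> True

# Per-bin input counts are likewise all zero, so bct_i[bct_i > 0] is an
# empty array and the extreme-quantile call has nothing to select from;
# the numpy in the traceback raises IndexError here (newer numpy versions
# may report this differently).
bct_i = np.zeros(10)
try:
    np.quantile(bct_i[bct_i > 0], 1 - 0.001)
except Exception as e:
    print(type(e).__name__, e)
```

So the fix is upstream: get nonzero "templates used for count", and both downstream errors disappear.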

I'd suggest playing with the parameters (`--min`, `--max`, `--mincov`) to suit your STARR-seq data.

```
[2020-12-11 11:56:34] 40,031,374 templates extracted
[2020-12-11 11:56:34] 0 templates used for count
```
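A hypothetical re-run passing those flags might look like the following; the prefix and the threshold values here are placeholders to tune against your library's insert-size distribution (e.g. from `samtools stats`), not recommended settings:

```shell
# Illustrative: widen the accepted template-length window (--min/--max)
# and lower the minimum coverage (--mincov) so fewer fragments are
# filtered out before counting. Values are placeholders.
starrpeaker \
    --prefix ${RESULTS_DIR}/GM12878inGM12878_relaxed \
    --chromsize $CHROM \
    --blacklist $BLACKLIST_FILE \
    -i ${DATA_DIR}/GM12878inGM12878_DNA_merged.unique.pos-sorted.bam \
    -o ${DATA_DIR}/GM12878inGM12878_RNA_merged.unique.pos-sorted.bam \
    --min 100 --max 1000 --mincov 1
```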

hoondy commented 3 years ago

@tyler-hansen Hope you were able to troubleshoot the issue. If you're still facing problems, please reopen this ticket.

tyler-hansen commented 3 years ago

@hoondy I'm still working on it, but I found that the primary issue was a scripting error I made in the bam-merge step, which produced a bad file. I have since fixed that but then ran into another issue with the covariate files being out of bounds. I fixed that by using your chrom.sizes file rather than mine, which has worked for me in the past. I also had to use your blacklist file, as mine kept giving an error.

Altogether, I seem to have it working now, and it's just a matter of playing around with the parameters like you said above. Thank you for your help.