gersteinlab / starrpeaker

STARRPeaker: STARR-seq peak caller
GNU General Public License v3.0
15 stars 7 forks source link

IndexError: cannot do a non-empty take from an empty axes. #7

Closed cxshine closed 4 years ago

cxshine commented 4 years ago

Sorry, when I used starrpeaker to process my single-end bam files, this problem always comes up. image the command: nohup starrpeaker --prefix N1 --chromsize $hg19_chormsize --blacklist $hg19_blacklist --cov $hg19_gc $hg19_mappability $hg19_linearfold --input input.bam --output output.bam --readstart --threshold 0.05 &

hoondy commented 4 years ago

This indicates somethings not right about “bct” file generated in earlier step**. This file stores fragment-count (read-count in your case) per bin. I would examine that file and make sure there are exactly three columns.

**This index error was invoked while modeling STARR-seq data after various files have been processed and loaded onto memory.

cxshine commented 4 years ago

Thanks for your response, this problem happens only when starrpeaker processes single-end bam files.

hoondy commented 4 years ago

The code may need to be updated if it’s pertinent to single-end. I can look into this for the next version. Could you share a subset of bam files (ie, extract chr22?)

cxshine commented 4 years ago

Sorry for the late reply, from this link you can download the single-end bam files.

hoondy commented 4 years ago

The error indeed came from single-end data being not properly recognized during bam processing.

I updated the code and added a new option "--se" to accommodate the single-end data. I tested on your sample bam files and they worked well. Try the latest version and let me know how it goes.

cxshine commented 4 years ago

I sincerely appreciate your help. The latest version Starrpeaker runs successfully. Also, I have two questions. The first one, How to set the argument of min/max? I have no idea about it. The second one, the number of final peak is a few, is that normal ?

hoondy commented 4 years ago

I am glad I can help. Thanks for helping me to improve the software.

For your question, min/max settings are only relevant to paired-end and they are set based on the size of fragments. If you have '--se' or '--readstart' option, min/max setting is not used. If you have paired-end data, you can estimate the size of fragments based on orientation and position of fwd and rev reads (see https://www.ecseq.com/support/ngs/why-do-the-reads-all-have-the-same-length-when-sequencing-differently-sized-fragments for more). If you specify you'd want to retain 200bp-1000bp fragments (--min 200 --max 1000), anything that falls outside will get discarded.

The final peak is based on the FDR threshold of 0.05 and the number depends on types of experiments. I find peaks from whole-genome STARR-seq anywhere from 1ks to 100ks. If you find too few peaks, try relaxing the threshold and manually curating peaks using visualization tools.