dieterich-lab / DCC

DCC uses output from the STAR read mapper to systematically detect back-splice junctions in next-generation sequencing data. DCC applies a series of filters and integrates data across replicate sets to arrive at a precise list of circRNA candidates.
https://dieterichlab.org/software/
GNU General Public License v3.0

issue with pybedtools groupby #5

Closed e-hutchins closed 8 years ago

e-hutchins commented 8 years ago

When I run DCC, I get an error with pybedtools groupby:

python2.7 /home/ehutchins/tools/DCC/DCC/main.py @chimeric_junctions -mt1 @chimeric_junctions_mate1 -mt2 @chimeric_junctions_mate2 -D -N -temp -an ensembl_genome/Homo_sapiens.GRCh37.75.gtf -Pi -F -M -R hg19_combined_repeats.gtf -Nr 2 1 -fg -G -A ensembl_genome/Homo_sapiens.GRCh37.75.dna.toplevel.fa

WARNING: nonstrand data, the strand of circRNAs guessed from the strandness of host genes.
===== Please make sure that you mapped both the paired mates togethor and seperately!!! =====
Collect chimera from mates-seperate mapping.
started detect circRNA from Sample_S_001_PC02_ACTTGAChimeric.out.junction.fixed
started detect circRNA from Sample_S_002_PC02_TTAGGCChimeric.out.junction.fixed
started detect circRNA from Sample_S_007_PC02_GATCAGChimeric.out.junction.fixed
started detect circRNA from Sample_S_023_PC02_CTTGTAChimeric.out.junction.fixed
started detect circRNA from Sample_S_024_PC02_ACAGTGChimeric.out.junction.fixed
started detect circRNA from Sample_S_170_PC02_CGATGTChimeric.out.junction.fixed

Start to combine individual circRNA read counts.

Traceback (most recent call last):
  File "/home/ehutchins/tools/DCC/DCC/main.py", line 507, in <module>
    main()
  File "/home/ehutchins/tools/DCC/DCC/main.py", line 244, in main
    circAnn.annotate('_tmp_DCC/tmp_coordinates','_tmpDCC/tmp'+getfilename(options.annotate)+'.gene','_tmp_DCC/tmp_coordinatesannotated')
  File "/home/ehutchins/tools/DCC/DCC/circAnnotate.py", line 54, in annotate
    tmpresult = tmpintersect.groupby(g=(1,2,3,5),c=(ncol+7,ncol+9),o=('first','distinct'))
  File "/home/ehutchins/.local/lib/python2.7/site-packages/pybedtools-0.7.1-py2.7-linux-x86_64.egg/pybedtools/bedtool.py", line 775, in decorated
    result = method(self, *args, **kwargs)
  File "/home/ehutchins/.local/lib/python2.7/site-packages/pybedtools-0.7.1-py2.7-linux-x86_64.egg/pybedtools/bedtool.py", line 336, in wrapped
    decode_output=decode_output,
  File "/home/ehutchins/.local/lib/python2.7/site-packages/pybedtools-0.7.1-py2.7-linux-x86_64.egg/pybedtools/helpers.py", line 378, in call_bedtools
    raise BEDToolsError(subprocess.list2cmdline(cmds), stderr)
pybedtools.helpers.BEDToolsError: Command was:

bedtools groupby -o first,distinct -i /tmp/pybedtools.kYcZIv.tmp -g 1,2,3,5 -c 13,15

Error message was:


*****ERROR: Requested column 13, but database file /tmp/pybedtools.kYcZIv.tmp only has fields 1 - 0.

Tool: bedtools groupby
Version: v2.24.0-34-ge542de7
Summary: Summarizes a dataset column based upon common column groupings. Akin to the SQL "group by" command.

Usage: bedtools groupby -g [group_column(s)] -c [op_column(s)] -o [ops]
       cat [FILE] | bedtools groupby -g [group_column(s)] -c [op_column(s)] -o [ops]
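For readers hitting the same message: "only has fields 1 - 0" means bedtools saw no columns at all in the temporary file, while the groupby call asked for columns 13 and 15 (ncol+7 and ncol+9 in circAnnotate.py). One way to narrow this down is to count the fields in the input before grouping. The following is a minimal sketch only; count_fields and missing_columns are hypothetical helper names, not part of DCC or pybedtools, and the tab delimiter is an assumption.

```python
def count_fields(path, delimiter="\t"):
    """Return the number of fields on the first non-empty line, or 0 if none exists."""
    with open(path) as handle:
        for line in handle:
            stripped = line.rstrip("\r\n")
            if stripped:
                return len(stripped.split(delimiter))
    return 0


def missing_columns(path, requested):
    """Return the requested 1-based column numbers that the file does not have."""
    available = count_fields(path)
    return [col for col in requested if col > available]
```

Calling missing_columns on the intersect output with (13, 15) returns an empty list when both columns exist. If it returns both numbers, the file is empty or not tab-delimited, and the problem lies upstream of groupby.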

s6juncheng commented 8 years ago

Hi,

Thanks for your interest in DCC and for reporting the issue.

Could you please show me the following:

  1. A few lines of Sample_S_001_PC02_ACTTGAChimeric.out.junction.fixed in your sample directory.
  2. A few lines of any *.circRNA file in the _tmp_DCC directory.
  3. The output of ll on the _tmp_DCC directory; I'm interested in the file sizes.
  4. It would also help to see a few lines of _tmp_DCC/tmp_coordinates.
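The items above can also be gathered with a short script. This is a minimal sketch and not part of DCC; head, file_sizes and empty_files are hypothetical helper names, and the _tmp_DCC path is taken from the traceback in this thread.

```python
import os


def head(path, n=5):
    """Return the first n lines of a text file, without trailing newlines."""
    lines = []
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i >= n:
                break
            lines.append(line.rstrip("\n"))
    return lines


def file_sizes(directory):
    """Return (name, size_in_bytes) for each regular file, sorted by name."""
    entries = []
    for name in sorted(os.listdir(directory)):
        full = os.path.join(directory, name)
        if os.path.isfile(full):
            entries.append((name, os.path.getsize(full)))
    return entries


def empty_files(directory):
    """Return the names of zero-byte files, which often point at a failed step."""
    return [name for name, size in file_sizes(directory) if size == 0]


if __name__ == "__main__":
    tmp_dir = "_tmp_DCC"  # assumed location, as seen in the traceback
    if os.path.isdir(tmp_dir):
        for name, size in file_sizes(tmp_dir):
            print("%10d  %s" % (size, name))
        print("empty files:", empty_files(tmp_dir))
```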

Thanks again.

Best, Jun

s6juncheng commented 8 years ago

My email is s6juncheng@gmail.com

e-hutchins commented 8 years ago

Great, thanks for the fast response! I sent you the info via email.

e-hutchins commented 8 years ago

Thanks for the help! Fixed the error by using bedtools v2.24.0.
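Note that the failing run reported bedtools version v2.24.0-34-ge542de7, which looks like a git-describe style string for a build 34 commits past the v2.24.0 tag rather than the tagged release itself. Below is a minimal sketch for telling the two apart; parse_bedtools_version and is_tagged_release are hypothetical helper names, and the assumed version format is inferred only from the strings shown in this thread.

```python
import re

# Matches "v2.24.0" and, optionally, a git-describe suffix such as "-34-ge542de7".
_VERSION_RE = re.compile(r"v(\d+)\.(\d+)\.(\d+)(?:-(\d+)-g[0-9a-f]+)?")


def parse_bedtools_version(text):
    """Return ((major, minor, patch), commits_past_tag) from a version string."""
    match = _VERSION_RE.search(text)
    if match is None:
        raise ValueError("no version string found in %r" % text)
    major, minor, patch, extra = match.groups()
    return (int(major), int(minor), int(patch)), int(extra or 0)


def is_tagged_release(text):
    """True when the version string carries no commits-past-tag suffix."""
    return parse_bedtools_version(text)[1] == 0
```

The text passed in can be the "Version:" line from the error output above or whatever the installed bedtools prints for its version.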