dieterich-lab / DCC

DCC uses output from the STAR read mapper to systematically detect back-splice junctions in next-generation sequencing data. DCC applies a series of filters and integrates data across replicate sets to arrive at a precise list of circRNA candidates.
https://dieterichlab.org/software/
GNU General Public License v3.0

MemoryError #84

Closed (ashokpatowary closed this issue 4 years ago)

ashokpatowary commented 4 years ago

Hi @tjakobi

I am trying to use DCC on a large dataset. As a test I ran it with ~50 samples and it ran perfectly. Now, when running it with all ~600 samples, I encounter a memory error. Is there a tested way to proceed with a large number of samples? Below are the error and my command line.

Thanks

Traceback (most recent call last):
  File "/u/home/.local/bin/DCC", line 11, in <module>
    load_entry_point('DCC==0.4.8', 'console_scripts', 'DCC')()
  File "build/bdist.linux-x86_64/egg/DCC/main.py", line 365, in main
  File "build/bdist.linux-x86_64/egg/DCC/circFilter.py", line 59, in readcirc
MemoryError

DCC @samplesheet -mt1 @mate1 -mt2 @mate2 -D -R /u/home/Resource/RepeatMasker.SimpleRepeat.gtf \
-an /u/home/Resource/gencode.v33lift37.annotation.gtf -Pi -F -M -Nr 5 10 -fg -G -A \
/u/home/Resource/GRCh37.primary_assembly.genome.fa -T 16 -B @bamlist

tjakobi commented 4 years ago

Hi @ashokpatowary,

Thank you for using DCC. Running 600 samples might indeed require a huge amount of RAM. On what type of machine are you running this analysis? You might want to split the samples over multiple runs. See https://github.com/dieterich-lab/DCC/issues/83#issuecomment-651286186 for more information on that.

Cheers, Tobias
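Splitting the samples over multiple runs means cutting every @-list file (samplesheet, mate1, mate2, bamlist) into batches that stay aligned sample by sample. Below is a minimal Python sketch of that step. It assumes each list file holds one path per line and that line i refers to the same sample in every file. `read_list`, `split_lists`, and the `batch_NNN` directory layout are hypothetical names chosen for illustration; they are not part of DCC and do not reproduce the procedure described in the linked issue #83 comment.

```python
from pathlib import Path


def read_list(path):
    """Return the non-empty, stripped lines of a list file (one path per line assumed)."""
    return [line.strip() for line in Path(path).read_text().splitlines() if line.strip()]


def split_lists(list_files, batch_size, out_dir):
    """Split aligned list files into batch directories of at most batch_size samples.

    list_files maps an output file name (e.g. "samplesheet") to the path of the
    original list file. Hypothetical helper: every file must list the samples in
    the same order. Returns the created batch directories in order.
    """
    lists = {name: read_list(path) for name, path in list_files.items()}
    lengths = {len(entries) for entries in lists.values()}
    if len(lengths) != 1:
        # Unequal lengths would pair a junction file with the wrong mate or BAM entry.
        raise ValueError(f"list files differ in length: {sorted(lengths)}")
    n_samples = lengths.pop()

    batch_dirs = []
    for batch_idx, start in enumerate(range(0, n_samples, batch_size)):
        batch_dir = Path(out_dir) / f"batch_{batch_idx:03d}"
        batch_dir.mkdir(parents=True, exist_ok=True)
        # Slice every list with the same window so batches stay sample-aligned.
        for name, entries in lists.items():
            chunk = entries[start:start + batch_size]
            (batch_dir / name).write_text("\n".join(chunk) + "\n")
        batch_dirs.append(batch_dir)
    return batch_dirs
```

For example, `split_lists({"samplesheet": "samplesheet", "mate1": "mate1", "mate2": "mate2", "bamlist": "bamlist"}, 100, "batches")` would turn ~600 samples into six batch directories, each of which could then be passed to DCC with the original options (`DCC @batches/batch_000/samplesheet -mt1 @batches/batch_000/mate1 ...`), ideally with a separate output directory per batch (check `DCC --help` for the relevant option). Because filters such as `-Nr 5 10` are evaluated over the samples of a single run, per-batch results may differ from those of one joint run.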

ashokpatowary commented 4 years ago

Thank you @tjakobi for your response. I will follow the comment from the other issue. I am using an SGE machine with 132 GB of exclusive memory.

Thanks