bioinform / neusomatic

NeuSomatic: Deep convolutional neural networks for accurate somatic mutation detection

scan_alignments using up all remaining disk space (>100GB) and fails #32

Closed kiranchari closed 5 years ago

kiranchari commented 5 years ago

Hi,

I am trying to run preprocess.py in call mode on a whole-genome sample. The target regions .bed file I used is resources/hg19.bed. The work_call directory keeps growing until it fills the storage device (>100GB). Is this expected, and how much space should the scan_alignments step consume? Thank you.

msahraeian commented 5 years ago

@kiranchari Happy to see your interest in NeuSomatic. How big are your input BAM files? For me, a 100X WGS dataset may need ~80GB of total storage (the tumor and normal BAMs are ~160GB each).

kiranchari commented 5 years ago

Thanks for your response, @msahraeian.

My tumor and normal bams are about 90GB and 100GB respectively

msahraeian commented 5 years ago

@kiranchari For BAMs of this size, I would not expect the work_call directory to need >100GB. Can you share the work_call/work_tumor/work.0/scan.err log file with me?

kiranchari commented 5 years ago

I deleted the working directory because it got too large, but I've pasted the error output from the console below. It looked like this error occurred in all the split_regions.

ERROR 2019-03-08 13:54:23,395 run_scan_alignments (ForkPoolWorker-6) Command '['../bin/scan_alignments', '--ref', 'GRCh37.fa', '-b', 'Tumor.bam', '-L', 'neusomatic_wdir/work_tumor/region_55.bed', '--out_vcf_file', 'neusomatic_wdir/work_tumor/work.55/candidates.vcf', '--out_count_file', 'neusomatic_wdir/work_tumor/work.55/count.bed', '--window_size', '2000', '--min_af', '0.01', '--min_mapq', '10', '--max_depth', '40000', '--num_thread', '1', '--calculate_qual_stat']' died with <Signals.SIGSEGV: 11>.

ERROR 2019-03-08 13:54:23,396 run_scan_alignments (ForkPoolWorker-6) [Errno 28] No space left on device: 'neusomatic_wdir/work_tumor/work.55'

msahraeian commented 5 years ago

@kiranchari Can you try running it on a small region and share with me the work_tumor/work.0/scan.err file?

kiranchari commented 5 years ago

@msahraeian it runs with no errors when I run it on a small region

msahraeian commented 5 years ago

@kiranchari For now, if you don't have sufficient disk space, you can run the whole process (preprocess + call + postprocess) in multiple batches (for example, per chromosome) and remove the intermediate files after each run. Also, please make sure the .bam.bai index files are in the same folder as your tumor and normal BAMs.
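
The per-chromosome batching suggested above could be prepared with a small helper like the following. This is only a minimal sketch using the Python standard library, not part of NeuSomatic: the function names `split_bed_by_chrom` and `has_bam_index` are hypothetical, and it assumes the first column of the regions file is the chromosome name and that the index is named `<bam>.bai`, as described in the comment above.

```python
import os
from collections import defaultdict


def split_bed_by_chrom(bed_path, out_dir):
    """Write one BED file per chromosome and return {chrom: output_path}.

    Hypothetical helper: assumes column 1 of each record is the chromosome.
    """
    os.makedirs(out_dir, exist_ok=True)
    lines_by_chrom = defaultdict(list)
    with open(bed_path) as bed:
        for line in bed:
            # Skip blank lines and BED header/comment lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom = line.split()[0]
            lines_by_chrom[chrom].append(line.rstrip("\n") + "\n")

    paths = {}
    for chrom, lines in lines_by_chrom.items():
        path = os.path.join(out_dir, f"{chrom}.bed")
        with open(path, "w") as out:
            out.writelines(lines)
        paths[chrom] = path
    return paths


def has_bam_index(bam_path):
    """Return True if '<bam_path>.bai' exists next to the BAM file."""
    return os.path.isfile(bam_path + ".bai")
```

Each resulting BED file can then be used as the region file for one batch, with that batch's work directory deleted (for example with `shutil.rmtree`) once its final output has been saved, and `has_bam_index` called on the tumor and normal BAMs before starting.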

kiranchari commented 5 years ago

ok, shall I close this issue?