broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

PathSeqPipelineSpark: ERROR DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file #6293

Open · MengZhang2019 opened this issue 4 years ago

MengZhang2019 commented 4 years ago

Hello everyone,

When I use PathSeqPipelineSpark to analyze my datasets, I run into the following error:

```
ERROR DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /tmp/blockmgr-a212bfd7-e23c-4634-977e-979a86afe37f/34/temp_shuffle_f93ccbcf-1ccf-4ba9-a9e9-9b09345b533f
```

My command line looks like this:

```bash
gatk PathSeqPipelineSpark \
    --input simulate_200_change.sam \
    --kmer-file pathseq_host.bfi \
    --filter-bwa-image pathseq_host.fa.img \
    --microbe-bwa-image pathseq_microbe.fa.img \
    --microbe-fasta pathseq_microbe.fa \
    --taxonomy-file pathseq_taxonomy.db \
    --min-clipped-read-length 60 \
    --min-score-identity 0.90 \
    --identity-margin 0.02 \
    --scores-output scores.txt \
    --output output_reads.bam \
    --filter-metrics filter_metrics.txt \
    --score-metrics score_metrics.txt
```

I have 5 datasets to run through PathSeq. Three of them run normally, but the two large datasets always fail with this error. (I have checked the SAM files of all 5 datasets and found no format errors.) The first three files are 544 MB, 183 MB, and 914 MB; the last two are 2.14 GB and 7.15 GB.

I am confused by this problem; can you help me?

Best, and thank you,

Meng

MengZhang2019 commented 4 years ago

My server has 300 GB of memory.

MinS1 commented 4 years ago

Hi Meng, I have also run into this issue. Have you solved it? Thank you!

MengZhang2019 commented 4 years ago

> Hi Meng, I have also run into this issue. Have you solved it? Thank you!

Hello, actually I didn't solve it completely. When I switched to another server, it was able to process the 2.14 GB dataset without problems. I think the data is simply too large for the original server to handle. If PathSeq is not strictly necessary for you, you could consider other tools.

Best wishes,
Meng

droazen commented 4 years ago

@mwalker174 Any insight into this one?

mwalker174 commented 4 years ago

@MengZhang2019 I have not seen this kind of error before, but the first thing I would do is set the Java heap limit using `--java-options "-Xmx280g"`.
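For example (a sketch reusing the flags from your original command; size the heap to leave some headroom below physical RAM):

```bash
# JVM options go before the tool name; -Xmx caps the Java heap.
# On a 300 GB machine, ~280g leaves room for off-heap memory such as the
# memory-mapped BWA index images.
gatk --java-options "-Xmx280g" PathSeqPipelineSpark \
    --input simulate_200_change.sam \
    --kmer-file pathseq_host.bfi \
    --filter-bwa-image pathseq_host.fa.img \
    --microbe-bwa-image pathseq_microbe.fa.img \
    --microbe-fasta pathseq_microbe.fa \
    --taxonomy-file pathseq_taxonomy.db \
    --scores-output scores.txt \
    --output output_reads.bam
```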

Also, what kind of samples are these? PathSeq generally runs better when there are <10M microbial reads in the sample, and large microbe-rich samples can cause issues. Downsampling the BAM and omitting `--filter-metrics` can be helpful in this case; see the sketch below.
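One way to downsample is Picard's DownsampleSam, which is bundled with GATK4 (the 0.1 retention probability here is only an illustrative value):

```bash
# Keep a random ~10% of read templates; tune --PROBABILITY to your data.
gatk DownsampleSam \
    --INPUT simulate_200_change.sam \
    --OUTPUT simulate_200_downsampled.sam \
    --PROBABILITY 0.1
```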

MinS1 commented 4 years ago

@mwalker174 Hi, I set the Java heap limit with `--java-options "-Xmx280g"`, but it fails with the same error as before: `ERROR DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file...` My data are bulk RNA-seq reads from brain tissue, and the input BAM file is 2.6 GB.

MinS1 commented 4 years ago

> Hi Meng, I have also run into this issue. Have you solved it? Thank you!
>
> Hello, actually I didn't solve it completely. When I switched to another server, it was able to process the 2.14 GB dataset without problems. I think the data is simply too large for the original server to handle. If PathSeq is not strictly necessary for you, you could consider other tools.
>
> Best wishes,
> Meng

Hi @MengZhang2019, thank you. I haven't solved this problem either. Do you have any tools to recommend? I have tried other methods, but their running times were too long; they don't scale to large datasets.

mwalker174 commented 4 years ago

@MinS1 @MengZhang2019 Would it be possible for one of you to attach a full log file? Also, are you sure there is sufficient disk space available? I have a feeling this may be related to temp or memory-swap storage; see the quick check below.
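Spark's shuffle files land under /tmp by default, so that partition is the one to check. The redirect shown after the check is untested with this exact pipeline and assumes Spark's standard `spark.local.dir` property can be passed through `--java-options`:

```bash
# How much free space does the partition holding Spark's scratch dir have?
df -h /tmp

# If /tmp is small, redirect Spark's scratch space to a larger disk. SparkConf
# picks up JVM system properties prefixed with "spark.", so the property can
# ride along with the heap setting:
gatk --java-options "-Xmx280g -Dspark.local.dir=/path/to/large/scratch" \
    PathSeqPipelineSpark \
    --input AN00493_ba41_42_22.bam \
    --output AN00493_ba41_42_22.pathseq.bam
    # ...plus the remaining PathSeq arguments as in the original command
```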

MinS1 commented 4 years ago

pathseq.log

@mwalker174 The attached file is the PathSeq log. My operating environment is Linux, and I am sure there is disk space available. The command line I used is shown below:

```bash
nohup /group/LiuLab/User/Smin/software/gatk-4.1.4.1/gatk PathSeqPipelineSpark \
    --input AN00493_ba41_42_22.bam \
    --filter-bwa-image /group/LiuLab/User/Smin/PathSeq/pathseq_host.fa.img \
    --kmer-file /group/LiuLab/User/Smin/PathSeq/pathseq_host.bfi \
    --min-clipped-read-length 31 \
    --microbe-fasta /group/LiuLab/User/Smin/PathSeq/pathseq_microbe.fa \
    --microbe-bwa-image /group/LiuLab/User/Smin/PathSeq/pathseq_microbe.fa.img \
    --taxonomy-file /group/LiuLab/User/Smin/PathSeq/pathseq_taxonomy.db \
    --output AN00493_ba41_42_22.pathseq.bam \
    --scores-output AN00493_ba41_42_22.pathseq.txt \
    > pathseq.log 2>&1 &
```

Thank you~

mwalker174 commented 4 years ago

@MinS1 It looks like this is the root cause:

```
java.io.FileNotFoundException: /tmp/blockmgr-38b05750-8c08-431f-8c4f-c19ba56bf2bf/09/temp_shuffle_189dde58-8501-47c4-9af7-b1960fa1ab99 (Too many open files)
```

It's likely the open file limit is set too low in the environment you're using (see article). This would also explain why @MengZhang2019 was able to run it successfully on a different server. You can check and raise the per-process limit from the shell, as sketched below.
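```bash
# Show the current soft limit on open files for this shell and its children
ulimit -n

# Raise it for this session (values above the hard limit require root, e.g.
# via /etc/security/limits.conf), then launch the pipeline from the same shell
ulimit -n 65536
```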

MinS1 commented 4 years ago

Hi @mwalker174, thank you. I re-ran the PathSeq pipeline for several samples on the same server with the same parameters. Some succeeded, and some failed with the same error as before, even though the datasets are of similar size. I checked my current open-file limit:

```
$ cat /proc/sys/fs/file-max
78369196
```

I think that is enough for running one sample. In the end, I didn't solve this problem.

lbergelson commented 4 years ago

@MinS1 That's the system-wide max-files limit, but is it possible there's a lower per-user limit? You can check with the commands below.
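```bash
# /proc/sys/fs/file-max is the kernel-wide ceiling; each process is subject to
# a much lower per-user limit. Check both from the shell that launches GATK:
ulimit -Sn   # soft limit actually enforced on new processes
ulimit -Hn   # hard limit, the most a non-root user can raise the soft limit to
```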