ting-hsuan-chen opened 2 days ago
Update: I compared the two runs for the same genome. The first run was given 50G of RAM but failed with OOM; the second was given 60G of RAM and completed. I found that the sizes of the output files in the summary folder (especially the TE library and the bed/gff files) differ between the two runs, so I guess I'll need to rerun EarlGrey for the failed genome.
Contents of the summary folder from the OOM run:
total 176712
-rw-rw-r--. 1 cflthc powerplant 7630 Sep 22 13:19 01.01_red5_v2_classification_landscape.pdf
-rw-rw-r--. 1 cflthc powerplant 619087 Sep 22 13:19 01.01_red5_v2_divergence_summary_table.tsv
-rw-rw-r--. 1 cflthc powerplant 5795045 Sep 22 13:19 01.01_red5_v2-families.fa.strained
-rw-rw-r--. 1 cflthc powerplant 303431 Sep 22 09:18 01.01_red5_v2.familyLevelCount.txt
-rw-rw-r--. 1 cflthc powerplant 37979930 Sep 22 13:19 01.01_red5_v2.filteredRepeats.bed
-rw-rw-r--. 1 cflthc powerplant 112088923 Sep 22 13:19 01.01_red5_v2.filteredRepeats.gff
-rw-rw-r--. 1 cflthc powerplant 489 Sep 22 09:18 01.01_red5_v2.highLevelCount.txt
-rw-rw-r--. 1 cflthc powerplant 8524 Sep 22 13:19 01.01_red5_v2_split_class_landscape.pdf
-rw-rw-r--. 1 cflthc powerplant 7878 Sep 22 09:18 01.01_red5_v2.summaryPie.pdf
-rw-rw-r--. 1 cflthc powerplant 12786 Sep 22 13:19 01.01_red5_v2_superfamily_div_plot.pdf
Contents of the summary folder from the completed run:
total 175776
-rw-rw-r--. 1 cflthc powerplant 7674 Sep 25 21:23 01.01_red5_v2_classification_landscape.pdf
-rw-rw-r--. 1 cflthc powerplant 623389 Sep 25 21:23 01.01_red5_v2_divergence_summary_table.tsv
-rw-rw-r--. 1 cflthc powerplant 6183721 Sep 25 21:23 01.01_red5_v2-families.fa.strained
-rw-rw-r--. 1 cflthc powerplant 305545 Sep 25 16:16 01.01_red5_v2.familyLevelCount.txt
-rw-rw-r--. 1 cflthc powerplant 37714153 Sep 25 21:23 01.01_red5_v2.filteredRepeats.bed
-rw-rw-r--. 1 cflthc powerplant 111215427 Sep 25 21:23 01.01_red5_v2.filteredRepeats.gff
-rw-rw-r--. 1 cflthc powerplant 489 Sep 25 16:16 01.01_red5_v2.highLevelCount.txt
-rw-rw-r--. 1 cflthc powerplant 8563 Sep 25 21:23 01.01_red5_v2_split_class_landscape.pdf
-rw-rw-r--. 1 cflthc powerplant 7880 Sep 25 16:16 01.01_red5_v2.summaryPie.pdf
-rw-rw-r--. 1 cflthc powerplant 12335 Sep 25 21:23 01.01_red5_v2_superfamily_div_plot.pdf
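A quick way to quantify the difference beyond raw file sizes is to compare feature and sequence counts between the two runs (a sketch; the two directory paths are placeholders for wherever each run's summary folder lives):

# Placeholder paths to the two summary folders
OOM_DIR=oom_run/01.01_red5_v2_summaryFiles
OK_DIR=completed_run/01.01_red5_v2_summaryFiles

# Compare the number of annotated repeat features per run
wc -l "$OOM_DIR"/01.01_red5_v2.filteredRepeats.bed "$OK_DIR"/01.01_red5_v2.filteredRepeats.bed

# Compare the number of consensus sequences in each TE library
grep -c '^>' "$OOM_DIR"/01.01_red5_v2-families.fa.strained "$OK_DIR"/01.01_red5_v2-families.fa.strained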
Is there a way to resume EarlGrey from where it failed?
Hi @ting-hsuan-chen!
In this case, it is likely that the OOM event prevented proper processing during the divergence calculations, where the annotations are read into memory to calculate Kimura divergence. It is probably worth rerunning these jobs just to make sure.
You can rerun the failed steps of EarlGrey by deleting ${OUTDIR}/${species}_mergedRepeats/looseMerge/${species}.filteredRepeats.bed and then resubmitting the job with exactly the same command-line options as before. EarlGrey will skip the stages that completed successfully, so in this case it should only rerun the defragmentation step and the divergence calculations.
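In concrete terms, something like the following (a sketch; earlGrey.sbatch is a placeholder for whatever submission script you used originally):

# Delete the checkpoint output so the defragmentation and divergence steps are redone
rm ${OUTDIR}/${species}_mergedRepeats/looseMerge/${species}.filteredRepeats.bed

# Resubmit with exactly the same command-line options as before
sbatch earlGrey.sbatch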
Thank you @TobyBaril, I'll try it.
Hello!
I ran EarlGrey (v4.4.4) on multiple genomes (500-600 Mb each) using Slurm. Some jobs completed, but the others ended in an Out Of Memory (OOM) state (exit code 0).
For the OOM jobs, I checked the log files generated by EarlGrey, and the pipeline seemed to have run to completion.
The summary folders also contain the same number of files as those of the genomes whose runs completed.
What could cause the OOM error? Which step of the pipeline consumes the most RAM? Should I rerun EarlGrey for the genomes that hit the OOM error, or can I safely ignore it? Or could this be a problem with our Slurm system instead?
p.s. I used 16 cores and 60G of RAM for each job.
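In case it helps with diagnosis: peak memory per job can be pulled from Slurm's accounting records, assuming accounting is enabled on the cluster (the job ID below is a placeholder):

# Show state, exit code, and peak resident memory (MaxRSS) for a job;
# a State of OUT_OF_MEMORY means the job hit its memory limit
sacct -j 12345678 --format=JobID,JobName,State,ExitCode,MaxRSS,ReqMem,Elapsed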
Any guidance is much appreciated.
Cheers,
Ting-Hsuan