Closed: asan-emirsaleh closed this issue 1 year ago
Hi @asan-emirsaleh,
Wow, 12 days of running time is indeed abnormal and there is clearly an issue here. In general, indexing is the most time- and memory-consuming step of Ratatosk, and it is the only step that cannot be split between jobs. I would say that the issue in https://github.com/DecodeGenetics/Ratatosk/issues/35 is unrelated to yours, since the problem there was that a very high coverage of short reads (170x) was used for the indexing, which made the running time exceedingly long. Also, Ratatosk has gotten a lot faster since that issue; for 60x-coverage PE reads of a human genome, I wouldn't expect the indexing to go beyond 1000 CPU hours now. You have only 50-60x coverage, so coverage is not the issue here. I haven't had issues with that version of Ratatosk for as long as I can remember.
However, looking at the output of your `sacct` command, I see that NCPUS=1, but you said you allocated 120 threads to this job. In general, "CPU" and "core" mean two different things, but in the context of Slurm, I believe they mean the same thing (or perhaps "CPU cores"). The point is, I think NCPUS should be equal to 120 here. Is it possible that Ratatosk was run with a single thread rather than 120, which would explain the very long running time? Or that Slurm gave you a one-thread job but you tried to use 120 threads, which would also result in a long running time?
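One way to rule this out is to tie the Slurm allocation and Ratatosk's thread count together in the job script, so both always agree. This is only a sketch: the `#SBATCH` values come from this thread, while the input/output file names are hypothetical placeholders and the `-s`/`-l`/`-o` arguments should be checked against `Ratatosk --help`.

```shell
#!/bin/bash
#SBATCH --mem 500GB
#SBATCH -c 120
# Use the core count Slurm actually granted; fall back to 120 outside Slurm.
NTHREADS="${SLURM_CPUS_PER_TASK:-120}"
# File names below are hypothetical placeholders.
if command -v Ratatosk >/dev/null 2>&1; then
    Ratatosk index -c "$NTHREADS" -s short_reads.fastq -l long_reads.fastq -o out_long_reads
else
    echo "Ratatosk not on PATH; would run with $NTHREADS threads" >&2
fi
```

With this setup, `SLURM_CPUS_PER_TASK` always reflects what `sacct` will later report as NCPUS, so a mismatch like 1 vs 120 cannot occur silently.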
Guillaume
Hi @GuillaumeHolley,

Yes, you are right. The issue seems to be related to thread allocation. Normally I request threads directly in the script (not in the workload manager parameters), and SLURM counts how many threads I request by analyzing the script code. In the case of Ratatosk, it couldn't understand my `Ratatosk index -c 120` command. So I reran the script with the CPU threads allocated directly as parameters:

```
#SBATCH --mem 500GB
#SBATCH -c 122
```

The `sacct` output was different, but the computation still appears to be long:
```
$ sacct -j {JobId} --format=CPUTimeRAW,Elapsed,NCPUS,State,AveVMSize,ReqCPUs,ReqMem
CPUTimeRAW    Elapsed      NCPUS      State  AveVMSize  ReqCPUS     ReqMem
---------- ---------- ---------- ---------- ---------- -------- ----------
  15241216 1-10:42:08        122    RUNNING                 122      500Gn
```
```
$ tail slurm-{JobId}.out
Ratatosk::addCoverage(): Processed 50000000 reads.
Ratatosk::addCoverage(): Processed 51000000 reads.
Ratatosk::addCoverage(): Processed 52000000 reads.
Ratatosk::addCoverage(): Processed 53000000 reads.
Ratatosk::addCoverage(): Processed 54000000 reads.
Ratatosk::addCoverage(): Processed 55000000 reads.
Ratatosk::addCoverage(): Processed 56000000 reads.
Ratatosk::addCoverage(): Processed 57000000 reads.
Ratatosk::addCoverage(): Processed 58000000 reads.
Ratatosk::addCoverage(): Processed 59000000 reads.
```
So I am still waiting for the job to complete. As you can see, NCPUS now differs (and so does the CPU time consumed), but the indexing still proceeds very slowly. If there were time checkpoints in the logs, it would be easy to compare the two outputs; for example, I would see how much time each subsequent stage consumed.

I want to ask you again: is there any way to split the indexing step into consecutive jobs, for example "building graph", "splitting unitigs", "joining unitigs", "coloring graph", so that the script can resume after a crash? I am not familiar with C++, so I cannot propose a solution.
Hi @asan-emirsaleh,
Glad to hear there is a little bit of progress here. As I said before, the indexing stage is the only step of Ratatosk that cannot be split. There is currently no way of doing it. I am not saying it is impossible to implement but it would require some heavy work and unfortunately, I just do not have this kind of time available right now.
That being said, I think it is still very abnormal that after more than 4000 CPU hours, the indexing is only at this stage with that kind of input. There is something else going on. Memory is clearly not the issue: you gave the job 500 GB, and I wouldn't expect Ratatosk to use that much. Now that we know the job has 122 cores allocated, I am sorry to ask, but are you 100% sure that Ratatosk runs with `-c 120` as an argument? Is there any way for you to `ssh` to the machine, run `top`, and see how much CPU % Ratatosk is using?
Something else you can do is check the output folder of Ratatosk (where the index will be written). There should be a file there named *.index.k63.fasta or something like that. The timestamp on that file (`ls -lh`) will tell you when it was last modified, which is when Ratatosk wrote it to disk. Given that you know how long the job has been running (1 day and about 11 hours at the time you wrote your last answer), you can take the difference between those two dates and know how long it took Ratatosk to get there. That information would be helpful to me. This is obviously not optimal, and I agree with you that a proper logging system with timestamps would be more helpful. I'll add it to my todo list.
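The date arithmetic can be scripted rather than done by hand. A minimal sketch, assuming GNU `stat`/`date` are available; the file name is the one mentioned in this thread and may differ on your system:

```shell
# Print how many hours ago the index file was last written (GNU stat/date).
f=out_long_reads.index.k63.fasta    # hypothetical path; adjust to your output dir
if [ -e "$f" ]; then
    mtime=$(stat -c %Y "$f")        # last-modification time, seconds since epoch
    now=$(date +%s)
    echo "$f last written $(( (now - mtime) / 3600 )) hours ago"
fi
```

Comparing that number with the job's `Elapsed` from `sacct` gives the time spent since the index was written.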
Hello @GuillaumeHolley,

There is an unusual situation with the Ratatosk execution. Six days have passed, and there are no results yet. The `sacct` output is:
```
$ sacct -j {JobId} --format=CPUTimeRAW,Elapsed,NCPUS,State,AveVMSize,ReqCPUs,ReqMem
CPUTimeRAW    Elapsed      NCPUS      State  AveVMSize  ReqCPUS     ReqMem
---------- ---------- ---------- ---------- ---------- -------- ----------
  63384368 6-00:19:04        122    RUNNING                 122      500Gn
```
I have been saving the tail of the Ratatosk output log every 10 minutes, and also monitoring the file sizes in the output directory. I noticed that the processed-reads count increases in a step-like manner, while the file size increases linearly. You can see this in the plot below.

[plot: processed-reads count (step-like) and output file size (linear) over time]

I have no ssh access to the compute nodes. The HPC administrator said the program did not seem to be using any resources.
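The periodic monitoring can be done with a simple polling function along these lines (a sketch; the log and directory names are hypothetical placeholders):

```shell
# Append a timestamped snapshot of the last log line and output-file sizes.
snapshot() {
    {
        date +%FT%T
        tail -n 1 slurm-*.out 2>/dev/null    # hypothetical log file name
        ls -l out_dir/ 2>/dev/null           # hypothetical output directory
        echo
    } >> monitor.log
}
snapshot                                     # take one snapshot now
# To poll every 10 minutes:  while true; do snapshot; sleep 600; done
```

Plotting the read counts and file sizes from `monitor.log` over time is what produces the step-like vs. linear curves described above.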
Hi @asan-emirsaleh,
At the stage of the indexing where you seem to be stuck, what Ratatosk does is a basic mapping of the short reads to the graph. Keeping all of the mapping information in RAM can be too heavy, so in order not to use too much RAM, Ratatosk writes the mapping information to disk after every 4 GB of input short reads processed, clears it from RAM, and processes the next 4 GB. Processing short reads is parallel, but writing to disk is serial.

This most likely explains why your plot of the processed reads has this cascade-shaped curve (in orange): the parts of the curve that go up very suddenly are the parallel parts, and the flat parts are the serial writing to disk. As you can see, there are a lot of flat parts on the orange curve. Based on your plot, I estimate that writing to disk is about 100 times slower than processing the reads, which is very abnormal. This, combined with the fact that your HPC admin said Ratatosk seemed not to be using any resources, leads me to think that there is an IO wait issue here: basically, Ratatosk spends most of its time waiting to write to disk. There are a few things that could explain this IO issue.
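If node access ever becomes possible (directly or through the admin), a few standard Linux tools can confirm IO wait. A sketch; `iostat` and `pidstat` belong to the sysstat package and may not be installed everywhere:

```shell
# System-wide IO-wait: field 6 of the "cpu" line in /proc/stat is the
# cumulative iowait tick count since boot.
awk '/^cpu /{print "iowait ticks:", $6}' /proc/stat

# Live views (sysstat package; may need installing):
#   top -b -n 1 | head -n 5    # "wa" in the %Cpu(s) line is IO-wait %
#   iostat -x 5 3              # per-device utilisation and queue sizes
#   pidstat -d 5 3             # per-process read/write throughput
```

A high and steadily growing iowait count while Ratatosk shows near-zero CPU usage would match the "not using any resources" observation from the admin.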
Guillaume
Hi @asan-emirsaleh,
Did you have a chance to look at the possible IO wait issue?
Thanks, Guillaume
Hi @GuillaumeHolley,

Yes, you are right: there was an IO issue in the compute cluster. After the storage subsystem was repaired by the admin, Ratatosk ran fine.
```
$ sacct -j {index 1} --format=CPUTimeRAW,Elapsed,NCPUS,State,AveVMSize,ReqCPUs,ReqMem,TresUsageInTot
CPUTimeRAW    Elapsed      NCPUS      State  AveVMSize  ReqCPUS     ReqMem TRESUsageInTot
---------- ---------- ---------- ---------- ---------- -------- ---------- --------------
  11828022 1-02:55:51        122  COMPLETED                 122      500Gn
  11828022 1-02:55:51        122  COMPLETED 346493000K      122      500Gn cpu=63-08:04:+
```

```
$ sacct -j {correction 1} --format=CPUTimeRAW,Elapsed,NCPUS,State,AveVMSize,ReqCPUs,ReqMem,TresUsageInTot
CPUTimeRAW    Elapsed      NCPUS      State  AveVMSize  ReqCPUS     ReqMem TRESUsageInTot
---------- ---------- ---------- ---------- ---------- -------- ---------- --------------
    704428   01:36:14        122  COMPLETED                 122      440Gn
    704428   01:36:14        122  COMPLETED 147831976K      122      440Gn cpu=5-17:13:3+
```

```
$ sacct -j {index 2} --format=CPUTimeRAW,Elapsed,NCPUS,State,AveVMSize,ReqCPUs,ReqMem,TresUsageInTot
CPUTimeRAW    Elapsed      NCPUS      State  AveVMSize  ReqCPUS     ReqMem TRESUsageInTot
---------- ---------- ---------- ---------- ---------- -------- ---------- --------------
    521428   01:11:14        122  COMPLETED                 122      500Gn
    521428   01:11:14        122  COMPLETED  58529680K      122      500Gn cpu=2-08:56:5+
```

```
$ sacct -j {correction 2} --format=CPUTimeRAW,Elapsed,NCPUS,State,AveVMSize,ReqCPUs,ReqMem,TresUsageInTot
CPUTimeRAW    Elapsed      NCPUS      State  AveVMSize  ReqCPUS     ReqMem TRESUsageInTot
---------- ---------- ---------- ---------- ---------- -------- ---------- --------------
    195810   00:26:45        122  COMPLETED                 122      440Gn
    195810   00:26:45        122  COMPLETED  47679736K      122      440Gn cpu=1-15:46:1+
```

```
$ ls -la
total 81388459
drwxr-xr-x 2 asan group           6 okt 20 01:51 .
drwxr-xr-x 6 asan group          43 okt 22 09:34 ..
-rw-r--r-- 1 asan group  5911261530 okt 19 19:39 out_long_reads.2.fastq
-rw-r--r-- 1 asan group  5906800800 okt 20 01:56 out_long_reads.fastq
-rw-r--r-- 1 asan group  5248526797 okt 19 03:02 out_long_reads.index.k31.fasta
-rw-r--r-- 1 asan group 50481293054 okt 19 03:13 out_long_reads.index.k31.rtsk
-rw-r--r-- 1 asan group  8166012081 okt 20 00:10 out_long_reads.index.k63.fasta
-rw-r--r-- 1 asan group  7627884759 okt 20 00:11 out_long_reads.index.k63.rtsk
```
I had plenty of RAM available but ran into a disk IO issue. If there were an option to keep the data in RAM until processing is complete, IO would not have been delayed so much. Making the IO behavior customizable might be a good idea for a future improvement of the program. Thank you for the quick response and for your help with my issue!

Best regards, Asan
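One workaround for a slow shared filesystem, assuming the compute node has local scratch space, is to write the intermediates locally and copy only the final results back. A sketch with hypothetical paths; the `touch` line stands in for the real Ratatosk run:

```shell
# Write to fast node-local scratch, then copy results to shared storage.
scratch="${TMPDIR:-/tmp}/ratatosk.$$"
dest="${DEST:-$PWD}"                                # hypothetical final destination
mkdir -p "$scratch"
# Ratatosk index ... -o "$scratch/out_long_reads"   # the actual run would go here
touch "$scratch/out_long_reads.index.k63.fasta"     # stand-in for real output
cp "$scratch"/out_long_reads.* "$dest"/
rm -rf "$scratch"
```

This keeps the frequent 4 GB flushes off the shared storage entirely; only one bulk copy hits it at the end of the job.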
Thank you for a really amazing tool.
I have faced some errors when trying to use Ratatosk; generally they were time or memory limit errors. Execution stopped at indexing stage 1 after days of computation. As I saw in #35, such behavior is not the norm, and there may be mistakes in my pipeline. So I need a way to split the indexing stage into separate jobs, so that the intermediate files already produced are saved. If I simply rerun the indexing stage, it does not use the computed files and starts from zero, so I would lose days of computation.
I allocated 240 GB of RAM and 120 threads via the workload manager. The execution ran for 12 days before being stopped.
It resulted in two files.
I used 150 bp PE reads (fastq) and ONT reads corrected by overlap consensus (fasta) as input data:
The genome whose reads I tried to correct is approximately 1.4 Gb.
The command I used was:
The last lines reported were:
What do you recommend?