Open dcopetti opened 3 years ago
Hello,
The job eventually completed the megareads step, and it now appears to be running the Celera Assembler. The stdout shows this:
ERROR: failed to merge alignments at position 3355
Please file a bug report
ERROR: failed to merge alignments at position 327
Please file a bug report
ERROR: failed to merge alignments at position 468
Please file a bug report
cat: merges.1.txt: No such file or directory
cat: merges.2.txt: No such file or directory
cat: merges.4.txt: No such file or directory
cat: merges.7.txt: No such file or directory
[...]
cat: merges.57.txt: No such file or directory
cat: merges.58.txt: No such file or directory
cat: merges.62.txt: No such file or directory
cat: merges.64.txt: No such file or directory
cat: merges.66.txt: No such file or directory
cat: merges.70.txt: No such file or directory
cat: merges.74.txt: No such file or directory
cat: merges.77.txt: No such file or directory
cat: merges.78.txt: No such file or directory
[Wed Oct 28 17:42:09 CET 2020] Warning! Some or all gap consensus jobs failed, see files in mr.41.15.17.0.029.join_consensus.tmp, proceeding anyway, to rerun gap consensus erase mr.41.15.17.0.029.1.fa and re-run assemble.sh
[Wed Oct 28 17:44:26 CET 2020] Generating assembly input files
[Wed Oct 28 22:30:19 CET 2020] Coverage threshold for splitting unitigs is 35 minimum ovl 250
[Wed Oct 28 22:30:19 CET 2020] Running assembly
I wonder if there is something I should worry about. During the megareads step there were many lines like the ERROR lines at the top of this output.
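To get a sense of how widespread these messages are, one option is to tally them from a saved copy of the stdout. The following is only a minimal sketch: it assumes the log was redirected to a file (the name `masurca.log` in the usage note is hypothetical), and the function name `summarize_log` is made up for illustration. The two patterns are taken directly from the lines shown above.

```python
import re
from collections import Counter

# Patterns copied from the messages in the log excerpt above.
MERGE_ERROR = re.compile(r"^ERROR: failed to merge alignments at position (\d+)")
MISSING_FILE = re.compile(r"^cat: (merges\.\d+\.txt): No such file or directory")


def summarize_log(lines):
    """Count merge errors and collect the missing merges.N.txt file names.

    `lines` is any iterable of strings, e.g. an open file handle.
    Returns (Counter, list_of_missing_file_names).
    """
    summary = Counter()
    missing = []
    for raw in lines:
        line = raw.strip()
        if MERGE_ERROR.match(line):
            summary["merge_errors"] += 1
            continue
        found = MISSING_FILE.match(line)
        if found:
            summary["missing_merge_files"] += 1
            missing.append(found.group(1))
    return summary, missing
```

Usage would look like `with open("masurca.log") as handle: summary, missing = summarize_log(handle)`. The counts only describe how often the messages occur; they do not say whether the resulting assembly is affected.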
Also, I noticed that both now and earlier (when running nucmer for the megareads I guess) the job was taking about double the number of CPUs I gave it (70-80 when giving 40): is that normal? The sys admins are not happy to see that high load (the machine has 48 cores).
Could the file mr.41.15.17.0.029.1.fa now contain the megareads? Given its size and header structure, I would guess so.
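A quick way to look at the size and header structure of such a file is to count its records and bases and print the first few headers. This is a hedged sketch that assumes the file is plain (uncompressed) FASTA; the function name `fasta_summary` is hypothetical, and the output only describes the file's shape, it does not confirm that the sequences are megareads.

```python
def fasta_summary(lines, max_headers=3):
    """Summarize FASTA content given as an iterable of lines.

    Returns (record_count, total_bases, first_headers), where
    first_headers holds up to `max_headers` header lines without '>'.
    """
    records = 0
    bases = 0
    headers = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            records += 1
            if len(headers) < max_headers:
                headers.append(line[1:])
        else:
            bases += len(line)
    return records, bases, headers
```

For example, `with open("mr.41.15.17.0.029.1.fa") as handle: print(fasta_summary(handle))` would report the number of sequences, the total sequence length, and the first three headers.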
I was expecting Flye to be running by now; does it come later? I am using MaSuRCA version 3.3.9.
Thanks,
Dario
I am getting the same kind of error with MaSuRCA v4.0.1, running it on a large eukaryotic genome (dataset is a combo of PacBio, Nanopore, and Illumina reads):
[Wed Feb 10 09:39:10 AEST 2021] Processing pe library reads
[Wed Feb 10 09:39:10 AEST 2021] Average PE read length 148
[Wed Feb 10 09:39:10 AEST 2021] Using kmer size of 99 for the graph
cat: write error: Broken pipe
[Wed Feb 10 09:39:10 AEST 2021] MIN_Q_CHAR: 33
[Wed Feb 10 09:39:10 AEST 2021] Estimated genome size: 1485293479
[Wed Feb 10 09:39:10 AEST 2021] Creating k-unitigs with k=99
[Wed Feb 10 15:49:52 AEST 2021] Computing super reads from PE
[Thu Feb 11 05:24:03 AEST 2021] Using CABOG from /gpfs1/homes/s4255161/MaSuRCA-4.0.1/bin/../CA8/Linux-amd64/bin
[Thu Feb 11 05:24:03 AEST 2021] Running mega-reads correction/assembly
[Thu Feb 11 05:24:03 AEST 2021] Using mer size 17 for mapping, B=12, d=0.02
[Thu Feb 11 05:24:03 AEST 2021] Estimated Genome Size 1485293479
[Thu Feb 11 05:24:03 AEST 2021] Estimated Ploidy 1
[Thu Feb 11 05:24:03 AEST 2021] Using 24 threads
[Thu Feb 11 05:24:03 AEST 2021] Output prefix mr.41.17.12.0.02
gzip: stdout: Broken pipe
[Thu Feb 11 05:24:03 AEST 2021] Pre-correction and initial filtering of the long reads
[Thu Feb 11 20:49:47 AEST 2021] Reducing super-read k-mer size
[Thu Feb 11 22:00:41 AEST 2021] Computing mega-reads
[Thu Feb 11 22:00:41 AEST 2021] Running locally in 1 batch
[Mon Feb 15 03:09:59 AEST 2021] Refining alignments
ERROR: failed to merge alignments at position 482
Please file a bug report
[Mon Feb 15 04:24:38 AEST 2021] Computing allowed merges
[Mon Feb 15 04:27:58 AEST 2021] Joining
[Mon Feb 15 04:38:02 AEST 2021] Gap consensus
[Mon Feb 15 05:10:12 AEST 2021] Generating assembly input files
[Mon Feb 15 08:01:48 AEST 2021] Coverage threshold for splitting unitigs is 37 minimum ovl 499
[Mon Feb 15 08:01:48 AEST 2021] Running assembly
My job hasn't completed yet, but just like @dcopetti it moved on from the error and started running the assembly. Should we be concerned?
Hello, I am running MaSuRCA on a plant genome, with PacBio and Illumina data as input. I just noticed this notification:
I wonder if it is something I should worry about at the moment; I would like to make sure that at least the mega reads are produced correctly. Thanks, Dario