cat tt_16D1C3L12-3.fa tt_16D1C3L12-4.fa \
|PathConsensus -v --dot -k128 -p0.9 -o tt_16D1C3L12-5.path -s tt_16D1C3L12-5.fa -g tt_16D1C3L12-5.dot - tt_16D1C3L12-4.dot tt_16D1C3L12-4.path3
Reading `tt_16D1C3L12-4.dot'...
Reading `-'...
Reading `tt_16D1C3L12-4.path3'...
Read 152688 paths
Ambiguous paths: 47661
Merged: 5735
No paths: 0
Too many paths: 11353
Too complex: 5974
Dissimilar: 24599
The minimum coverage of single-end contigs is 0.855932.
The minimum coverage of merged contigs is 0.855932.
n n:200 L50 min N80 N50 N20 E-size max sum name
3011140 1623221 81178 200 333 3934 11026 6464 104425 1.38e9 tt_16D1C3L12-6.fa
ln -sf tt_16D1C3L12-6.fa tt_16D1C3L12-contigs.fa
PathConsensus -v --dot -k128 -p0.9 -s tt_16D1C3L12-7.fa -g tt_16D1C3L12-7.dot -o tt_16D1C3L12-7.path tt_16D1C3L12-6.fa tt_16D1C3L12-6.dot tt_16D1C3L12-6.path
Reading `tt_16D1C3L12-6.dot'...
Reading `tt_16D1C3L12-6.fa'...
Reading `tt_16D1C3L12-6.path'...
Read 21315 paths
Ambiguous paths: 111852
Merged: 6930
No paths: 85606
Too many paths: 1448
Too complex: 15805
Dissimilar: 2063
The minimum coverage of single-end contigs is 1.19492.
The minimum coverage of merged contigs is 3.23423.
Consider increasing the coverage threshold parameter, c, to 3.23423.
n n:200 L50 min N80 N50 N20 E-size max sum name
2876731 1495511 9250 200 333 22846 127840 69342 793681 1.375e9 tt_16D1C3L12-8.fa
ln -sf tt_16D1C3L12-8.fa tt_16D1C3L12-scaffolds.fa
PathOverlap --overlap -v -k128 --dot tt_16D1C3L12-7.dot tt_16D1C3L12-7.path >tt_16D1C3L12-8.dot
Reading `tt_16D1C3L12-7.dot'...
Reading `tt_16D1C3L12-7.path'...
ln -sf tt_16D1C3L12-8.dot tt_16D1C3L12-scaffolds.dot
abyss-fac tt_16D1C3L12-unitigs.fa tt_16D1C3L12-contigs.fa tt_16D1C3L12-scaffolds.fa |tee tt_16D1C3L12-stats.tab
n n:500 L50 min N80 N50 N20 E-size max sum name
3835048 475698 83169 500 1260 3192 6791 4462 56412 956e6 tt_16D1C3L12-unitigs.fa
3011140 310186 43902 500 2643 6415 13450 8757 104425 1.006e9 tt_16D1C3L12-contigs.fa
2876731 189115 4029 500 9361 56647 164847 94949 793681 1.003e9 tt_16D1C3L12-scaffolds.fa
ln -sf tt_16D1C3L12-stats.tab tt_16D1C3L12-stats
tr '\t' , <tt_16D1C3L12-stats.tab >tt_16D1C3L12-stats.csv
abyss-tabtomd tt_16D1C3L12-stats.tab >tt_16D1C3L12-stats.md
Here are some statistics from various runs of BBMap_36.86/stats.sh in=tt_16D1C3L12-?.fa on the FASTA files:
$ for f in tt_16D1C3L12-?.stats; do echo $f; cat $f; done
tt_16D1C3L12-3.stats
A C G T N IUPAC Other GC GC_stdev
0.3002 0.2001 0.2002 0.2995 0.0000 0.0000 0.0000 0.4003 0.0996
Main genome scaffold total: 3835048
Main genome contig total: 3835048
Main genome scaffold sequence total: 1651.869 MB
Main genome contig sequence total: 1651.869 MB 0.000% gap
Main genome scaffold N/L50: 280737/938
Main genome contig N/L50: 280737/938
Main genome scaffold N/L90: 2578210/142
Main genome contig N/L90: 2578210/142
Max scaffold length: 56.412 KB
Max contig length: 56.412 KB
Number of scaffolds > 50 KB: 1
% main genome in scaffolds > 50 KB: 0.00%
Minimum Number Number Total Total Scaffold
Scaffold of of Scaffold Contig Contig
Length Scaffolds Contigs Length Length Coverage
-------- -------------- -------------- -------------- -------------- --------
All 3,835,048 3,835,048 1,651,869,106 1,651,869,106 100.00%
100 3,835,048 3,835,048 1,651,869,106 1,651,869,106 100.00%
250 1,733,198 1,733,198 1,333,132,564 1,333,132,564 100.00%
500 475,698 475,698 956,024,894 956,024,894 100.00%
1 KB 267,415 267,415 813,134,944 813,134,944 100.00%
2.5 KB 116,134 116,134 571,167,119 571,167,119 100.00%
5 KB 38,065 38,065 299,800,293 299,800,293 100.00%
10 KB 6,544 6,544 88,583,607 88,583,607 100.00%
25 KB 125 125 3,756,838 3,756,838 100.00%
50 KB 1 1 56,412 56,412 100.00%
tt_16D1C3L12-6.stats
A C G T N IUPAC Other GC GC_stdev
0.3001 0.2001 0.2002 0.2995 0.0018 0.0001 0.0000 0.4004 0.1019
Main genome scaffold total: 3011140
Main genome contig total: 3045311
Main genome scaffold sequence total: 1580.472 MB
Main genome contig sequence total: 1577.608 MB 0.181% gap
Main genome scaffold N/L50: 110528/2.838 KB
Main genome contig N/L50: 120623/2.592 KB
Main genome scaffold N/L90: 1850453/161
Main genome contig N/L90: 1891146/160
Max scaffold length: 104.458 KB
Max contig length: 66.423 KB
Number of scaffolds > 50 KB: 53
% main genome in scaffolds > 50 KB: 0.19%
Minimum Number Number Total Total Scaffold
Scaffold of of Scaffold Contig Contig
Length Scaffolds Contigs Length Length Coverage
-------- -------------- -------------- -------------- -------------- --------
All 3,011,140 3,045,311 1,580,471,679 1,577,607,982 99.82%
100 3,011,140 3,045,311 1,580,471,679 1,577,607,982 99.82%
250 1,395,308 1,429,479 1,330,002,425 1,327,138,728 99.78%
500 310,246 344,286 1,008,724,822 1,005,877,736 99.72%
1 KB 172,172 206,152 915,193,035 912,365,691 99.69%
2.5 KB 122,397 150,934 821,859,910 819,446,210 99.71%
5 KB 61,833 80,024 605,874,187 604,315,054 99.74%
10 KB 20,351 28,293 316,990,308 316,304,244 99.78%
25 KB 1,545 2,473 49,196,477 49,116,291 99.84%
50 KB 53 101 3,074,739 3,071,242 99.89%
100 KB 1 2 104,458 104,425 99.97%
tt_16D1C3L12-7.stats
A C G T N IUPAC Other GC GC_stdev
0.3107 0.1897 0.1892 0.3104 0.0012 0.0036 0.0000 0.3789 0.0719
Main genome scaffold total: 772
Main genome contig total: 777
Main genome scaffold sequence total: 0.607 MB
Main genome contig sequence total: 0.606 MB 0.119% gap
Main genome scaffold N/L50: 187/1.089 KB
Main genome contig N/L50: 187/1.089 KB
Main genome scaffold N/L90: 576/371
Main genome contig N/L90: 582/370
Max scaffold length: 3.685 KB
Max contig length: 3.685 KB
Number of scaffolds > 50 KB: 0
% main genome in scaffolds > 50 KB: 0.00%
Minimum Number Number Total Total Scaffold
Scaffold of of Scaffold Contig Contig
Length Scaffolds Contigs Length Length Coverage
-------- -------------- -------------- -------------- -------------- --------
All 772 777 607,189 606,469 99.88%
100 772 777 607,189 606,469 99.88%
250 761 766 604,828 604,108 99.88%
500 430 435 485,414 484,694 99.85%
1 KB 205 208 324,445 323,991 99.86%
2.5 KB 14 15 40,391 40,247 99.64%
tt_16D1C3L12-8.stats
A C G T N IUPAC Other GC GC_stdev
0.3001 0.2002 0.2003 0.2994 0.0776 0.0001 0.0000 0.4005 0.1032
Main genome scaffold total: 2876731
Main genome contig total: 3015816
Main genome scaffold sequence total: 1703.964 MB
Main genome contig sequence total: 1571.653 MB 7.765% gap
Main genome scaffold N/L50: 11981/19.507 KB
Main genome contig N/L50: 108612/2.736 KB
Main genome scaffold N/L90: 1639503/171
Main genome contig N/L90: 1867631/160
Max scaffold length: 873.806 KB
Max contig length: 90.766 KB
Number of scaffolds > 50 KB: 5362
% main genome in scaffolds > 50 KB: 37.88%
Minimum Number Number Total Total Scaffold
Scaffold of of Scaffold Contig Contig
Length Scaffolds Contigs Length Length Coverage
-------- -------------- -------------- -------------- -------------- --------
All 2,876,731 3,015,816 1,703,964,196 1,571,652,579 92.24%
100 2,876,731 3,015,816 1,703,964,196 1,571,652,579 92.24%
250 1,268,766 1,407,851 1,454,714,840 1,322,403,223 90.90%
500 189,163 328,132 1,135,210,242 1,002,913,588 88.35%
1 KB 52,595 191,516 1,042,703,436 910,423,098 87.31%
2.5 KB 32,095 169,028 1,006,465,234 874,336,179 86.87%
5 KB 23,821 159,541 978,974,988 846,946,092 86.51%
10 KB 17,880 148,430 935,974,349 809,391,570 86.48%
25 KB 9,965 122,063 807,426,899 701,763,980 86.91%
50 KB 5,362 93,327 645,398,771 565,717,262 87.65%
100 KB 2,428 61,150 440,176,973 389,086,844 88.39%
250 KB 382 17,275 130,980,817 117,113,064 89.41%
500 KB 28 2,206 17,115,130 15,343,274 89.65%
$
My real concern now is the precision of mate-pair mapping but that is another story.
You can try aligner=bwamem if you prefer. The default is abyss-map. If you do, I'd be curious to see the results of abyss-fac on both assemblies. Note that you don't have to rerun the MPI stage. You can resume the assembly from after the unitigs stage. Let us know if you'd like help with that.
So this may suggest there is still room for improvement in the initial steps: parsing the input FASTQ files, splitting them into k-mers, and counting them.
Do you know what proportion of time is spent loading and counting k-mers vs the total run time of ABYSS-P?
I started the same MPI-enabled job, but with k=64, a few hours ago. The log file currently keeps printing lines like:
0: Read 84700000 reads. 0: Hash load: 57122315 / 268435456 = 0.213 using 2.71 GB
1: Read 84600000 reads. 1: Hash load: 56263364 / 268435456 = 0.21 using 2.67 GB
0: Read 84800000 reads. 0: Hash load: 57187818 / 268435456 = 0.213 using 2.71 GB
1: Read 84700000 reads. 1: Hash load: 56327180 / 268435456 = 0.21 using 2.67 GB
These two workers, "0" and "1", are there because there are two paired-end dataset files; I remember the log file of the k=128 attempt mentioned them later. The RAM values are probably per single thread, so are they supposed to be multiplied by 104 in my case? See the actually used RAM in the figures below.
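For reading these lines, here is a minimal Python sketch that pulls the hash-load figures out of the log excerpt above and recomputes the load factor. It assumes the leading integer simply labels the process that wrote the message, and it only sums the memory of the labels it actually sees; whether more processes exist, or whether the figure is per thread, cannot be told from these lines. The names `parse_hash_load`, `latest`, and `total_gb` are made up for this illustration.

```python
import re

# One "Hash load" message: label, used slots, capacity, printed ratio, memory.
LINE = re.compile(r"(\d+): Hash load: (\d+) / (\d+) = ([\d.]+) using ([\d.]+) GB")

def parse_hash_load(text):
    """Yield (label, used, capacity, gigabytes) for each 'Hash load' message."""
    for m in LINE.finditer(text):
        yield int(m.group(1)), int(m.group(2)), int(m.group(3)), float(m.group(5))

# Two lines copied from the log excerpt in this thread.
log = """0: Read 84700000 reads. 0: Hash load: 57122315 / 268435456 = 0.213 using 2.71 GB
1: Read 84600000 reads. 1: Hash load: 56263364 / 268435456 = 0.21 using 2.67 GB"""

# Keep the most recent message per label.
latest = {}
for label, used, capacity, gb in parse_hash_load(log):
    latest[label] = (used / capacity, gb)

# Memory summed over the labels seen in the excerpt only.
total_gb = sum(gb for _, gb in latest.values())

for label, (load, gb) in sorted(latest.items()):
    print(f"label {label}: load factor {load:.3f}, {gb} GB")
print(f"sum over seen labels: {total_gb:.2f} GB")
```

Comparing the summed figure with the memory actually used on the node is one way to check the "multiply by 104" guess.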
Here are runtime graphs since the job started 3.5hrs ago:
Please add timing information to your logs: either the current time or, preferably, the time since the job started (SPAdes also prints time since the job started). Also, I would be happy if you prefixed each logged line with "Info: " or "Debug: ". It would be much easier to grep through the file. It is tough to cut down the number of lines once verbose logging is enabled. Even better if each line included the name of the method/tool in action, like:
Info: abyss-map: ...
Info: DistanceEst: ...
To answer your question, I think this hash-filling phase finished when the memory usage was at its maximum, so it ran from 2/5/2018 19:33 to 2/6/2018 12:00, when the disk read IO went to zero (see the LustreFS chart in the original post).
So how long was sequence loading (from start until peak memory) and how long was the total run time of ABYSS-P?
987 minutes vs. 1741 minutes, wallclock.
So loading is 57% of run time, which is a significant portion. Thanks for this helpful info.
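The 57% figure follows directly from the two wallclock numbers quoted above; as a quick check in Python (variable names are only for illustration):

```python
# Wallclock minutes reported in the thread.
loading_minutes = 987   # start of the job until peak memory
total_minutes = 1741    # total run time of ABYSS-P

share = loading_minutes / total_minutes
remaining_minutes = total_minutes - loading_minutes

print(f"loading share: {share:.1%}")
print(f"remaining: {remaining_minutes} minutes")
```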
@mmokrejs Thank you for taking the time to post your performance benchmarks. It is very interesting for us.
You are welcome. Would you please comment on how to interpret the numbers in https://github.com/bcgsc/abyss/issues/187#issuecomment-363741128 from the PathConsensus step? I mean, what do they tell me? How should I change the k-mer size, or should I bother with a more thorough cleanup of paired-end contaminants from the mate-pair reads? Do they say something about the complexity of the genome or the number of alleles? Or do they reveal the number of more or less error-free contigs/unitigs, while future scaffolding will be just a gamble, trying to order these golden pieces into some series? I am probably never going to open the .dot or .path files, but is there a summary message somewhere that would teach me something? I am sorry for the naive questions.
To make inferences about the repeat complexity and heterozygosity of your genome, I recommend using ntCard and GenomeScope.
To optimize ABySS, I recommend trying different values of k and N, which make the biggest difference to the contiguity of the assembly. N only affects the very final stage of scaffolding, and so can be optimized quite quickly if you avoid rerunning the entire pipeline.
Would you be more explicit about which command you mean, or simply how to achieve what you describe? I figured out you are talking about abyss-pe -N $int ... scaffolds, but should I delete ${name}-scaffolds.fa first, or will it just create new files with a higher number?
The command is abyss-pe N=10 …
Grep the abyss-scaffold command out of your log file. Run it with different values of -n. The last line it reports is NG50. Select your favourite value of -n, say the one that maximizes NG50. Rerun abyss-scaffold with your favourite value of -n. Run abyss-pe --dry-run N=xxx … with your favourite value of N to confirm that it will only rerun the last few commands of the pipeline, then rerun abyss-pe N=xxx … to run those commands.
Note that abyss-scaffold takes the option -n10 whereas abyss-pe takes the option N=10.
You could of course just run the entire pipeline abyss-pe N=10 … with different values of N, but the above, although more complicated, is much faster.
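As a rough illustration of the "run it with different values of -n" step, the Python sketch below takes an abyss-scaffold command line as grepped from a log and produces variants that differ only in -n. It only rewrites the text and does not run anything. The helper name `with_npairs` and the candidate values are assumptions made for this example, and the elided part of the command ("...") is left elided.

```python
import re

def with_npairs(command: str, n: int) -> str:
    """Return `command` with the value of its -n option replaced by `n`."""
    # Match "-n10" or "-n 10" as a standalone option, not part of another token.
    pattern = r"(?<!\S)-n\s*\d+(?!\S)"
    new_command, count = re.subn(pattern, f"-n{n}", command)
    if count != 1:
        raise ValueError("expected exactly one -n option in the command")
    return new_command

# Command line as it might appear in the log, with its arguments elided.
logged = "abyss-scaffold -v -k128 -s1000-10000 -n10 ..."

# Candidate values chosen only for illustration.
variants = [with_npairs(logged, n) for n in (5, 10, 15, 20)]
for command in variants:
    print(command)
```

Each printed line can then be run by hand and the reported statistics compared before committing to a value of N for abyss-pe.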
$ grep abyss-scaffold abyss_ecc.101473.log
abyss-scaffold -v -k128 -s1000-10000 -n10 ...
so I ran
abyss-scaffold -v -k128 -s100-10000 -n 3 -G 1267403131 ...
Reading `tt_16D1C3L12__abyss_128-6.dot'...
V=5049442 E=8506670 E/V=1.68
Degree: ▃█▅_
01234
0: 18% 1: 42% 2-4: 37% 5+: 2.9% max: 910
Reading `HFYJ5AFXX.5kb.lmp-6.dist.dot'...
V=5049442 E=8762331 E/V=1.74
Degree: ▂█▅▁
01234
0: 16% 1: 42% 2-4: 39% 5+: 3% max: 910
Reading `HFYJ5AFXX.8kb.lmp-6.dist.dot'...
V=5049442 E=8878485 E/V=1.76
Degree: ▂█▅▁
01234
0: 16% 1: 42% 2-4: 39% 5+: 3% max: 910
Reading `HWFNLBCXY.lmp-6.dist.dot'...
...
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2520923 2520923 105820 75262 3924 128 256 2874 9085 7139 157470 1.473e9 s=10000
Removed 4721356 vertices.
Removed 313393 edges.
V=328086 E=364252 E/V=1.11
Degree: ▂█▃
01234
0: 17% 1: 57% 2-4: 26% 5+: 0.024% max: 27
Removed 264 cyclic edges.
V=328086 E=363988 E/V=1.11
Degree: ▂█▃
01234
0: 18% 1: 57% 2-4: 26% 5+: 0.023% max: 27
Added 1432 edges to ambiguous vertices.
Removed 7372 tips.
V=313342 E=352108 E/V=1.12
Degree: ▁█▃
01234
0: 16% 1: 57% 2-4: 26% 5+: 0.021% max: 25
Cleared 2245 ambiguous vertices.
Removed 131 ambiguous vertices.
V=313080 E=340742 E/V=1.09
Degree: ▂█▂
01234
0: 17% 1: 58% 2-4: 25% 5+: 0.0042% max: 21
Removed 75120 transitive edges.
V=313080 E=265622 E/V=0.848
Degree: ▁█
01234
0: 17% 1: 80% 2-4: 2.1% 5+: 0.0026% max: 20
Removed 3531 tips.
V=306018 E=258560 E/V=0.845
Degree: ▁█
01234
0: 17% 1: 82% 2-4: 1.4% 5+: 0.002% max: 20
Removed 7081 vertices in bubbles.
V=298934 E=248126 E/V=0.83
Degree: ▁█
01234
0: 17% 1: 82% 2-4: 0.28% 5+: 0.002% max: 20
Removed 138 weak edges.
V=298934 E=247988 E/V=0.83
Degree: ▁█
01234
0: 17% 1: 82% 2-4: 0.23% 5+: 0.002% max: 20
Assembled 144940 contigs in 22216 scaffolds.
V=298934 E=247988 E/V=0.83
Degree: ▁█
01234
0: 17% 1: 82% 2-4: 0.23% 5+: 0.002% max: 20
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2401997 2401997 15073 8978 23050 128 256 11461 100807 55075 904172 1.467e9 s=1000
Best scaffold N50 is 11461 at s=1000.
So what does the above really tell me? For example, does 0: 18% 1: 42% 2-4: 37% 5+: 2.9% max: 910 mean that 42% of links exist due to only 2-4 mate-pairs?
Aha, you probably wanted me to show the non-verbose output.
$ abyss-scaffold -k128 -s100-10000 -n 2 -G 1267403131 -g tt_16D1C3L12__abyss_128-6.path.dot tt_16D1C3L12__abyss_128-6.dot HFYJ5AFXX.5kb.lmp-6.dist.dot HFYJ5AFXX.8kb.lmp-6.dist.dot HWFNLBCXY.lmp-6.dist.dot HFYJ5AFXX.5kb.unknown-6.dist.dot HFYJ5AFXX.8kb.unknown-6.dist.dot HWFNLBCXY.2.unknown-6.dist.dot HFYJ5AFXX.5kb.fragments-6.dist.dot HFYJ5AFXX.8kb.fragments-6.dist.dot HWFNLBCXY.2.fragments-6.dist.dot >tt_16D1C3L12__abyss_128-6.path
warning: Removed 5458 invalid edges.
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2396903 2396903 15550 9207 22270 128 256 10926 99549 53585 905173 1.467e9 s=100
2396959 2396959 15547 9207 22273 128 256 10931 99549 53568 905173 1.467e9 s=200
2397391 2397391 15487 9181 22373 128 256 10992 99500 53778 905146 1.467e9 s=500
2401997 2401997 15073 8978 23050 128 256 11461 100807 55075 904172 1.467e9 s=1000
2416139 2416139 17217 10037 20187 128 256 9310 91419 49894 722459 1.468e9 s=2000
2495803 2495803 81007 50578 3906 128 256 2865 24673 15378 258793 1.471e9 s=5000
2520923 2520923 105820 75262 3924 128 256 2874 9085 7139 157470 1.473e9 s=10000
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2401997 2401997 15073 8978 23050 128 256 11461 100807 55075 904172 1.467e9 s=1000
Best scaffold N50 is 11461 at s=1000.
$ abyss-scaffold -k128 -s100-10000 -n 3 -G 1267403131 -g tt_16D1C3L12__abyss_128-6.path.dot tt_16D1C3L12__abyss_128-6.dot HFYJ5AFXX.5kb.lmp-6.dist.dot HFYJ5AFXX.8kb.lmp-6.dist.dot HWFNLBCXY.lmp-6.dist.dot HFYJ5AFXX.5kb.unknown-6.dist.dot HFYJ5AFXX.8kb.unknown-6.dist.dot HWFNLBCXY.2.unknown-6.dist.dot HFYJ5AFXX.5kb.fragments-6.dist.dot HFYJ5AFXX.8kb.fragments-6.dist.dot HWFNLBCXY.2.fragments-6.dist.dot >tt_16D1C3L12__abyss_128-6.path
warning: Removed 5458 invalid edges.
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2396903 2396903 15550 9207 22270 128 256 10926 99549 53585 905173 1.467e9 s=100
2396959 2396959 15547 9207 22273 128 256 10931 99549 53568 905173 1.467e9 s=200
2397391 2397391 15487 9181 22373 128 256 10992 99500 53778 905146 1.467e9 s=500
2401997 2401997 15073 8978 23050 128 256 11461 100807 55075 904172 1.467e9 s=1000
2416139 2416139 17217 10037 20187 128 256 9310 91419 49894 722459 1.468e9 s=2000
2495803 2495803 81007 50578 3906 128 256 2865 24673 15378 258793 1.471e9 s=5000
2520923 2520923 105820 75262 3924 128 256 2874 9085 7139 157470 1.473e9 s=10000
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2401997 2401997 15073 8978 23050 128 256 11461 100807 55075 904172 1.467e9 s=1000
Best scaffold N50 is 11461 at s=1000.
$ abyss-scaffold -k128 -s100-10000 -n 5 -G 1267403131 -g tt_16D1C3L12__abyss_128-6.path.dot tt_16D1C3L12__abyss_128-6.dot HFYJ5AFXX.5kb.lmp-6.dist.dot HFYJ5AFXX.8kb.lmp-6.dist.dot HWFNLBCXY.lmp-6.dist.dot HFYJ5AFXX.5kb.unknown-6.dist.dot HFYJ5AFXX.8kb.unknown-6.dist.dot HWFNLBCXY.2.unknown-6.dist.dot HFYJ5AFXX.5kb.fragments-6.dist.dot HFYJ5AFXX.8kb.fragments-6.dist.dot HWFNLBCXY.2.fragments-6.dist.dot >tt_16D1C3L12__abyss_128-6.path
warning: Removed 5458 invalid edges.
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2396903 2396903 15550 9207 22270 128 256 10926 99549 53585 905173 1.467e9 s=100
2396959 2396959 15547 9207 22273 128 256 10931 99549 53568 905173 1.467e9 s=200
2397391 2397391 15487 9181 22373 128 256 10992 99500 53778 905146 1.467e9 s=500
2401997 2401997 15073 8978 23050 128 256 11461 100807 55075 904172 1.467e9 s=1000
2416139 2416139 17217 10037 20187 128 256 9310 91419 49894 722459 1.468e9 s=2000
2495803 2495803 81007 50578 3906 128 256 2865 24673 15378 258793 1.471e9 s=5000
2520923 2520923 105820 75262 3924 128 256 2874 9085 7139 157470 1.473e9 s=10000
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2401997 2401997 15073 8978 23050 128 256 11461 100807 55075 904172 1.467e9 s=1000
Best scaffold N50 is 11461 at s=1000.
$ abyss-scaffold -k128 -s100-10000 -n 7 -G 1267403131 -g tt_16D1C3L12__abyss_128-6.path.dot tt_16D1C3L12__abyss_128-6.dot HFYJ5AFXX.5kb.lmp-6.dist.dot HFYJ5AFXX.8kb.lmp-6.dist.dot HWFNLBCXY.lmp-6.dist.dot HFYJ5AFXX.5kb.unknown-6.dist.dot HFYJ5AFXX.8kb.unknown-6.dist.dot HWFNLBCXY.2.unknown-6.dist.dot HFYJ5AFXX.5kb.fragments-6.dist.dot HFYJ5AFXX.8kb.fragments-6.dist.dot HWFNLBCXY.2.fragments-6.dist.dot >tt_16D1C3L12__abyss_128-6.path
warning: Removed 5458 invalid edges.
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2396903 2396903 15550 9207 22270 128 256 10926 99549 53585 905173 1.467e9 s=100
2396959 2396959 15547 9207 22273 128 256 10931 99549 53568 905173 1.467e9 s=200
2397391 2397391 15487 9181 22373 128 256 10992 99500 53778 905146 1.467e9 s=500
2401997 2401997 15073 8978 23050 128 256 11461 100807 55075 904172 1.467e9 s=1000
2416139 2416139 17217 10037 20187 128 256 9310 91419 49894 722459 1.468e9 s=2000
2495803 2495803 81007 50578 3906 128 256 2865 24673 15378 258793 1.471e9 s=5000
2520923 2520923 105820 75262 3924 128 256 2874 9085 7139 157470 1.473e9 s=10000
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2401997 2401997 15073 8978 23050 128 256 11461 100807 55075 904172 1.467e9 s=1000
Best scaffold N50 is 11461 at s=1000.
$ abyss-scaffold -k128 -s100-10000 -n 10 -G 1267403131 -g tt_16D1C3L12__abyss_128-6.path.dot tt_16D1C3L12__abyss_128-6.dot HFYJ5AFXX.5kb.lmp-6.dist.dot HFYJ5AFXX.8kb.lmp-6.dist.dot HWFNLBCXY.lmp-6.dist.dot HFYJ5AFXX.5kb.unknown-6.dist.dot HFYJ5AFXX.8kb.unknown-6.dist.dot HWFNLBCXY.2.unknown-6.dist.dot HFYJ5AFXX.5kb.fragments-6.dist.dot HFYJ5AFXX.8kb.fragments-6.dist.dot HWFNLBCXY.2.fragments-6.dist.dot >tt_16D1C3L12__abyss_128-6.path
warning: Removed 5458 invalid edges.
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2396903 2396903 15550 9207 22270 128 256 10926 99549 53585 905173 1.467e9 s=100
2396959 2396959 15547 9207 22273 128 256 10931 99549 53568 905173 1.467e9 s=200
2397391 2397391 15487 9181 22373 128 256 10992 99500 53778 905146 1.467e9 s=500
2401997 2401997 15073 8978 23050 128 256 11461 100807 55075 904172 1.467e9 s=1000
2416139 2416139 17217 10037 20187 128 256 9310 91419 49894 722459 1.468e9 s=2000
2495803 2495803 81007 50578 3906 128 256 2865 24673 15378 258793 1.471e9 s=5000
2520923 2520923 105820 75262 3924 128 256 2874 9085 7139 157470 1.473e9 s=10000
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2401997 2401997 15073 8978 23050 128 256 11461 100807 55075 904172 1.467e9 s=1000
Best scaffold N50 is 11461 at s=1000.
I do not see any difference compared to the -n 10 parameter, which was used by default (according to the log file):
$ abyss-scaffold --help
Usage: abyss-scaffold -k<kmer> [OPTION]... FASTA|OVERLAP DIST...
Scaffold contigs using the distance estimate graph.
Arguments:
FASTA contigs in FASTA format
OVERLAP the contig overlap graph
DIST estimates of the distance between contigs
Options:
-n, --npairs=N minimum number of pairs [0]
-s, --seed-length=N minimum contig length [200]
or -s N0-N1 Find the value of s in [N0,N1]
that maximizes the scaffold N50.
I am close to saying it makes no sense to continue with abyss-pe reusing the same value 10, but I realized from the log file that it was run without N=... altogether. But is the default value 10 also inside abyss-pe?
I backed up previous scaffolds
$ myprefix="tt_16D1C3L12__abyss_128"; for f in ${myprefix}-7.fa ${myprefix}-7.dot ${myprefix}-7.path ${myprefix}-8.fa ${myprefix}-stats.tab ${myprefix}-stats.csv ${myprefix}-stats.md; do cp -p "$f" "${f}.ori"; done
but I had better wait for your answer first. Thank you.
So what does the above really tell me? For example, does 0: 18% 1: 42% 2-4: 37% 5+: 2.9% max: 910 mean that 42% of links exist due to only 2-4 mate-pairs?
No, that's a histogram of vertex degree, that is, the number of edges incident to each vertex.
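To make the shape of that summary line concrete, here is a small Python sketch that buckets vertex degrees into the same classes (0, 1, 2-4, 5+) and reports the maximum. It is not ABySS code: the toy graph is invented, the function name `degree_summary` is hypothetical, and the graph is treated as undirected with each edge counted at both endpoints, which is an assumption since the thread does not say whether in-, out-, or total degree is reported.

```python
from collections import Counter

def degree_summary(vertices, edges):
    """Return the share of vertices per degree class and the maximum degree."""
    # Count every edge once at each of its two endpoints.
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    buckets = {"0": 0, "1": 0, "2-4": 0, "5+": 0}
    for vertex in vertices:
        d = degree[vertex]  # Counter gives 0 for vertices without edges
        if d == 0:
            buckets["0"] += 1
        elif d == 1:
            buckets["1"] += 1
        elif d <= 4:
            buckets["2-4"] += 1
        else:
            buckets["5+"] += 1

    shares = {name: count / len(vertices) for name, count in buckets.items()}
    return shares, max(degree[v] for v in vertices)

# Invented toy graph: one hub, a few short links, two isolated vertices.
vertices = ["hub", "a", "b", "c", "d", "e", "x", "y", "lone1", "lone2"]
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"),
         ("hub", "e"), ("a", "b"), ("x", "y")]

shares, max_degree = degree_summary(vertices, edges)
print(" ".join(f"{name}: {share:.0%}" for name, share in shares.items()),
      f"max: {max_degree}")
```

Read this way, a line such as "1: 42%" describes how many vertices have exactly one edge, not how many mate-pairs support a link.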
But is the default value 10 also inside abyss-pe?
Yes, the default is N=10.
Interesting that for this data set all values of n between 2 and 10 seem to perform equally well. I'd suggest trying larger values of n. Increase n until the result changes. If it gets better, great! If it gets worse, stick with N=10.
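One way to compare runs is to read the NG50 column out of the final summary row that abyss-scaffold prints for each value of n. The Python sketch below does that for three rows copied from output posted in this thread (n=10, 12 and 15); the parsing helper `parse_row` is an assumption made for illustration and relies on the whitespace-separated header shown in that output.

```python
# Column names as printed by abyss-scaffold in this thread.
HEADER = "n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name".split()

def parse_row(row: str) -> dict:
    """Map one whitespace-separated summary row onto the header columns."""
    return dict(zip(HEADER, row.split()))

# Final summary row per value of -n, copied from the posted output.
best_rows = {
    10: "2401997 2401997 15073 8978 23050 128 256 11461 100807 55075 904172 1.467e9 s=1000",
    12: "2405031 2405031 16507 9914 21198 128 256 10655 90308 49316 589963 1.467e9 s=1000",
    15: "2409057 2409057 18449 11165 19094 128 256 9677 80092 44039 590638 1.467e9 s=1000",
}

# Extract NG50 for each run and pick the value of n that maximizes it.
ng50 = {n: int(parse_row(row)["NG50"]) for n, row in best_rows.items()}
best_n = max(ng50, key=ng50.get)

for n in sorted(ng50):
    print(f"n={n}: NG50={ng50[n]}")
print(f"largest NG50 at n={best_n}")
```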
Yes, the values are just worse.
$ abyss-scaffold -k128 -s100-10000 -n 12 -G 1267403131 ...
warning: Removed 5458 invalid edges.
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2401155 2401155 16729 10026 20831 128 256 10424 89879 48904 589188 1.467e9 s=100
2401191 2401191 16731 10027 20831 128 256 10424 89879 48889 589188 1.467e9 s=200
2401401 2401401 16707 10016 20872 128 256 10448 89985 49053 590846 1.467e9 s=500
2405031 2405031 16507 9914 21198 128 256 10655 90308 49316 589963 1.467e9 s=1000
2418661 2418661 18767 11032 18634 128 256 8721 82174 45305 606894 1.468e9 s=2000
2496590 2496590 81794 51342 3907 128 256 2865 23767 14777 258793 1.471e9 s=5000
2521062 2521062 105959 75373 3925 128 256 2874 9085 7047 157470 1.473e9 s=10000
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2405031 2405031 16507 9914 21198 128 256 10655 90308 49316 589963 1.467e9 s=1000
Best scaffold N50 is 10655 at s=1000.
$ abyss-scaffold -k128 -s100-10000 -n 15 -G 1267403131 ...
warning: Removed 5458 invalid edges.
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2406177 2406177 18502 11187 19034 128 256 9619 80263 44058 586841 1.467e9 s=100
2406201 2406201 18502 11187 19034 128 256 9621 80263 44059 586841 1.467e9 s=200
2406357 2406357 18496 11187 19049 128 256 9630 80124 44104 586841 1.467e9 s=500
2409057 2409057 18449 11165 19094 128 256 9677 80092 44039 590638 1.467e9 s=1000
2422104 2422104 20912 12376 16884 128 256 7976 73112 40264 588866 1.468e9 s=2000
2497589 2497589 82753 52311 3908 128 256 2866 22557 14084 258793 1.472e9 s=5000
2521197 2521197 106094 75508 3925 128 256 2874 9085 6961 157470 1.473e9 s=10000
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2409057 2409057 18449 11165 19094 128 256 9677 80092 44039 590638 1.467e9 s=1000
Best scaffold N50 is 9677 at s=1000.
$ abyss-scaffold -k128 -s100-10000 -n 17 -G 1267403131 ...
warning: Removed 5458 invalid edges.
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2408657 2408657 19586 11905 18045 128 256 9230 75095 41104 543238 1.467e9 s=100
2408685 2408685 19585 11905 18045 128 256 9232 75095 41104 543238 1.467e9 s=200
2408812 2408812 19585 11906 18047 128 256 9232 75065 41149 542938 1.467e9 s=500
2411200 2411200 19566 11904 18100 128 256 9241 74783 41012 541994 1.467e9 s=1000
2424032 2424032 22157 13175 15972 128 256 7590 68550 37632 540451 1.468e9 s=2000
2498128 2498128 83292 52850 3908 128 256 2866 22022 13703 258793 1.472e9 s=5000
2521269 2521269 106166 75580 3925 128 256 2874 9085 6924 157470 1.473e9 s=10000
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2411200 2411200 19566 11904 18100 128 256 9241 74783 41012 541994 1.467e9 s=1000
Best scaffold N50 is 9241 at s=1000.
$ abyss-scaffold -k128 -s100-10000 -n 20 -G 1267403131 ...
warning: Removed 5458 invalid edges.
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2411397 2411397 20979 12799 16854 128 256 8720 69078 38113 517862 1.467e9 s=100
2411430 2411430 20976 12799 16854 128 256 8722 69078 38113 517862 1.467e9 s=200
2411537 2411537 20977 12798 16855 128 256 8722 69046 38155 517862 1.467e9 s=500
2413830 2413830 20979 12805 16899 128 256 8713 68998 38038 523230 1.468e9 s=1000
2426391 2426391 23717 14144 14929 128 256 7129 63386 34974 521458 1.468e9 s=2000
2498885 2498885 84049 53607 3908 128 256 2866 21209 13237 258793 1.472e9 s=5000
2521373 2521373 106270 75684 3925 128 256 2874 9085 6869 146598 1.473e9 s=10000
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2411430 2411430 20976 12799 16854 128 256 8722 69078 38113 517862 1.467e9 s=200
Best scaffold N50 is 8722 at s=200.
$ abyss-scaffold -k128 -s100-10000 -n 23 -G 1267403131 ...
warning: Removed 5458 invalid edges.
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2414407 2414407 22313 13668 15955 128 256 8278 64078 35839 524054 1.468e9 s=100
2414436 2414436 22313 13668 15955 128 256 8277 64078 35839 524054 1.468e9 s=200
2414531 2414531 22315 13671 15961 128 256 8277 64048 35833 524054 1.468e9 s=500
2416422 2416422 22344 13686 15934 128 256 8243 63914 35690 523230 1.468e9 s=1000
2428552 2428552 25168 15070 14101 128 256 6773 59272 32728 521458 1.468e9 s=2000
2499642 2499642 84806 54364 3908 128 256 2866 20492 12782 258793 1.472e9 s=5000
2521459 2521459 106356 75770 3925 128 256 2874 9085 6820 146598 1.473e9 s=10000
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2414407 2414407 22313 13668 15955 128 256 8278 64078 35839 524054 1.468e9 s=100
Best scaffold N50 is 8278 at s=100.
$ abyss-scaffold -k128 -s100-10000 -n 25 -G 1267403131 ...
warning: Removed 5458 invalid edges.
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2416621 2416621 23270 14279 15369 128 256 7977 61560 34179 489346 1.468e9 s=100
2416641 2416641 23270 14279 15369 128 256 7977 61560 34179 489346 1.468e9 s=200
2416716 2416716 23274 14280 15369 128 256 7976 61553 34177 489346 1.468e9 s=500
2418320 2418320 23313 14305 15342 128 256 7949 61394 34035 489346 1.468e9 s=1000
2430063 2430063 26159 15701 13599 128 256 6568 56777 31347 484560 1.468e9 s=2000
2500101 2500101 85265 54823 3908 128 256 2866 20117 12512 258793 1.472e9 s=5000
2521507 2521507 106404 75818 3925 128 256 2874 9085 6792 146598 1.473e9 s=10000
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2416621 2416621 23270 14279 15369 128 256 7977 61560 34179 489346 1.468e9 s=100
Best scaffold N50 is 7977 at s=100.
$ abyss-scaffold -k128 -s100-10000 -n 28 -G 1267403131 ...
warning: Removed 5458 invalid edges.
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2420026 2420026 24634 15126 14467 128 256 7540 57644 32379 489346 1.468e9 s=100
2420042 2420042 24634 15124 14468 128 256 7540 57644 32379 489346 1.468e9 s=200
2420101 2420101 24636 15127 14467 128 256 7540 57644 32380 489346 1.468e9 s=500
2421285 2421285 24671 15147 14446 128 256 7524 57459 32270 489346 1.468e9 s=1000
2432260 2432260 27527 16550 12951 128 256 6276 53538 29739 484560 1.468e9 s=2000
2500728 2500728 85892 55418 3909 128 256 2866 19568 12223 258793 1.472e9 s=5000
2521566 2521566 106463 75877 3925 128 256 2874 9085 6755 146598 1.473e9 s=10000
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2420026 2420026 24634 15126 14467 128 256 7540 57644 32379 489346 1.468e9 s=100
Best scaffold N50 is 7540 at s=100.
$ abyss-scaffold -k128 -s100-10000 -n 30 -G 1267403131 ...
warning: Removed 5458 invalid edges.
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2422212 2422212 25556 15712 13998 128 256 7245 55383 31216 438641 1.468e9 s=100
2422227 2422227 25556 15712 13998 128 256 7245 55383 31216 438641 1.468e9 s=200
2422278 2422278 25558 15713 13998 128 256 7244 55372 31216 438641 1.468e9 s=500
2423258 2423258 25591 15731 13982 128 256 7226 55303 31148 438641 1.468e9 s=1000
2433673 2433673 28412 17114 12591 128 256 6089 51685 28736 388962 1.468e9 s=2000
2501115 2501115 86279 55805 3909 128 256 2866 19255 12008 258793 1.472e9 s=5000
2521610 2521610 106507 75921 3925 128 256 2874 9085 6730 146598 1.473e9 s=10000
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2422212 2422212 25556 15712 13998 128 256 7245 55383 31216 438641 1.468e9 s=100
Best scaffold N50 is 7245 at s=100.
$ abyss-scaffold -k128 -s100-10000 -n 33 -G 1267403131 ...
warning: Removed 5458 invalid edges.
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2425580 2425580 27035 16622 13246 128 256 6801 52194 29646 438641 1.468e9 s=100
2425594 2425594 27035 16622 13246 128 256 6801 52194 29646 438641 1.468e9 s=200
2425630 2425630 27036 16622 13247 128 256 6802 52194 29645 438641 1.468e9 s=500
2426410 2426410 27075 16639 13235 128 256 6776 52084 29591 438641 1.468e9 s=1000
2435904 2435904 29822 17988 12016 128 256 5803 48650 27440 388962 1.468e9 s=2000
2501703 2501703 86832 56373 3910 128 256 2867 18759 11695 258793 1.472e9 s=5000
2521681 2521681 106578 75992 3925 128 256 2874 9085 6692 146598 1.473e9 s=10000
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2425630 2425630 27036 16622 13247 128 256 6802 52194 29645 438641 1.468e9 s=500
Best scaffold N50 is 6802 at s=500.
$ abyss-scaffold -k128 -s100-10000 -n 37 -G 1267403131 ...
warning: Removed 5458 invalid edges.
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2429647 2429647 28902 17756 12459 128 256 6337 48756 27712 438641 1.468e9 s=100
2429660 2429660 28902 17756 12459 128 256 6337 48756 27712 438641 1.468e9 s=200
2429692 2429692 28904 17757 12460 128 256 6337 48756 27710 438641 1.468e9 s=500
2430252 2430252 28931 17770 12444 128 256 6329 48739 27668 438641 1.468e9 s=1000
2438755 2438755 31606 19064 11393 128 256 5380 45863 25907 388962 1.469e9 s=2000
2502349 2502349 87478 57019 3910 128 256 2867 18213 11346 258793 1.472e9 s=5000
2521775 2521775 106672 76086 3925 128 256 2874 9086 6632 146598 1.473e9 s=10000
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2429647 2429647 28902 17756 12459 128 256 6337 48756 27712 438641 1.468e9 s=100
Best scaffold N50 is 6337 at s=100.
$ abyss-scaffold -k128 -s100-10000 -n 42 -G 1267403131 ...
warning: Removed 5458 invalid edges.
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2434921 2434921 31449 19232 11473 128 256 5646 45012 25712 434149 1.468e9 s=100
2434934 2434934 31449 19232 11473 128 256 5646 45012 25712 434149 1.468e9 s=200
2434958 2434958 31447 19232 11473 128 256 5647 45012 25712 434149 1.468e9 s=500
2435302 2435302 31472 19242 11471 128 256 5638 44967 25701 434149 1.468e9 s=1000
2442448 2442448 33990 20405 10651 128 256 4804 42721 24346 388962 1.469e9 s=2000
2503007 2503007 88136 57677 3910 128 256 2867 17748 11016 258793 1.472e9 s=5000
2521861 2521861 106758 76172 3925 128 256 2874 9086 6583 146598 1.473e9 s=10000
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2434958 2434958 31447 19232 11473 128 256 5647 45012 25712 434149 1.468e9 s=500
Best scaffold N50 is 5647 at s=500.
$ abyss-scaffold -k128 -s100-10000 -n 46 -G 1267403131 ...
warning: Removed 5458 invalid edges.
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2439093 2439093 33648 20462 10772 128 256 5035 42248 24208 371546 1.469e9 s=100
2439102 2439102 33648 20462 10772 128 256 5035 42248 24208 371546 1.469e9 s=200
2439119 2439119 33647 20465 10771 128 256 5037 42248 24208 371546 1.469e9 s=500
2439337 2439337 33668 20471 10767 128 256 5029 42248 24200 371546 1.469e9 s=1000
2445432 2445432 36041 21509 10118 128 256 4366 40536 23173 304925 1.469e9 s=2000
2503463 2503463 88592 58133 3910 128 256 2867 17437 10801 258793 1.472e9 s=5000
2521913 2521913 106810 76224 3925 128 256 2874 9086 6547 137016 1.473e9 s=10000
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2439119 2439119 33647 20465 10771 128 256 5037 42248 24208 371546 1.469e9 s=500
Best scaffold N50 is 5037 at s=500.
$ abyss-scaffold -k128 -s100-10000 -n 50 -G 1267403131 ...
warning: Removed 5458 invalid edges.
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2443246 2443246 36057 21690 10157 128 256 4443 39781 22920 344974 1.469e9 s=100
2443254 2443254 36057 21690 10157 128 256 4443 39781 22920 344974 1.469e9 s=200
2443270 2443270 36057 21690 10157 128 256 4444 39781 22920 344974 1.469e9 s=500
2443397 2443397 36071 21696 10156 128 256 4440 39767 22913 344974 1.469e9 s=1000
2448397 2448397 38210 22592 9586 128 256 4040 38564 22158 343577 1.469e9 s=2000
2503831 2503831 88960 58501 3910 128 256 2867 17171 10644 258793 1.472e9 s=5000
2521961 2521961 106858 76272 3925 128 256 2874 9086 6526 137016 1.473e9 s=10000
n n:100 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
2443270 2443270 36057 21690 10157 128 256 4444 39781 22920 344974 1.469e9 s=500
Best scaffold N50 is 4444 at s=500.
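The sweep logs above can be condensed to one line per value of n. This is only a minimal sketch: it assumes the log was saved to a file that contains the `$ abyss-scaffold ... -n N ...` command lines followed by the `Best scaffold N50 is ... at s=...` lines exactly as pasted here. The function name summarize_sweep and the file name in the usage comment are made up for illustration.

```shell
# Print "n=<n> N50=<best N50> s=<best s>" for each abyss-scaffold run in a log.
# Assumes "-n" and its value are separate words, as in the pasted commands.
summarize_sweep() {
    awk '
        /abyss-scaffold/ {
            # remember the value that follows the -n flag
            for (i = 1; i < NF; i++) if ($i == "-n") n = $(i + 1)
        }
        /^Best scaffold N50 is/ {
            sub(/\.$/, "", $NF)   # drop the trailing period from "s=500."
            print "n=" n, "N50=" $5, $NF
        }
    ' "$1"
}

# Usage (hypothetical file name):
#   summarize_sweep scaffold-sweep.log
```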
$ abyss-pe FIXMATE_OPTIONS=--qname N=10 ...
make: Nothing to be done for 'pe-bam'.
make: Nothing to be done for 'mp-bam'.
PathConsensus -v --dot -k128 -p0.9 -s tt_16D1C3L12__abyss_128-7.fa -g tt_16D1C3L12__abyss_128-7.dot -o tt_16D1C3L12__abyss_128-7.path tt_16D1C3L12__abyss_128-6.fa tt_16D1C3L12__abyss_128-6.dot tt_16D1C3L12__abyss_128-6.path
Reading `tt_16D1C3L12__abyss_128-6.dot'...
Reading `tt_16D1C3L12__abyss_128-6.fa'...
Reading `tt_16D1C3L12__abyss_128-6.path'...
Read 27839 paths
Ambiguous paths: 74909
Merged: 5953
No paths: 57621
Too many paths: 1236
Too complex: 8452
Dissimilar: 1647
cat tt_16D1C3L12__abyss_128-6.fa tt_16D1C3L12__abyss_128-7.fa \
|MergeContigs -v -k128 -o tt_16D1C3L12__abyss_128-8.fa - tt_16D1C3L12__abyss_128-7.dot tt_16D1C3L12__abyss_128-7.path
Reading `tt_16D1C3L12__abyss_128-7.dot'...
Read 5051014 vertices. Using 697 MB of memory.
Reading `-'...
Read 2525507 sequences. Using 2.52 GB of memory.
Reading `tt_16D1C3L12__abyss_128-7.path'...
Read 31677 paths. Using 2.52 GB of memory.
warning: the head of 7574481+ does not match the tail of the previous contig
AAATAACGACTGTTGGGATTTACTAAAGACGCGCAATTGATCATTAGTGCTGAAAAGGTGTGGTCTACACTGTAAAACCTAACAGTTAAATCATCTCAAACCATTTAAGGAAATCGGTTGCCTTAAA
ggttgccttAAAccgTttaagttttaAAMSACWKTTGaGKAcTKTgaACTWRAGtaAYGYGCAATtaTGcAcTYATatttaagTAGTGtgaaCTKAAAtatARGTGcataacTGcRTMTgACWCaat
7445495+ 675N 7420534- 7574481+ 7543010-
The minimum coverage of single-end contigs is 1.22115.
The minimum coverage of merged contigs is 3.81982.
Consider increasing the coverage threshold parameter, c, to 3.81982.
n n:200 L50 min N80 N50 N20 E-size max sum name
2429899 1478929 24851 200 340 8509 44500 25427 346751 1.329e9 tt_16D1C3L12__abyss_128-8.fa
time user=0.00s system=3.31s elapsed=127.47s cpu=2% memory=4 job=
time user=107.04s system=42.52s elapsed=154.07s cpu=97% memory=2189 job=
ln -sf tt_16D1C3L12__abyss_128-8.fa tt_16D1C3L12__abyss_128-scaffolds.fa
PathOverlap --overlap -v -k128 --dot tt_16D1C3L12__abyss_128-7.dot tt_16D1C3L12__abyss_128-7.path >tt_16D1C3L12__abyss_128-8.dot
Reading `tt_16D1C3L12__abyss_128-7.dot'...
Reading `tt_16D1C3L12__abyss_128-7.path'...
ln -sf tt_16D1C3L12__abyss_128-8.dot tt_16D1C3L12__abyss_128-scaffolds.dot
abyss-fac tt_16D1C3L12__abyss_128-unitigs.fa tt_16D1C3L12__abyss_128-contigs.fa tt_16D1C3L12__abyss_128-scaffolds.fa |tee tt_16D1C3L12__abyss_128-stats.tab
n n:500 L50 min N80 N50 N20 E-size max sum name
3282085 479704 87706 500 1203 2936 6202 4112 57779 921.7e6 tt_16D1C3L12__abyss_128-unitigs.fa
2524721 320183 47968 500 2439 5684 11904 7801 84004 972.2e6 tt_16D1C3L12__abyss_128-contigs.fa
2429899 237545 11119 500 2982 19882 57895 34719 346751 970e6 tt_16D1C3L12__abyss_128-scaffolds.fa
time user=21.22s system=7.00s elapsed=28.88s cpu=97% memory=4 job=
time user=0.00s system=0.01s elapsed=28.88s cpu=0% memory=0 job=
tr '\t' , <tt_16D1C3L12__abyss_128-stats.tab >tt_16D1C3L12__abyss_128-stats.csv
abyss-tabtomd tt_16D1C3L12__abyss_128-stats.tab >tt_16D1C3L12__abyss_128-stats.md
Something went wrong I guess:
--- old.stats 2018-03-30 18:46:41.000000000 +0200
+++ new.stats 2018-03-30 18:46:27.000000000 +0200
@@ -1,9 +1,9 @@
n n:500 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
7278490 540152 103434 200909 1417 500 1008 2430 5164 3450 42126 905.4e6 tt_16D1C3L12__abyss_128-1.fa
3606504 540152 103434 200909 1417 500 1008 2430 5164 3450 42126 905.4e6 tt_16D1C3L12__abyss_128-2.fa
3282085 479704 87706 162732 1801 500 1203 2936 6202 4112 57779 921.7e6 tt_16D1C3L12__abyss_128-3.fa
8089 0 0 0 0 0 0 0 0 0 0 0 tt_16D1C3L12__abyss_128-4.fa
5252 0 0 0 0 0 0 0 0 0 0 0 tt_16D1C3L12__abyss_128-5.fa
2524721 320183 47968 79411 3892 500 2439 5684 11904 7801 84004 972.2e6 tt_16D1C3L12__abyss_128-6.fa
-1093 563 182 563 503 503 784 1227 1854 1371 3813 624124 tt_16D1C3L12__abyss_128-7.fa
-2383961 195687 4463 8913 23272 500 7332 48678 144350 83684 910150 968.8e6 tt_16D1C3L12__abyss_128-8.fa
+786 374 122 374 503 503 755 1174 1755 1287 2911 396799 tt_16D1C3L12__abyss_128-7.fa
+2429899 237545 11119 21569 10231 500 2982 19882 57895 34719 346751 970e6 tt_16D1C3L12__abyss_128-8.fa
Given that abyss-pe --help outputs the help text of GNU make, I am a bit lost as to how to pass down the parameter c to resolve this:
The minimum coverage of single-end contigs is 1.22115.
The minimum coverage of merged contigs is 3.81982.
Consider increasing the coverage threshold parameter, c, to 3.81982.
abyss-pe help
man abyss-pe
https://github.com/bcgsc/abyss/#assembly-parameters
abyss-pe c=4 …
ABySS picks the values for c and e automatically based on the data. It reports the values that it picked much earlier in the log.
Aha, meanwhile I blindly tried:
$ abyss-pe FIXMATE_OPTIONS=--qname N=10 c=3.81982 v=-v np=104 j=104 k=128 ...
make: Nothing to be done for 'pe-bam'.
make: Nothing to be done for 'mp-bam'.
PathConsensus -v --dot -k128 -p0.9 -s tt_16D1C3L12__abyss_128-7.fa -g tt_16D1C3L12__abyss_128-7.dot -o tt_16D1C3L12__abyss_128-7.path tt_16D1C3L12__abyss_128-6.fa tt_16D1C3L12__abyss_128-6.dot tt_16D1C3L12__abyss_128-6.path
Reading `tt_16D1C3L12__abyss_128-6.dot'...
Reading `tt_16D1C3L12__abyss_128-6.fa'...
Reading `tt_16D1C3L12__abyss_128-6.path'...
Read 27839 paths
Ambiguous paths: 74909
Merged: 5953
No paths: 57621
Too many paths: 1236
Too complex: 8452
Dissimilar: 1647
cat tt_16D1C3L12__abyss_128-6.fa tt_16D1C3L12__abyss_128-7.fa \
|MergeContigs -v -k128 -o tt_16D1C3L12__abyss_128-8.fa - tt_16D1C3L12__abyss_128-7.dot tt_16D1C3L12__abyss_128-7.path
Reading `tt_16D1C3L12__abyss_128-7.dot'...
Read 5051014 vertices. Using 697 MB of memory.
Reading `-'...
Read 2525507 sequences. Using 2.52 GB of memory.
Reading `tt_16D1C3L12__abyss_128-7.path'...
Read 31677 paths. Using 2.52 GB of memory.
warning: the head of 7574481+ does not match the tail of the previous contig
AAATAACGACTGTTGGGATTTACTAAAGACGCGCAATTGATCATTAGTGCTGAAAAGGTGTGGTCTACACTGTAAAACCTAACAGTTAAATCATCTCAAACCATTTAAGGAAATCGGTTGCCTTAAA
ggttgccttAAAccgTttaagttttaAAMSACWKTTGaGKAcTKTgaACTWRAGtaAYGYGCAATtaTGcAcTYATatttaagTAGTGtgaaCTKAAAtatARGTGcataacTGcRTMTgACWCaat
7445495+ 675N 7420534- 7574481+ 7543010-
The minimum coverage of single-end contigs is 1.22115.
The minimum coverage of merged contigs is 3.81982.
Consider increasing the coverage threshold parameter, c, to 3.81982.
n n:200 L50 min N80 N50 N20 E-size max sum name
2429899 1478929 24851 200 340 8509 44500 25427 346751 1.329e9 tt_16D1C3L12__abyss_128-8.fa
time user=0.00s system=100.41s elapsed=567.83s cpu=17% memory=4 job=
time user=105.20s system=483.96s elapsed=589.66s cpu=99% memory=2189 job=
ln -sf tt_16D1C3L12__abyss_128-8.fa tt_16D1C3L12__abyss_128-scaffolds.fa
PathOverlap --overlap -v -k128 --dot tt_16D1C3L12__abyss_128-7.dot tt_16D1C3L12__abyss_128-7.path >tt_16D1C3L12__abyss_128-8.dot
Reading `tt_16D1C3L12__abyss_128-7.dot'...
Reading `tt_16D1C3L12__abyss_128-7.path'...
ln -sf tt_16D1C3L12__abyss_128-8.dot tt_16D1C3L12__abyss_128-scaffolds.dot
abyss-fac tt_16D1C3L12__abyss_128-unitigs.fa tt_16D1C3L12__abyss_128-contigs.fa tt_16D1C3L12__abyss_128-scaffolds.fa |tee tt_16D1C3L12__abyss_128-stats.tab
n n:500 L50 min N80 N50 N20 E-size max sum name
3282085 479704 87706 500 1203 2936 6202 4112 57779 921.7e6 tt_16D1C3L12__abyss_128-unitigs.fa
2524721 320183 47968 500 2439 5684 11904 7801 84004 972.2e6 tt_16D1C3L12__abyss_128-contigs.fa
2429899 237545 11119 500 2982 19882 57895 34719 346751 970e6 tt_16D1C3L12__abyss_128-scaffolds.fa
time user=21.07s system=5.81s elapsed=27.97s cpu=96% memory=4 job=
time user=0.00s system=0.00s elapsed=27.97s cpu=0% memory=0 job=
ln -sf tt_16D1C3L12__abyss_128-stats.tab tt_16D1C3L12__abyss_128-stats
tr '\t' , <tt_16D1C3L12__abyss_128-stats.tab >tt_16D1C3L12__abyss_128-stats.csv
abyss-tabtomd tt_16D1C3L12__abyss_128-stats.tab >tt_16D1C3L12__abyss_128-stats.md
If the issue is that I did not round the value, please fix the warning message, or round on the fly.
Yeah, thanks for the note where to find the help text, I knew that we already discussed that but I forgot the syntax. Anyway, maybe the logged text message could be improved?
Anyway, after retrying with
$ rm tt_16D1C3L12__abyss_128-7.fa tt_16D1C3L12__abyss_128-7.dot tt_16D1C3L12__abyss_128-8.fa tt_16D1C3L12__abyss_128-scaffolds.fa tt_16D1C3L12__abyss_128-scaffolds.dot tt_16D1C3L12__abyss_128-8.dot tt_16D1C3L12__abyss_128-stats.tab tt_16D1C3L12__abyss_128-stats.md tt_16D1C3L12__abyss_128-stats.csv tt_16D1C3L12__abyss_128-stats
$ abyss-pe FIXMATE_OPTIONS=--qname N=10 c=4 ...
I am getting the same results, though.
c can take any real value. e can take only integer values. You would need to start the assembly over from the very beginning to change the values of e and c, which are parameters of the unitig assembler. I don't usually change these parameters, and use the default values.
OK, I somewhat suspected that this concerns the contigging/unitigging step, so please improve the message. I am fine with rerunning the whole assembly, as it does make sense to throw away contigs with low coverage.
The manpage is hard to understand for someone who is not an assembler developer, in my opinion:
c minimum mean k-mer coverage of a unitig [sqrt(median)]
e minimum erosion k-mer coverage [round(sqrt(median))]
E minimum erosion k-mer coverage per strand [1 if sqrt(median) > 2 else 0]
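The bracketed defaults read as formulas over the median k-mer coverage. Below is a minimal sketch of that arithmetic, assuming the expressions are applied exactly as written in the manpage excerpt. The function name abyss_defaults is made up for illustration, and the assembler itself may refine the threshold differently.

```shell
# Compute the manpage defaults for c, e and E from a median k-mer coverage.
#   c = sqrt(median), e = round(sqrt(median)), E = 1 if sqrt(median) > 2 else 0
abyss_defaults() {
    awk -v m="$1" 'BEGIN {
        s = sqrt(m)
        printf "c=%.3g e=%d E=%d\n", s, int(s + 0.5), (s > 2) ? 1 : 0
    }'
}

abyss_defaults 40    # prints: c=6.32 e=6 E=1
abyss_defaults 100   # prints: c=10 e=10 E=1
```

Both results agree with the "Setting parameter" lines in the log excerpts quoted further down in this thread, where the median k-mer coverage is reported as 40 and 100 respectively.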
ABySS detects suitable values for c and e and does discard unitigs with low coverage. You shouldn't need to change these values. If you go back to the unitig log file, you can find out what values ABySS chose for these parameters.
So what line should I grep for in the abyss-pe v=-v log file?
Loaded 8243644 k-mer
Hash load: 8243644 / 33554432 = 0.246 using 2.93 GB
Minimum k-mer coverage is 21
Coverage: 21 Reconstruction: 210077
Coverage: 10 Reconstruction: 214189
Coverage: 10 Reconstruction: 214189
Using a coverage threshold of 10...
The median k-mer coverage is 100
The reconstruction is 214189
The k-mer coverage threshold is 10
Setting parameter e (erode) to 10
Setting parameter E (erodeStrand) to 1
Setting parameter c (coverage) to 10
Minimum k-mer coverage is 10
0: Coverage: 10 Reconstruction: 1237153695
0: Coverage: 6.4 Reconstruction: 1285143208
0: Coverage: 6.32 Reconstruction: 1285143208
Using a coverage threshold of 6...
The median k-mer coverage is 40
The reconstruction is 1285143208
The k-mer coverage threshold is 6.32
Setting parameter e (erode) to 6
Setting parameter E (erodeStrand) to 1
Setting parameter c (coverage) to 6.32
Finding adjacenct k-mer...
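In shell form, the lines of interest in the excerpt above are the ones starting with "Setting parameter". This is a minimal sketch, assuming the verbose output was saved to a file (the file name in the usage comment is hypothetical) and that the wording matches the excerpt; the function name abyss_params is made up for illustration.

```shell
# Print the c, e and E values that the unitig assembler reports choosing.
# Assumes the log wording matches the excerpt ("Setting parameter c (coverage) to ...").
abyss_params() {
    grep -E 'Setting parameter (c|e|E) \(' "$1"
}

# Usage (hypothetical file name):
#   abyss_params abyss.log
```

As explained later in the thread, the line for c is not printed when c is given on the command line, so a missing line is itself informative.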
$ abyss-pe FIXMATE_OPTIONS=--qname N=10 c=4 v=-v np=104 j=104 k=128 ...
...
warning: -c,--coverage was specified, but -e,--erode was not specified
Previously, the default was -e2 (or --erode=2).
ABySS 2.0.3
ABYSS-P -k128 -q3 -c4 -v ...
So the original message, Consider increasing the coverage threshold parameter, c, to 3.81982, was incomplete and did not tell me that I am supposed to alter the e value too.
When abyss-pe detects c=4 in its input, it would have been wise to output what it ended up using. @sjackman Please make the messages clearer. I do not mind that abyss is not a fully automated solution, but the messages to the user should not state just half of the truth. Thank you.
Since ABySS autodetected and used c=6.32, I wouldn't recommend decreasing it to c=4.
But didn't the output from MergeContigs say to increase the threshold to increase strictness?
The minimum coverage of single-end contigs is 1.22115.
The minimum coverage of merged contigs is 3.81982.
Consider increasing the coverage threshold parameter, c, to 3.81982.
Then the messages printed in step 7 of the assembly (https://github.com/bcgsc/abyss/issues/187#issuecomment-377565339) are just wrong.
Yep, it is pretty much wrong. The message was originally intended to be printed after stage 3 of the assembly. The same tool, MergeContigs, was later used at stages 6 and 8 of the assembly, and it still prints that same message, though it's not accurate at the later stages.
Thank you for the clarification, now I am starting to understand.
Meanwhile the new job from https://github.com/bcgsc/abyss/issues/187#issuecomment-377696140 with abyss-pe ... N=10 c=4 ... progressed and reported:
Loaded 11630646625 k-mer. At least 837 GB of RAM is required.
Minimum k-mer coverage is 10
0: Coverage: 10 Reconstruction: 1237153695
0: Coverage: 6.4 Reconstruction: 1285143208
0: Coverage: 6.32 Reconstruction: 1285143208
Using a coverage threshold of 6...
The median k-mer coverage is 40
The reconstruction is 1285143208
The k-mer coverage threshold is 6.32
Setting parameter e (erode) to 6
Setting parameter E (erodeStrand) to 1
Finding adjacent k-mer...
The line Setting parameter c (coverage) to 6.32 is missing. Either due to 2.0.2 to 2.0.3 upgrade or due to some logic behind.
Here is more from the abyss-pe ... N=10 c=4 ... job progressing now:
$ abyss-fac -G 1267403131 tt_16D1C3L12__abyss_128-[0-9].fa
n n:500 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
8420759 540881 104722 206957 1371 500 1000 2388 5040 3376 42131 897.4e6 tt_16D1C3L12__abyss_128-1.fa
3990013 540881 104722 206957 1371 500 1000 2388 5040 3376 42131 897.4e6 tt_16D1C3L12__abyss_128-2.fa
3659188 478583 87983 165909 1758 500 1197 2906 6126 4072 57779 914.4e6 tt_16D1C3L12__abyss_128-3.fa
and here is the original result with autodetected values (c=6.32, cannot say what the N= really was):
$ abyss-fac -G 1267403131 tt_16D1C3L12__abyss_128-[0-9].fa
n n:500 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
7278490 540152 103434 200909 1417 500 1008 2430 5164 3450 42126 905.4e6 tt_16D1C3L12__abyss_128-1.fa
3606504 540152 103434 200909 1417 500 1008 2430 5164 3450 42126 905.4e6 tt_16D1C3L12__abyss_128-2.fa
3282085 479704 87706 162732 1801 500 1203 2936 6202 4112 57779 921.7e6 tt_16D1C3L12__abyss_128-3.fa
I stopped this abyss-pe ... N=10 c=4 ... job and started a new one with just abyss-pe ... N=10 ... .
Interestingly, the job with abyss-pe ... N=10 ... needed 1TB of RAM but the results are practically the same as before:
$ abyss-fac -G 1267403131 tt_16D1C3L12__abyss_128-[0-9].fa
n n:500 L50 LG50 NG50 min N80 N50 N20 E-size max sum name
7278490 540152 103434 200909 1417 500 1008 2430 5164 3450 42126 905.4e6 tt_16D1C3L12__abyss_128-1.fa
3606507 540152 103434 200909 1417 500 1008 2430 5164 3450 42126 905.4e6 tt_16D1C3L12__abyss_128-2.fa
3282088 479704 87706 162732 1801 500 1203 2936 6202 4112 57779 921.7e6 tt_16D1C3L12__abyss_128-3.fa
8089 0 0 0 0 0 0 0 0 0 0 0 tt_16D1C3L12__abyss_128-4.fa
5248 0 0 0 0 0 0 0 0 0 0 0 tt_16D1C3L12__abyss_128-5.fa
2524714 320183 47968 79412 3892 500 2439 5684 11904 7801 84004 972.2e6 tt_16D1C3L12__abyss_128-6.fa
1093 563 182 563 503 503 784 1227 1854 1371 3813 624124 tt_16D1C3L12__abyss_128-7.fa
2383939 195685 4463 8913 23272 500 7332 48678 144350 83684 910150 968.8e6 tt_16D1C3L12__abyss_128-8.fa
N=10 is the default as stated in https://github.com/bcgsc/abyss/issues/187#issuecomment-377417061 so the above is not a surprise, except for the higher memory footprint. But maybe that has to do with the upgrade to abyss-2.0.3, which is designed to have a larger footprint.
The line Setting parameter c (coverage) to 6.32 is missing. Either due to 2.0.2 to 2.0.3 upgrade or due to some logic behind.
The message is missing because you have specified c=4, so it's using the specified c=4 and not using c=6.32.
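One way to check which coverage threshold was actually passed is to look for the ABYSS-P command line that the log echoes, like the ABYSS-P -k128 -q3 -c4 -v ... line quoted earlier in the thread. This is a minimal sketch, assuming the log is in a file (the file name in the usage comment is hypothetical) and that your grep supports the -o option; the function name abyss_c_flag is made up for illustration.

```shell
# Print the -c option found on the echoed ABYSS-P command line, if any.
# No output means no explicit -c was passed, so the autodetected value applies.
abyss_c_flag() {
    grep 'ABYSS-P ' "$1" | grep -oE '(^| )-c[0-9.]+' | tr -d ' '
}

# Usage (hypothetical file name):
#   abyss_c_flag abyss.log
```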
Apologies for reviving old threads, but my recent experience has taught me to also 'play/optimise' with the n for the mapping step. @mmokrejs, I was roughly in the same situation as you are/were (i.e. seeing very little difference when changing N for scaffolding/contig building). After several tries I figured out that abyss most likely already filters out way too much data at the DistanceEst step, and that data consequently never reaches the contigging/scaffolding step. (See this thread https://github.com/bcgsc/abyss/issues/258 for more info.)
Yes, you can set n to 1 for DistanceEst to retain everything, so that you have more control at later steps.
Hi, I wonder where I could upload a few nice figures showing the performance of abyss-2.0.2 on our server. It was run in MPI-enabled mode, with OpenMP enabled in the steps where MPI is not supported. Thanks to https://github.com/bcgsc/abyss/issues/185#issuecomment-363173408 .
abyss-pe v=-v np=104 j=104 k=128 ...
The server has 3.2TB of RAM and 112 CPU cores. I used only 104 CPUs. The run finished after 1741 minutes of wallclock time, which is about 29 hours.
The read IO from our LustreFS could be much higher, although I see the 104 CPU cores ran at full speed. If I use some of the tools for counting k-mers or doing read error-correction from the BBmap bundle ( https://sourceforge.net/projects/bbmap ) I see much higher read IO. So this may reveal that you could still do better during the initial steps of parsing input FASTQ files, splitting them into k-mers and counting them. But I do not have hard proof and I did not bother to check whether abyss does only this.
Overall, I am quite happy, thank you for a nice tool. My real concern now is the precision of mate-pair mapping, but that is another story. I see other GitHub issues are open about this.
I wanted to include some numbers on input read pairs, how many were discarded, average contig/unitig sizes etc., but they are well hidden in the log file. So, no numbers.