HingeAssembler / HINGE

Software accompanying "HINGE: Long-Read Assembly Achieves Optimal Repeat Resolution"
http://genome.cshlp.org/content/27/5/747.full.pdf+html?sid=39918b0d-7a7d-4a12-b720-9238834902fd

Layout 0 active hinges #116

Closed by ebioman 7 years ago

ebioman commented 7 years ago

I am trying to re-assemble a small piece of a large genome (which assembled correctly with other assemblers) using the multi-las approach, and I am seeing some unexpected behaviour:

  1. The read filtering reports that none of the reads are removed. Is this expected?

  2. Multi-threading for the layout step does not seem to work. I set 12 CPUs in nominal.ini, but it never uses more than one. How is this implemented? Maybe I am missing a library?

  3. The layout step removes all of my overlaps.


[2017-06-13 10:19:52.415] [log] [info] Load alignments from test.1.las
[2017-06-13 10:19:52.415] [log] [info] # Alignments: 5631866
[2017-06-13 10:20:14.940] [log] [info] # reads: 17734
[2017-06-13 10:20:14.940] [log] [info] # active reads: 0/17734
[2017-06-13 10:20:14.940] [log] [info] Input data finished, part 1/16
[2017-06-13 10:20:16.287] [log] [info] kept 0/5631866 overlaps,  0/2768815 rev_overlaps in part 1/16
[2017-06-13 10:20:16.287] [log] [info] index finished
......
Similarly for my other 16 overlap results.
......
[2017-06-13 10:32:25.359] [log] [info] Building hinge graph
[2017-06-13 10:32:25.416] [log] [info] num hinges 62482
[2017-06-13 10:32:25.703] [log] [info] Hinge graph built
Total number of components: 62482
[2017-06-13 10:32:26.286] [log] [info] after filter 0 active hinges
[2017-06-13 10:32:26.539] [log] [info] Starting to build assembly graph.
[2017-06-13 10:32:26.563] [log] [info] sort and output finished
[2017-06-13 10:32:26.563] [log] [info] version 0.0.3

I then tried the same with a single las file:


[2017-06-13 11:09:36.317] [log] [info] name of las: test.las
[2017-06-13 11:09:36.335] [log] [info] Load alignments from test.las
[2017-06-13 11:09:36.335] [log] [info] # Alignments: 16927078
[2017-06-13 11:10:47.286] [log] [info] # reads: 105189
[2017-06-13 11:10:47.286] [log] [info] # active reads: 0/105189
[2017-06-13 11:10:47.286] [log] [info] Input data finished, part 1/1
[2017-06-13 11:10:53.000] [log] [info] kept 0/16927078 overlaps,  0/8335689 rev_overlaps in part 1/1
[2017-06-13 11:10:53.000] [log] [info] index finished
[2017-06-13 11:10:53.008] [log] [info] kept 0/16927078 overlaps,  0/8335689 rev_overlaps in 1 part(s)
[2017-06-13 11:10:53.026] [log] [info] 0 overlaps
[2017-06-13 11:10:53.026] [log] [info] 0 rev overlaps
[2017-06-13 11:10:53.051] [log] [info] removed contained reads, active reads: 0
[2017-06-13 11:10:53.066] [log] [info] active reads: 0
[2017-06-13 11:10:54.046] [log] [info] 0 killed hinges
[2017-06-13 11:10:54.046] [log] [info] 0 hinges
[2017-06-13 11:10:54.906] [log] [info] 0 active hinges
[2017-06-13 11:10:54.928] [log] [info] Building hinge graph
[2017-06-13 11:10:54.993] [log] [info] num hinges 6967
[2017-06-13 11:10:55.105] [log] [info] Hinge graph built
Total number of components: 6967
[2017-06-13 11:10:55.258] [log] [info] after filter 0 active hinges
[2017-06-13 11:10:55.414] [log] [info] Starting to build assembly graph.
[2017-06-13 11:10:55.434] [log] [info] sort and output finished
[2017-06-13 11:10:55.434] [log] [info] version 0.0.3
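
Both logs report zero active reads immediately after the alignments are loaded, before any overlaps are kept or dropped. A minimal sketch of a check for that symptom follows. The function name `check_active_reads` is hypothetical, and the pattern assumes the log line format shown above.

```bash
#!/usr/bin/env bash
# Flag a layout log that reports zero active reads after loading alignments.
# Returns 0 if no such line is found, 1 otherwise.
# ASSUMPTION: log lines look like "... [info] # active reads: 0/17734".
check_active_reads() {
    local logfile="$1"
    if grep -Eq '# active reads: 0/[0-9]+' "$logfile"; then
        echo "no active reads in $logfile: check that the earlier steps ran" >&2
        return 1
    fi
    return 0
}
```

Calling `check_active_reads` on a saved layout log before inspecting the hinge graph output would catch this case early.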

I used the demo configuration and have ~100x coverage

Versions:

HINGE: 2d70ea7216dc0a3c085ed422892f9ed94d103c74
DAZZ_DB: ff5cfec955496fbc1f5ab6735e0a832976dd2995
DALIGNER: 9e9acd358d2d8b6d24769f58f7de991c47292ce2
DASCRUBBER: 77dd9555dac79c9f3040e582c4b44fc3fce16e32

My steps:

Multi-las:

fasta2DB test test.subreads.fasta
DBsplit test
HPC.daligner test | bash -v
for i in {1..16}; do DASqv -c90 test.db test.${i}.las; done
Catrack test.db qual
hinge filter --db test --las test --mlas -x test --config nominal.ini
hinge layout --db test --las test --mlas -x test --config nominal.ini -o test

Single-las:

LAmerge test.las test.[1-16].las
DASqv -c100 test test.las
hinge filter --db test --las test -x test --config nominal.ini
hinge layout --db test --las test.las -x test --config nominal.ini -o test
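
A side note on the LAmerge line in the single-las steps: in bash, `test.[1-16].las` is a bracket glob, so `[1-16]` matches one character that is either `1` or `6`, not the numbers 1 through 16. Whether this mattered for the run above is not clear from the logs. The sketch below compares the glob with brace expansion on dummy files in a temporary directory; the variable names are hypothetical.

```bash
#!/usr/bin/env bash
# Compare a bracket glob with brace expansion on empty dummy files.
demo_dir=$(mktemp -d)
for i in {1..16}; do : > "$demo_dir/test.$i.las"; done

# Bracket expression: the range 1-1 plus the character 6.
glob_matches=("$demo_dir"/test.[1-16].las)
# Brace expansion: one word for each of 1, 2, ..., 16.
brace_matches=("$demo_dir"/test.{1..16}.las)

echo "glob:  ${#glob_matches[@]} files: ${glob_matches[*]##*/}"
echo "brace: ${#brace_matches[@]} files"
# prints "glob:  2 files: test.1.las test.6.las" and "brace: 16 files"

rm -rf "$demo_dir"
```

With brace expansion, the merge line would read `LAmerge test.las test.{1..16}.las`, assuming all 16 block files should be merged.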

govinda-kamath commented 7 years ago

  2. Multi-threading for the layout step does not seem to work. I set 12 CPUs in nominal.ini, but it never uses more than one. How is this implemented? Maybe I am missing a library?

Multi-core support is not implemented yet, so this does not work.

  1. The read filtering reports that none of the reads are removed. Is this expected?

  3. The layout step removes all of my overlaps.

For the other two questions, can you give us the log files? They are stored in the log directory inside the directory you are running the code from. (These files contain no sequence information, in case that is a concern.)

That may help us understand what's going on here. Both those behaviours are not what I'd expect. I'd also be curious to see the output of maximal, the intermediate step there.

ebioman commented 7 years ago

Please find the logs here

log_2017-06-13_10-18.corrected.txt

govinda-kamath commented 7 years ago

Can you give us the log file of maximal as well?

ebioman commented 7 years ago

Please ignore this issue. I had followed the example in the "demo" folder, which omits the maximal step. After adding that step, I could proceed.
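
For anyone hitting the same symptom, the step order with maximal included can be sketched as a dry run that only prints the commands. This is a sketch under assumptions: the function name `print_hinge_steps` is hypothetical, and it assumes `hinge maximal` takes the same arguments as `hinge filter`, which this thread does not confirm.

```bash
#!/usr/bin/env bash
# Print (do not run) the filter -> maximal -> layout sequence for one prefix.
# ASSUMPTION: "hinge maximal" accepts the same arguments as "hinge filter".
print_hinge_steps() {
    local prefix="$1" config="$2"
    local common="--db $prefix --las $prefix.las -x $prefix --config $config"
    echo "hinge filter $common"
    echo "hinge maximal $common"
    echo "hinge layout $common -o $prefix"
}

print_hinge_steps test nominal.ini
```

Printing the commands first makes it easy to compare them against the current demo scripts before running anything.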

One last question regarding the --mlas option: it does not seem to be implemented in the later draft step. Is that right?

govinda-kamath commented 7 years ago

Thanks. I have fixed the demos, which were out of date. @fxia22 has fixed draft so that it no longer loads the whole las file and is memory-efficient without the --mlas option.