ekg / seqwish

alignment to variation graph inducer
MIT License

"Missing" edges #4

Closed by fbemm 5 years ago

fbemm commented 6 years ago

Hi,

the seqwish GFA seems to miss some edges for loops:

P 4 34+,32-,32-,32-,344- # only a part of the path
S 32 G
L 32 + 32 + OM
L 32 + 32 + OM
L 32 + 32 - OM
L 32 - 32 + OM

Shouldn't there also be a:

L 32 - 32 - OM

Works otherwise pretty great!

F
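
The check described above can be sketched in a few lines of Python. This is only an illustration, not part of seqwish or vg: the function names `parse_gfa` and `missing_links` and the `allow_reverse_complement` switch are hypothetical. It walks each pair of consecutive path steps and reports the pairs that have no matching L record. In a bidirected reading of the graph, a link and its reverse complement (for example `L 32 + 32 +` and `L 32 - 32 -`) describe the same adjacency, so the sketch can either accept the reverse complement or insist on a literal record; which behaviour a given tool expects is an assumption to verify.

```python
FLIP = {"+": "-", "-": "+"}


def parse_gfa(lines):
    """Collect L records as (from, from_orient, to, to_orient) and P records as step lists."""
    links = set()
    paths = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "L":
            links.add((fields[1], fields[2], fields[3], fields[4]))
        elif fields[0] == "P":
            # A step such as "32-" becomes ("32", "-").
            paths[fields[1]] = [(s[:-1], s[-1]) for s in fields[2].split(",")]
    return links, paths


def missing_links(links, steps, allow_reverse_complement=True):
    """Return the consecutive step pairs of a path that no link supports."""
    missing = []
    for (a, oa), (b, ob) in zip(steps, steps[1:]):
        forward = (a, oa, b, ob)
        reverse = (b, FLIP[ob], a, FLIP[oa])
        if forward in links:
            continue
        if allow_reverse_complement and reverse in links:
            continue
        if forward not in missing:
            missing.append(forward)
    return missing


# Records for segment 32 only, with the overlap field copied as it appears in the thread.
example = [
    "P\t4\t32-,32-,32-\t*",
    "S\t32\tG",
    "L\t32\t+\t32\t+\tOM",
    "L\t32\t+\t32\t-\tOM",
    "L\t32\t-\t32\t+\tOM",
]
links, paths = parse_gfa(example)
print(missing_links(links, paths["4"], allow_reverse_complement=False))
print(missing_links(links, paths["4"], allow_reverse_complement=True))
```

With the literal check the step pair 32- to 32- is reported; when the reverse complement is accepted, `L 32 + 32 +` already covers it.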

ekg commented 5 years ago

Has this been resolved? I was using vg view -Fv to detect when this happens. I needed to add a mutex lock on a shared bitvector. Without it this tended to happen.
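
To illustrate the kind of fix mentioned here: setting a bit is a read-modify-write on the byte that holds it, so two threads updating neighbouring bits without synchronization can overwrite each other and lose an update. The following Python sketch shows the general pattern of guarding a shared bit vector with a mutex. It is not the seqwish code; the class `LockedBitVector` and the helper `mark_in_parallel` are hypothetical names used only for this example.

```python
import threading


class LockedBitVector:
    """Fixed-size bit vector whose reads and writes are serialized by a mutex."""

    def __init__(self, size):
        self.size = size
        self._bits = bytearray((size + 7) // 8)
        self._lock = threading.Lock()

    def set(self, i):
        # The lock keeps two threads from interleaving their
        # read-modify-write on the same byte.
        with self._lock:
            self._bits[i >> 3] |= 1 << (i & 7)

    def get(self, i):
        with self._lock:
            return bool(self._bits[i >> 3] & (1 << (i & 7)))

    def count(self):
        with self._lock:
            return sum(bin(b).count("1") for b in self._bits)


def mark_in_parallel(size, n_threads=4):
    """Have n_threads workers set interleaved bits of one shared vector."""
    bv = LockedBitVector(size)

    def worker(start):
        for i in range(start, size, n_threads):
            bv.set(i)

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bv
```

A single lock is the simplest choice; it trades some parallel throughput for the guarantee that no bit set by one thread is dropped by another.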

fbemm commented 5 years ago

Testing with the latest commit at the moment. I'll keep you updated.

fbemm commented 5 years ago

Getting a core dump right now for a data set that worked before. Digging..

*** Error in `seqwish': free(): invalid next size (normal): 0x0000152ac00009f0 ***
======= Backtrace: =========
[0x56ecd1]
[0x5779c6]
[0x57b9f7]
[0x53bfe9]
[0x4843be]
[0x484510]
[0x539544]
[0x53cf0c]
[0x53aa2a]
[0x548485]
[0x5bc719]

fbemm commented 5 years ago

With debugging switched on, the GFA file now contains:

St1_2.1 974421  890470  899546  +       St1_2.69        206126  126841  135687  5185    9098    0       cg:Z:
St1_2.1 974421  0       5649    +       St1_2.272       329270  323592  329254  4873    5675    0       cg:Z:

fbemm commented 5 years ago

I am no longer able to run seqwish on the above-mentioned data set, and I can't trace the reason for the segfault either. Switching back to an earlier release (f75121a3b50a436ef394bb94d8672af69ae1545f) works.

fbemm commented 5 years ago

Switching from Ubuntu 16.04 to 18.04 solved the segfaults.

ekg commented 5 years ago

That's great! Can you work with the output graph?

fbemm commented 5 years ago

One thing that might be problematic is that quite a number of links are simply duplicated. Currently testing if removing them solves the "duplicated rank" problem I always run into when indexing with vg. But otherwise it looks fine. I will report back in case I discover further challenges.

ekg commented 5 years ago

That's interesting, and shouldn't happen. Can you find the duplicate L records in the output GFA?

fbemm commented 5 years ago

Yes. Just an example:

L       1504    +       34338705        -       OM
L       1504    +       1505    +       OM
L       1504    +       1505    +       OM
L       1505    +       1506    +       OM

A brute-force sort shows that about 3/4 of my links are duplicated exactly twice.

In this particular case, 15,850,483 of 65,419,141 links are unique; the rest are present twice.
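
For reference, the duplicate count can also be obtained without sorting. The Python sketch below is a minimal illustration, with the hypothetical helper names `link_key`, `count_links` and `dedup_gfa`. It treats two L records as duplicates when their segment names and orientations match exactly, which is an assumption about what "duplicated" means here, and it keeps only the first copy of each link.

```python
from collections import Counter


def link_key(line):
    """Identify an L record by (from, from_orient, to, to_orient)."""
    fields = line.split()
    return tuple(fields[1:5])


def is_link(line):
    fields = line.split()
    return bool(fields) and fields[0] == "L"


def count_links(lines):
    """Count how many times each link occurs."""
    return Counter(link_key(line) for line in lines if is_link(line))


def dedup_gfa(lines):
    """Drop repeated L records, keeping all other lines and the original order."""
    seen = set()
    kept = []
    for line in lines:
        if is_link(line):
            key = link_key(line)
            if key in seen:
                continue
            seen.add(key)
        kept.append(line)
    return kept


# The four records quoted above.
records = [
    "L\t1504\t+\t34338705\t-\tOM",
    "L\t1504\t+\t1505\t+\tOM",
    "L\t1504\t+\t1505\t+\tOM",
    "L\t1505\t+\t1506\t+\tOM",
]
counts = count_links(records)
duplicated = {key: n for key, n in counts.items() if n > 1}
print(len(counts), "distinct links,", len(duplicated), "of them duplicated")
```

For a file with tens of millions of links the set of keys has to fit in memory; an external sort with a uniqueness filter is the usual alternative when it does not.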

fbemm commented 5 years ago

This somehow disappeared.