ekg / seqwish

alignment to variation graph inducer
MIT License

(core dumped) seqwish #36

Closed HaploKit closed 4 years ago

HaploKit commented 4 years ago

Hi, I am trying to use seqwish to construct a variation graph from raw long reads. It runs successfully with 10,000 reads, but it fails with 100,000 reads taken from the same FASTQ file. Here are the commands and the error output:

prefix=test
minimap2 -x ava-ont -t 48 -c -X $prefix.fq $prefix.fq >$prefix.paf
seqwish -t 32 -s $prefix.fq -p $prefix.paf -g $prefix.gfa

length for 9b3413df-02a4-423e-8bd0-bb1e69cf7a93, expected 450 but got 0
seqwish: /home/software/seqwish-0.1/src/gfa.cpp:133: void seqwish::emit_gfa(std::ostream&, size_t, const string&, mmmulti::iitree<long unsigned int, long unsigned int>&, mmmulti::iitree<long unsigned int, long unsigned int>&, const sdsl::sd_vector<>&, const rank_1_type&, const select_1_type&, seqwish::seqindex_t&, mmmulti::set<std::pair<long unsigned int, long unsigned int> >&): Assertion `false' failed.
work.sh: line 4: 7666 Aborted (core dumped) seqwish -t 32 -s $prefix.fq -p $prefix.paf -g $prefix.gfa
Command exited with non-zero status 134

There may be something wrong with the read '9b3413df-02a4-423e-8bd0-bb1e69cf7a93' in test.fq. Here are the sequence and quality scores for this read:

@9b3413df-02a4-423e-8bd0-bb1e69cf7a93
CGTACTTCGTTCAGTTACGTATTGCTAAGGGTTAAAATCAAGACTCGCTGTGCCTAGTTCAGCACCTGTTTCCCACTGGAGGATAATGGGACGCCAGTTTCGAAGGAAACGTTGTTGGGATACTACCCCCACTTGTGTTATAGCCTCTAACCCGGGTAGGTGATCCCTATCGGAGACAGTGTCTGACAGGCAGTTTGACTGGGGCAGTCGCCTCTAAAAGGTAACAGGAGAGCCCAAAGGTTCCCTCAGAATGGTTGGAAGTAATTGCAGTGTAAAGATGTCAGGGCTTGACTGAGCTACAACTCGAGCAGGGACGAAAAGAGATCGAACAGTGATCCGGTGGTTCCGTATGGAAGACCGTACTCAGCAGGATCAAATGCCGTGTCTCAGTTGGAAGCAGGTGCTGAACTGAAGCTGGCTTGAGTCTTGGTTTAACATGGCAATACGTAA
+
&%,$(<9?BI7E1?02.))-++.,784*'543/35053?>0=9EB>:79.(.9.6,0-;8.8=>?BBB1E?<><1;<@:2:;324477?C?;=AK5D?H<6&%'&1#%&),0:=;809>?96))-12,&$##&&2/':&,((&##$%%)&()$+('&))'55-338())%/)022744<@96;?;($$/4114CAD>=ACA=5.((8>>?>;)?1C?;)(62B-''3($*-*)7%18*/?KAA>A,C:34CGI>G?5&)*)**%%(###%.$+3-%'$##%$&&(())*)%$##&)//-9<7?6?820++@A:B>?CA0%%$'-2;(%'$%)0??565:;D>;H;<;0FGIH=64/%43,,)%(0&&)*+/($&'*)##%'%%$&/'0$&#%++*.$$$(&92')8>@=;/-*)%'%2013<=@<>@>),7=6+'&$$#./5+)):977%
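Before suspecting the tool, it can help to rule out a malformed input record. The sketch below is not part of seqwish; `fastq_read_lengths` is a hypothetical helper name, and it assumes a FASTQ with exactly four lines per record (no wrapped sequences). It prints the sequence length and the quality-string length for one named read, which should be equal for a well-formed record.

```shell
# Print "<sequence length> <quality length>" for one read in a FASTQ file.
# Assumes four lines per record; the read name is the header text up to the
# first whitespace, without the leading "@".
# Usage: fastq_read_lengths READ_NAME FILE.fq
fastq_read_lengths() {
    awk -v name="$1" '
        NR % 4 == 1 {
            # Header line: drop the leading "@" from the first field.
            id = substr($1, 2)
            hit = (id == name)
        }
        NR % 4 == 2 && hit { seqlen = length($0) }
        NR % 4 == 0 && hit { print seqlen, length($0); exit }
    ' "$2"
}
```

For the read reported in the error, this could be called as `fastq_read_lengths 9b3413df-02a4-423e-8bd0-bb1e69cf7a93 test.fq`. Matching lengths would suggest the record itself is intact and the problem lies elsewhere.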

Any help would be appreciated.

ekg commented 4 years ago

You've found an error that I'm aware of. Sorry for this. You can downgrade and get a much lower-performance version of seqwish that should induce the same graph.

I have a test case and I need to use it to fix this. If you can share yours in some way then I'd be very appreciative as well.

HaploKit commented 4 years ago

Thank you for your quick response. I am using the v0.1 release of seqwish from here: https://github.com/ekg/seqwish/releases/tag/v0.1 . Should I try v0.2?

ekg commented 4 years ago

Try v0.2.

If not that, then try the current master.

ekg commented 4 years ago

And please let me know which ones work and which don't.

HaploKit commented 4 years ago

Hi, I tried both v0.2 and the current master, but both failed to build and I could not figure out why. Could you please help me? Here is the error output:

cmake -H. -Bbuild && cmake --build build -- -j3
-- The C compiler identification is GNU 7.3.0
-- The CXX compiler identification is GNU 7.3.0
-- Check for working C compiler: /export/scratch1/home/vincent/software/miniconda3/bin/x86_64-conda_cos6-linux-gnu-cc
-- Check for working C compiler: /export/scratch1/home/vincent/software/miniconda3/bin/x86_64-conda_cos6-linux-gnu-cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /export/scratch1/home/vincent/software/miniconda3/bin/x86_64-conda_cos6-linux-gnu-c++
-- Check for working CXX compiler: /export/scratch1/home/vincent/software/miniconda3/bin/x86_64-conda_cos6-linux-gnu-c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
CMake Error at /export/scratch3/vincent/software/miniconda3/share/cmake-3.12/Modules/ExternalProject.cmake:2525 (message):
No download info given for 'sdsl-lite' and its source directory:

/export/scratch3/vincent/software/seqwish-0.2/deps/sdsl-lite

is not an existing non-empty directory. Please specify one of:

  • SOURCE_DIR with an existing non-empty directory
  • DOWNLOAD_COMMAND
  • URL
  • GIT_REPOSITORY
  • SVN_REPOSITORY
  • HG_REPOSITORY
  • CVS_REPOSITORY and CVS_MODULE

Call Stack (most recent call first):
/export/scratch3/vincent/software/miniconda3/share/cmake-3.12/Modules/ExternalProject.cmake:3100 (_ep_add_download_command)
CMakeLists.txt:53 (ExternalProject_Add)

CMake Error at /export/scratch3/vincent/software/miniconda3/share/cmake-3.12/Modules/ExternalProject.cmake:2525 (message): No download info given for 'tayweeargs' and its source directory:

/export/scratch3/vincent/software/seqwish-0.2/deps/args

is not an existing non-empty directory. Please specify one of:

CMake Error at /export/scratch3/vincent/software/miniconda3/share/cmake-3.12/Modules/ExternalProject.cmake:2525 (message): No download info given for 'gzipreader' and its source directory:

/export/scratch3/vincent/software/seqwish-0.2/deps/gzip_reader

is not an existing non-empty directory. Please specify one of:

CMake Error at /export/scratch3/vincent/software/miniconda3/share/cmake-3.12/Modules/ExternalProject.cmake:2525 (message): No download info given for 'mmmultimap' and its source directory:

/export/scratch3/vincent/software/seqwish-0.2/deps/mmmultimap

is not an existing non-empty directory. Please specify one of:

CMake Error at /export/scratch3/vincent/software/miniconda3/share/cmake-3.12/Modules/ExternalProject.cmake:2525 (message): No download info given for 'iitii' and its source directory:

/export/scratch3/vincent/software/seqwish-0.2/deps/iitii

is not an existing non-empty directory. Please specify one of:

CMake Error at /export/scratch3/vincent/software/miniconda3/share/cmake-3.12/Modules/ExternalProject.cmake:2525 (message): No download info given for 'mmap_allocator' and its source directory:

/export/scratch3/vincent/software/seqwish-0.2/deps/mmap_allocator

is not an existing non-empty directory. Please specify one of:

CMake Error at /export/scratch3/vincent/software/miniconda3/share/cmake-3.12/Modules/ExternalProject.cmake:2525 (message): No download info given for 'ips4o' and its source directory:

/export/scratch3/vincent/software/seqwish-0.2/deps/ips4o

is not an existing non-empty directory. Please specify one of:

CMake Error at /export/scratch3/vincent/software/miniconda3/share/cmake-3.12/Modules/ExternalProject.cmake:2525 (message): No download info given for 'bbhash' and its source directory:

/export/scratch3/vincent/software/seqwish-0.2/deps/BBHash

is not an existing non-empty directory. Please specify one of:

CMake Error at /export/scratch3/vincent/software/miniconda3/share/cmake-3.12/Modules/ExternalProject.cmake:2525 (message): No download info given for 'atomicbitvector' and its source directory:

/export/scratch3/vincent/software/seqwish-0.2/deps/atomicbitvector

is not an existing non-empty directory. Please specify one of:

CMake Error at /export/scratch3/vincent/software/miniconda3/share/cmake-3.12/Modules/ExternalProject.cmake:2525 (message): No download info given for 'atomicqueue' and its source directory:

/export/scratch3/vincent/software/seqwish-0.2/deps/atomic_queue

is not an existing non-empty directory. Please specify one of:

-- Configuring incomplete, errors occurred! See also "/export/scratch3/vincent/software/seqwish-0.2/build/CMakeFiles/CMakeOutput.log".

ekg commented 4 years ago

git clone --recursive https://github.com/ekg/seqwish.git

or, to avoid pulling again:

git submodule update --init --recursive
git submodule sync
git submodule update --init --recursive
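The CMake errors above complain that each directory under deps/ is "not an existing non-empty directory", which is what a source tree looks like when its submodules were never fetched. As a rough pre-build check, the sketch below lists which dependency directories are missing or empty. `missing_deps` is a hypothetical helper name, and the directory names passed to it are taken from the error log rather than from any official list.

```shell
# List dependency directories under SOURCE_DIR/deps that are missing or empty.
# Usage: missing_deps SOURCE_DIR NAME...
missing_deps() {
    src="$1"
    shift
    for name in "$@"; do
        dir="$src/deps/$name"
        # An unfetched submodule is either absent or an empty directory.
        if [ ! -d "$dir" ] || [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
            echo "$name"
        fi
    done
}
```

From the top of the source tree this might be run as `missing_deps . sdsl-lite args gzip_reader mmmultimap iitii mmap_allocator ips4o BBHash atomicbitvector atomic_queue`. Empty output would mean every listed directory has content.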

HaploKit commented 4 years ago

Thank you very much! It works now. But there is still a 'core dumped' error when running on this test data: https://drive.google.com/file/d/1yvTEAYTZJnCabr3J9nGCaV6uizJcCaj7/view?usp=sharing It would be great if you could download and test it.

The commands I use:

prefix=test
minimap2 -x ava-ont -t 48 -c -X $prefix.fq $prefix.fq >$prefix.paf
seqwish -t 32 -s $prefix.fq -p $prefix.paf -g $prefix.gfa

Here is the error output:

length for 9b3413df-02a4-423e-8bd0-bb1e69cf7a93, expected 450 but got 451
seqwish: /export/scratch3/vincent/software/seqwish/src/gfa.cpp:137: void seqwish::emit_gfa(std::ostream&, size_t, const string&, mmmulti::iitree<long unsigned int, long unsigned int>&, mmmulti::iitree<long unsigned int, long unsigned int>&, const sdsl::sd_vector<>&, const rank_1_type&, const select_1_type&, seqwish::seqindex_t&, mmmulti::set<std::pair<long unsigned int, long unsigned int> >&): Assertion `false' failed.
work.sh: line 4: 14107 Aborted (core dumped) seqwish -t 32 -s $prefix.fq -p $prefix.paf -g $prefix.gfa
Command exited with non-zero status 134

ekg commented 4 years ago

Yeah, that's the bug. Sorry about this; it will take me a few days to get to it.

HaploKit commented 4 years ago

OK, thanks.

HaploKit commented 4 years ago

Hi, Erik, is there any update on this issue? Thank you.

ekg commented 4 years ago

I need to solve this as much as you do. Thank you for checking back.

To be clear, I haven't been able to work much due to focus on COVID-19 issues (both grants and ones related to being under lockdown). I'm sorry this isn't going fast.

The problem is likely a mistake with how the transitive closure or node boundary identification is running in the case of sequences that invert relative to the graph sequence vector many times. At least, that's the feature of the sequences that trigger the error.

ekg commented 4 years ago

Getting a really simple test case which can produce the error would be helpful. Something involving just a few sequences. In that case it can be easier to see everything that's going on and debug it properly. Until then, it's possible to produce debugging output relative to the sequence where we find the error. That's OK but not ideal. If you have a smaller test case please let me know.
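One way to start shrinking a failing input toward such a test case is to keep only the alignments that involve the read named in the error, plus the reads it aligns to. The sketch below is an assumption about how one might do that, not a procedure from this thread. It relies on the PAF convention that column 1 is the query name and column 6 is the target name; both function names are hypothetical.

```shell
# Print PAF records in which the given read is the query (column 1)
# or the target (column 6).
# Usage: paf_records_for_read READ_NAME FILE.paf
paf_records_for_read() {
    awk -F '\t' -v name="$1" '($1 "") == (name "") || ($6 "") == (name "")' "$2"
}

# Read PAF records on stdin and print the unique read names they mention.
paf_read_names() {
    awk -F '\t' '{ print $1; print $6 }' | sort -u
}
```

For example, `paf_records_for_read 9b3413df-02a4-423e-8bd0-bb1e69cf7a93 test.paf > small.paf` followed by `paf_read_names < small.paf` would give a reduced alignment file and the list of reads to pull from the FASTQ. Whether the reduced input still triggers the assertion has to be checked by rerunning seqwish on it.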

ekg commented 4 years ago

Please let me know if this resolves the problem you were running into!

I had a range extension check that was not implemented correctly for both strands. Somehow, this was only a problem in certain kinds of graphs, typically those with extremely complex tangled regions.

ekg commented 4 years ago

Also, it might make sense to use the -X option when mapping.

I got this to make what seems like a reasonable graph:

minimap2 -cx asm20 -t 48 -X test.fastq test.fastq >test.paf

ekg commented 4 years ago

@vincentluo91 I tried this:

minimap2 -cx asm20 -t 48 -X test.fastq test.fastq >test.paf
seqwish -t 48 -s test.fastq -p test.1.paf -g test.gfa
odgi build -g test.gfa -o - -p | odgi sort -p bSn -A -i - -t 48 -o test.odgi
odgi viz -i test.odgi -o test.odgi.png -x 4000 -y 400 -P 1

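[image: image] https://user-images.githubusercontent.com/145425/78372776-fc4cf680-75c9-11ea-898f-25cd6edf0ee4.png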

This shows that a bit of the graph is covered by a lot of the reads, but there are a ton of "tips" (part to the right, mostly empty) where the read ends aren't aligning to each other.

Not sure if this matches what you'd expect, but I thought I'd share the process I would use initially.

In odgi or another related tool, I plan to work out assembly graph steps like tip pruning and bubble popping. They both amount to a kind of read correction. I'm thinking about the best way to implement these. My current thought is to use MEMs found in the GBWT to structure the error correction.

HaploKit commented 4 years ago

Hi Erik,

Thank you very much for this detailed information. It does resolve the problem I had before!

Best regards, Vincent

HaploKit commented 4 years ago

Hi Erik,

I checked the variation graph constructed by running seqwish on noisy long reads (read length ~10 kb, sequencing error ~10%) using the parameters you suggested. As you also found, there are many tips in the graph, which is not what I expected. For example, 65 reads (out of 300 in total) have only one overlap with other reads, which is not correct (I checked using simulated reads). I also tried 'minimap2 -x ava-pb' rather than 'minimap2 -cx asm20' to get more overlaps, but the variation graph still does not seem correct. I don't understand why there are so many self-loops in the graph (17,927 self-loops among 217,826 links). So I am a little confused: is seqwish suitable for constructing a variation graph from noisy long reads?
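A count like the one quoted above can be reproduced directly from the GFA output. The sketch below is one plausible way to do it and may not match how the original numbers were obtained. It treats any `L` record whose source and target segment names are equal as a self-loop, regardless of orientation; `gfa_self_loops` is a hypothetical helper name.

```shell
# Print "<self-loop links> <total links>" for a GFA file.
# A GFA L record has the form: L <from> <from-orient> <to> <to-orient> <overlap>
# Usage: gfa_self_loops FILE.gfa
gfa_self_loops() {
    awk -F '\t' '
        $1 == "L" {
            total++
            # Compare segment names as strings, ignoring the orientations.
            if (($2 "") == ($4 "")) loops++
        }
        END { print loops + 0, total + 0 }
    ' "$1"
}
```

Running `gfa_self_loops test.gfa` before and after changing the seqwish parameters gives a quick way to see whether the number of self-loops actually drops.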

I also tried vg msga to do the same thing, but it is too slow and uses too much memory to construct the graph, even with only hundreds of reads. Here is the command I used for testing: "vg msga -f test.fa -B 128 -K 11 -X 2 -E 3 -H 5 >test.vg"

I am new to variation graphs. I would greatly appreciate any suggestions. Many thanks in advance.

Best regards, Xiao

On Sat, Apr 4, 2020 at 10:47 AM Xiao Luo vincentluo91@gmail.com wrote:

Hi Erik,

Thank you very much for this detailed information. It does resolve the problem I got before !

Best regards, Vincent

On Fri, Apr 3, 2020 at 4:46 PM Erik Garrison notifications@github.com wrote:

@vincentluo91 https://github.com/vincentluo91 I tried this:

minimap2 -cx asm20 -t 48 -X test.fastq test.fastq >test.paf seqwish -t 48 -s test.fastq -p test.1.paf -g test.gfa odgi build -g test.gfa -o - -p | odgi sort -p bSn -A -i - -t 48 -o test.odgi odgi viz -i test.odgi -o test.odgi.png -x 4000 -y 400 -P 1

[image: image] https://user-images.githubusercontent.com/145425/78372776-fc4cf680-75c9-11ea-898f-25cd6edf0ee4.png

This shows that a bit of the graph is covered by a lot of the reads, but there are a ton of "tips" (part to the right, mostly empty) where the read ends aren't aligning to each other.

Not sure if this matches what you'd expect, but I thought I'd share the process I would use initially.

In odgi or another related tool, I plan to work out assembly graph steps like tip pruning and bubble popping. They both amount to a kind of read correction. I'm thinking about the best way to implement these. My current thought is to use MEMs found in the GBWT to structure the error correction.


ekg commented 4 years ago

I am working on several improvements that should help to make graphs that are more locally linear in cases like this.

I would suggest using two parameters, -k 8 and -r 1, to smooth the graph (-k 8) and prevent self-loops (-r 1). The latter isn't the best approach. In the case of an actual segmental duplication it can generate extremely strange motifs. But for your case it should be fine.
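As a toy illustration of why a minimum match length smooths the graph, the sketch below keeps only runs of identical bases that are at least a given length. It assumes that -k acts as a lower bound on exact-match length, and it works on two equal-length, gap-free aligned strings. That setup and the function name are simplifications made up for this example. It is not seqwish's implementation.

```python
def exact_match_runs(query, target, min_len):
    """Return (start, length) for runs of identical bases at least min_len long.

    query and target are equal-length, gap-free aligned strings (a toy setup).
    """
    if len(query) != len(target):
        raise ValueError("aligned strings must have equal length")
    runs = []
    start = None  # start index of the current run of matches, if any
    for i, (q, t) in enumerate(zip(query, target)):
        if q == t:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i - start))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and len(query) - start >= min_len:
        runs.append((start, len(query) - start))
    return runs


query = "ACGTACGTAAGGCC"
target = "ACGTACGTATGGCC"
print(exact_match_runs(query, target, 8))
print(exact_match_runs(query, target, 1))
```

With a threshold of 8, the short 4 bp match after the mismatch is dropped, so only the longer run would contribute to merging bases in the graph. With noisy reads at ~10% error, many short matches between errors would be discarded in the same way.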

Other than this, seqwish is going to reproduce the input alignments exactly. If you are doing an all-vs-all alignment then you must specify -X for minimap2 to eliminate self mappings. This changes the behavior of the algorithm in some ways. It might be necessary to look at the minimap2 source to see exactly how.
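One way to check that no self mappings reached seqwish is to scan the PAF for records whose query and target names match. This is a minimal sketch, assuming the standard PAF layout (12 mandatory tab-separated columns, query name in column 1 and target name in column 6). The helper name is hypothetical.

```python
def count_self_mappings(paf_lines):
    """Return (self_mappings, total_records) for an iterable of PAF lines."""
    self_mappings = 0
    total = 0
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:  # PAF has 12 mandatory columns
            continue
        total += 1
        # Column 1 is the query name, column 6 the target name.
        if fields[0] == fields[5]:
            self_mappings += 1
    return self_mappings, total


example = [
    "readA\t100\t0\t90\t+\treadB\t120\t10\t100\t85\t90\t60\n",
    "readA\t100\t0\t100\t+\treadA\t100\t0\t100\t100\t100\t60\n",
]
print(count_self_mappings(example))
```

If minimap2 was run with -X as described, the first number should be zero for the resulting PAF. The function can be called on an open file handle, for example `count_self_mappings(open("test.paf"))`.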

Try this out. Then, the problem is probably downstream. We need to remove the tips and smooth the graph further by error correcting the reads against each other. This could be done first with another tool for read polishing, or it could be a new algorithm set up on top of the graph. I'm partial to the latter, but I don't have it written. So at the moment I would suggest applying a preprocessing step to polish the reads.


HaploKit commented 4 years ago

Hi Erik,

Many thanks for your suggestions!

I tried the parameters you suggested, but the graph still does not look good. As for polishing the reads as a first step, I suspect it may over-correct true variation if there are multiple similar genomes. Anyway, it is good to know you are improving it. Thanks again.

Best regards, Xiao
