bonsai-team / Porechop_ABI

Adapter trimmer for Oxford Nanopore reads using ab initio method
GNU General Public License v3.0
35 stars 4 forks source link

Assertion failed : length(seqH) >= 1u was: 0 < 1 #3

Closed DelphIONe closed 5 years ago

DelphIONe commented 5 years ago

I run this command porechop -abi -i test_chia.fastq -b test_ch --discard_middle and I obtain

44,000 / 44,000 (100.0%)

41,578 / 44,000 reads had adapters trimmed from their start (1,989,870 bp removed) 36,169 / 44,000 reads had adapters trimmed from their end (1,879,168 bp removed)

Discarding reads containing middle adapters 180 / 44,000 (0.4%)porechop/include/seqan/align/dp_setup.h:283 Assertion failed : length(seqH) >= 1u was: 0 < 1

stack trace: 0 [0x7f6293f24071] seqan::ClassTest::fail() + 0xe 1 [0x7f6293f2bdce] seqan::Value<seqan::Score<int, seqan::Tag >, 0>::Type seqan::setUpAndRunAlignment<int, seqan::Tag<seqan::AffineGaps>, unsigned char, seqan::String<seqan::DPCell<int, seqan::Tag<seqan::AffineGaps> >, seqan::Alloc >, seqan::String<unsigned char, seqan::Alloc >, seqan::TraceSegment<unsigned long, unsigned long>, seqan::Alloc, seqan::Tag<seqan::Default>, seqan::String<seqan::SimpleType<unsigned char, seqan::Dna5>, seqan::Alloc >, seqan::String<seqan::SimpleType<unsigned char, seqan::Dna5>, seqan::Alloc >, int, seqan::Tag, seqan::GlobalAlignment<seqan::FreeEndGaps<seqan::False, seqan::False, seqan::False, seqan::False> >, seqan::DPBandConfig<seqan::Tag >, seqan::FreeEndGaps<seqan::True, seqan::True, seqan::True, seqan::True>, seqan::TracebackOn<seqan::TracebackConfig<seqan::Tag, seqan::Tag > > >(seqan::DPContext<seqan::DPCell<int, seqan::Tag<seqan::AffineGaps> >, unsigned char, seqan::String<seqan::DPCell<int, seqan::Tag<seqan::AffineGaps> >, seqan::Alloc >, seqan::String<unsigned char, seqan::Alloc > >&, seqan::String<seqan::TraceSegment<unsigned long, unsigned long>, seqan::Alloc >&, seqan::DPScoutState<seqan::Tag >&, seqan::String<seqan::SimpleType<unsigned char, seqan::Dna5>, seqan::Alloc > const&, seqan::String<seqan::SimpleType<unsigned char, seqan::Dna5>, seqan::Alloc > const&, seqan::Score<int, seqan::Tag > const&, seqan::AlignConfig2<seqan::GlobalAlignment<seqan::FreeEndGaps<seqan::False, seqan::False, seqan::False, seqan::False> >, seqan::DPBandConfig<seqan::Tag >, seqan::FreeEndGaps<seqan::True, seqan::True, seqan::True, seqan::True>, seqan::TracebackOn<seqan::TracebackConfig<seqan::Tag, seqan::Tag > > > const&) + 0x87 2 [0x7f6293f2a150] seqan::Value<seqan::Score<int, seqan::Tag >, 0>::Type seqan::setUpAndRunAlignment<seqan::TraceSegment<unsigned long, unsigned long>, seqan::Alloc, seqan::Tag, seqan::String<seqan::SimpleType<unsigned char, seqan::Dna5>, seqan::Alloc >, seqan::String<seqan::SimpleType<unsigned char, seqan::Dna5>, seqan::Alloc >, int, seqan::Tag, seqan::GlobalAlignment<seqan::FreeEndGaps<seqan::False, seqan::False, seqan::False, seqan::False> >, seqan::DPBandConfig<seqan::Tag >, seqan::FreeEndGaps<seqan::True, seqan::True, seqan::True, seqan::True>, seqan::TracebackOn<seqan::TracebackConfig<seqan::Tag, seqan::Tag > >, seqan::Tag >(seqan::String<seqan::TraceSegment<unsigned long, unsigned long>, seqan::Alloc >&, seqan::DPScoutState<seqan::Tag >&, seqan::String<seqan::SimpleType<unsigned char, seqan::Dna5>, seqan::Alloc > const&, seqan::String<seqan::SimpleType<unsigned char, seqan::Dna5>, seqan::Alloc > const&, seqan::Score<int, seqan::Tag > const&, seqan::AlignConfig2<seqan::GlobalAlignment<seqan::FreeEndGaps<seqan::False, seqan::False, seqan::False, seqan::False> >, seqan::DPBandConfig<seqan::Tag >, seqan::FreeEndGaps<seqan::True, seqan::True, seqan::True, seqan::True>, seqan::TracebackOn<seqan::TracebackConfig<seqan::Tag, seqan::Tag > > > const&, seqan::Tag const&) + 0x77 3 [0x7f6293f291ab] int seqan::globalAlignment<seqan::String<seqan::SimpleType<unsigned char, seqan::Dna5>, seqan::Alloc >, seqan::Tag<seqan::ArrayGaps>, int, seqan::Tag, true, true, true, true, seqan::Tag, seqan::Tag >(seqan::Align<seqan::String<seqan::SimpleType<unsigned char, seqan::Dna5>, seqan::Alloc >, seqan::Tag<seqan::ArrayGaps> >&, seqan::Score<int, seqan::Tag > const&, seqan::AlignConfig<true, true, true, true, seqan::Tag > const&, seqan::Tag const&) + 0xa5 4 [0x7f6293f281f8] int seqan::globalAlignment<seqan::String<seqan::SimpleType<unsigned char, seqan::Dna5>, seqan::Alloc >, seqan::Tag<seqan::ArrayGaps>, int, seqan::Tag, true, true, true, true, seqan::Tag >(seqan::Align<seqan::String<seqan::SimpleType<unsigned char, seqan::Dna5>, seqan::Alloc >, seqan::Tag<seqan::ArrayGaps> >&, seqan::Score<int, seqan::Tag > const&, seqan::AlignConfig<true, true, true, true, seqan::Tag > const&) + 0x79 5 [0x7f6293f275e0] adapterAlignment + 0x1bb 6 [0x7f6294173e20] ffi_call_unix64 + 0x4c 7 [0x7f629417388b] ffi_call + 0x2eb 8 [0x7f629416e01a] _ctypes_callproc + 0x49a 9 [0x7f6294161fcb] /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so(+0x9fcb) 10 [0x5c20e7] PyObject_Call + 0x47 11 [0x53b656] PyEval_EvalFrameEx + 0x4ed6 12 [0x53b294] PyEval_EvalFrameEx + 0x4b14 13 [0x53b294] PyEval_EvalFrameEx + 0x4b14 14 [0x53b294] PyEval_EvalFrameEx + 0x4b14 15 [0x540b0b] PyEval_EvalCodeEx + 0x13b 16 [0x4ec3f7] /usr/bin/python3.5() 17 [0x5c20e7] PyObject_Call + 0x47 18 [0x538cab] PyEval_EvalFrameEx + 0x252b

Abandon (core dumped)

Can you help me please ? Thanks,

qbonenfant commented 5 years ago

Hi, I manged to recreate your problem, it seems that some of your read are too short after trimming, and no middle sequence is left. Your dataset must contains some fairly short sequences, which is natively not well supported by the current Porechop version.

The only solution that come to my mind right now is to filter your input dataset to remove reads with length <= 300 ( 2x the current porechop sampling window, if you want to be on the safe side, but 2 times the inferred adapter length should be enough) which may end up being too short anyway. Depending on your application you may prefer to adjust the sampling window by using --end_size XX to a more forgiving size (90 / 100 for example).

I hope this may help you, keep us informed if it did.

Quentin Bonenfant

DelphIONe commented 5 years ago

Hi, Thanks for your reply. The Porechop version (not abi version) works fine on the same file. Moreover, I have the same result if I launch porechop abi without discard_middle option.

qbonenfant commented 5 years ago

Hi, Which version of Porechop (non ABI) are you using ? Some behaviors may differ from current version. Our contribution only add a step right after reading the adapter database, and should not interfer with native code. Also, removing --discard_middle do not skip the middle adapter ressearch, so it won't change the outcome in this case. If you want to try without middle adapter ressearch, try.--no_split instead.

Quentin

DelphIONe commented 5 years ago

I use Porechop 0.2.4 The command porechop (abi) with no_split option works fine! I don't know if it help you ?

qbonenfant commented 5 years ago

Hi,

Since we do use the same version of Porechop, the thing i can think of is that the adapter inference finds a better (or at least fits the reads slightly better) adapter than the ones in the database, and some reads that were not trimmed before now are.

If using --no_split do correct your problem, that confirm my previous hypothesis about your reads, which is that some of them are too short and completely deleted after ends trimming.

You can try it yourself on a subset of very short reads using -v 2 so you will see what porechop is really trimming and what is left after that. You will see that in some cases, porechop will delete the whole sequence if it thinks it is only composed of adapters, leading to the error you encounter. If times allows it, i will try to patch this in the future.

Unless you really need to keep very short reads, i do recommend that you filter them from your fasta / fastq files if you are sure your dataset is not trimmed.

DelphIONe commented 5 years ago

Ok, thanks. One question out of subject but... do you have tested your tool on amplicon run ? I'm wondering to one over represented sequence can be confused with adapter. Do you see what I mean ?

qbonenfant commented 5 years ago

We did tried our tool on amplicon run, and yes we do face this problem. We are still investigating if it is possible to separate adapter from the sequences on those run. We currently need to collect more datasets to extend our knowledge on this kind of data.

I hope i answered all of your questions.

Quentin

DelphIONe commented 5 years ago

Thanks!