DavidsonGroup / flexiplex

The Flexible Demultiplexer
https://davidsongroup.github.io/flexiplex/
MIT License
23 stars 2 forks source link

Exclude ambiguous barcodes etc. #15

Closed ChangqingW closed 11 months ago

ChangqingW commented 1 year ago

Previous PR had a bug, fixed in commit 17393df606269432c5146458e0d4e922bd6b3f49 Exclude ambiguous barcodes 0c565c67137ede87d2700936aa05e90548249e5f use aligment for locating barcode instead of searching for primer again remove extra \t in flexiplex_reads_barcodes.txt 43a10f7e688161a2d6dd36364733d520345fc423

ChangqingW commented 1 year ago

Didn't know pushing more commits would update the PR as well! Made this branch that includes less commits, sorry for the previous mess

nadiadavidson commented 11 months ago

Hi Changqing, I've finally gotten around to testing this, however I run into a compilation issue (gcc 9.1.0), which I've pasted below. Ideally, we want flexiplex to be as easily to install as possible as this is one of its selling points (ie. should compile for any version of gcc and not depend on other libraries). I've also noticed that you've made a bunch of changes in your private repo for flexiplex, which diverge quite a bit and which we may not want to include in this repository. Let me know what you'd like to do. I think adding a test of whether the barcodes are ambiguous is a good idea and I can implement this separately even if this pull request isn't merged. Cheers, Nadia.

g++ flexiplex.c++ -Ofast -pthread -std=c++17 -o flexiplex edlib-1.2.7/edlib.cpp -I edlib-1.2.7/ flexiplex.c++: In function ‘Barcode get_barcode(std::string&, std::unordered_set<std::cxx11::basic_string >*, int, int)’: flexiplex.c++:190:100: error: no matching function for call to ‘inclusive_scan(std::vector::iterator, std::vector::iterator, std::vector::iterator)’ 190 | std::inclusive_scan(subpattern_lengths.begin(), subpattern_lengths.end(), subpattern_ends.begin()); | ^ In file included from /stornext/System/data/apps/gcc/gcc-9.1.0/include/c++/9.1.0/numeric:214, from flexiplex.c++:18: /stornext/System/data/apps/gcc/gcc-9.1.0/include/c++/9.1.0/pstl/glue_numeric_defs.h:66:1: note: candidate: ‘template<class _ExecutionPolicy, class _ForwardIterator1, class _ForwardIterator2> pstl::internal::enable_if_execution_policy<_ExecutionPolicy, _ForwardIterator2> std::inclusive_scan(_ExecutionPolicy&&, _ForwardIterator1, _ForwardIterator1, _ForwardIterator2)’ 66 | inclusive_scan(_ExecutionPolicy&& exec, _ForwardIterator1 first, _ForwardIterator1 last, | ^~~~~~ /stornext/System/data/apps/gcc/gcc-9.1.0/include/c++/9.1.0/pstl/glue_numeric_defs.h:66:1: note: template argument deduction/substitution failed: flexiplex.c++:190:100: note: candidate expects 4 arguments, 3 provided 190 | std::inclusive_scan(subpattern_lengths.begin(), subpattern_lengths.end(), subpattern_ends.begin()); | ^ In file included from /stornext/System/data/apps/gcc/gcc-9.1.0/include/c++/9.1.0/numeric:214, from flexiplex.c++:18: /stornext/System/data/apps/gcc/gcc-9.1.0/include/c++/9.1.0/pstl/glue_numeric_defs.h:71:1: note: candidate: ‘template<class _ExecutionPolicy, class _ForwardIterator1, class _ForwardIterator2, class _BinaryOperation> pstl::internal::enable_if_execution_policy<_ExecutionPolicy, _ForwardIterator2> std::inclusive_scan(_ExecutionPolicy&&, _ForwardIterator1, _ForwardIterator1, _ForwardIterator2, _BinaryOperation)’ 71 | inclusive_scan(_ExecutionPolicy&& exec, _ForwardIterator1 __first, _ForwardIterator1 last, | ^~~~~~ /stornext/System/data/apps/gcc/gcc-9.1.0/include/c++/9.1.0/pstl/glue_numeric_defs.h:71:1: note: template argument deduction/substitution failed: flexiplex.c++:190:100: note: candidate expects 5 arguments, 3 provided 190 | std::inclusive_scan(subpattern_lengths.begin(), subpattern_lengths.end(), subpattern_ends.begin()); | ^ In file included from /stornext/System/data/apps/gcc/gcc-9.1.0/include/c++/9.1.0/numeric:214, from flexiplex.c++:18: /stornext/System/data/apps/gcc/gcc-9.1.0/include/c++/9.1.0/pstl/glue_numeric_defs.h:76:1: note: candidate: ‘template<class _ExecutionPolicy, class _ForwardIterator1, class _ForwardIterator2, class _Tp, class _BinaryOperation> pstl::internal::enable_if_execution_policy<_ExecutionPolicy, _ForwardIterator2> std::inclusive_scan(_ExecutionPolicy&&, _ForwardIterator1, _ForwardIterator1, _ForwardIterator2, _BinaryOperation, _Tp)’ 76 | inclusive_scan(_ExecutionPolicy&& exec, _ForwardIterator1 __first, _ForwardIterator1 __last, | ^~~~~~ /stornext/System/data/apps/gcc/gcc-9.1.0/include/c++/9.1.0/pstl/glue_numeric_defs.h:76:1: note: template argument deduction/substitution failed: flexiplex.c++:190:100: note: candidate expects 6 arguments, 3 provided 190 | std::inclusive_scan(subpattern_lengths.begin(), subpattern_lengths.end(), subpattern_ends.begin()); | ^ make: *** [flexiplex] Error 1

ChangqingW commented 11 months ago

Replaced the function missing in older GCCs, should compile on GCC >= 6.3.0

ChangqingW commented 11 months ago

I wonder if you think trimming the TSO sequence would be a useful feature to add to flexiplex, then we will be able to add a "found TSO" column in the stats file. I am currently using cutadapt to do this but I image this could be done in flexiplex with another edlibAlign call with SHW mode. The issue is we only get the optimal match with edlib and cannot get the first match for TSO.

nadiadavidson commented 11 months ago

Thanks, I'll give the update for lower gcc a go.

About the TSO, if the protocol is 5' you can just chance flexiplex to search for it instead of the polyT. e.g: flexiplex -r “TTTCTTATATGGG” -k file.fastq If the protocol is 3', you could run flexiplex twice, searching once for the barcode region and a second for the TSO. e.g something like: flexiplex -k file.fastq | flexiplex -r “TTTCTTA” -k "TATGGG" -l "" -u 0 but you'd probably want to make the strings longer and play around a bit with the option to check it's cutting where you expect. But cutadapter probably works just as well for this.

nadiadavidson commented 11 months ago

All looks good. With that change, you can revert back to -std=c++11, then it will compile with gcc 4 too. Once you've done that I will merge, fix the extra tab in the output table and bump the version.

nadiadavidson commented 11 months ago

Thanks Changqing, merging now.