Closed ashokpatowary closed 3 months ago
Hi, for these instances I usually run flexiplex twice, piping the output from one barcode search into the input of another, e.g.: flexiplex -x [flank1] -b ????????? -x [half of flank2] | flexiplex -x [other half of flank2] -u ???????? -b ?????????? -x TTTTTTTTTTTTT > result.fastq
Then the read IDs will look like "barcode1_UMI#barcode2_UMI#original_read_ID", which may not be compatible with FLAMES, but could be converted to something that is, using standard bash tools like sed and cut. @ChangqingW may be able to comment on what would be compatible with FLAMES.
Cheers, Nadia.
Yeah, I think Nadia's suggestion is probably the most practical solution now with flexiplex. If you have a significant number of chimeric reads, that might complicate things though: you could get more than 2 reads at the end from 1 chimeric read. Your protocol looks very interesting, is this published yet?
@ChangqingW What is the read ID format that FLAMES expects? Is that documented anywhere?
It expects @BC_UMI#Anything as output by flexiplex; everything after the first # will be ignored.
I'll add this to FLAMES' documentation.
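For anyone who prefers a script over sed/cut/awk, here is a minimal Python sketch of the header conversion discussed above. It assumes the two-pass output has headers shaped like "@BCa_UMIa#BCb_UMIb#original_read_ID" (the UMI part may be empty for the pass run without -u) and that concatenating the two barcodes into one BC field is acceptable to FLAMES; the latter has not been confirmed in this thread. The function names merge_flexiplex_header and convert_fastq are made up for illustration.

```python
def merge_flexiplex_header(header: str) -> str:
    """Collapse '@BCa_UMIa#BCb_UMIb#rest' into '@BCaBCb_UMI#rest'.

    Assumption: barcodes and UMIs contain no '_' or '#' characters.
    The UMI fields are concatenated, so an empty UMI simply drops out.
    """
    if not header.startswith("@"):
        raise ValueError(f"not a FASTQ header: {header!r}")
    parts = header[1:].split("#", 2)
    if len(parts) < 3:
        raise ValueError(f"expected two barcode fields in: {header!r}")
    first, second, rest = parts
    bc_a, _, umi_a = first.partition("_")
    bc_b, _, umi_b = second.partition("_")
    return f"@{bc_a}{bc_b}_{umi_a}{umi_b}#{rest}"


def convert_fastq(lines):
    """Rewrite the header of every 4-line FASTQ record; pass other lines through.

    Assumption: strict 4-line records (no wrapped sequence lines).
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        yield merge_flexiplex_header(line) if i % 4 == 0 else line
```

To use it on a file, pass an open file handle to convert_fastq and write each yielded line followed by a newline. The barcode order in the merged field follows the order in the header, so swap bc_a and bc_b in the f-string if the downstream barcode list expects the opposite order.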
Hi @nadiadavidson and @ChangqingW
Thanks for the suggestion. It's ScaleBio technology, which I am trying to adapt for long-read sequencing. Identifying the barcode is easy on the PacBio platform; with ONT it's a little tricky, but I think flexiplex can handle it. Thanks for the FLAMES suggestion; I think I can modify the read IDs with awk.
@nadiadavidson I have another follow-up question. If I run flexiplex with -u ???????? -b ?????????? and -k with a barcode file (10 bp barcode sequences), it throws the following error; but if I reduce the "-b" wildcard length to 7 "?" with the same barcode file of 10 bp barcodes, it works. Any suggestions as to what is going on?
Setting max barcode edit distance to 2
Setting number of threads to 24
For usage information type: flexiplex -h
No filename given... getting reads from stdin...
Searching for barcodes...
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::substr
Aborted
Thanks @ashokpatowary, that's interesting to know. If you get it working, it would be great to add this use case to our documentation and/or presets.
I've had a similar error in the past when processing ParseBio data, and I believe it was due to truncated reads where the barcode was partially cut off. I got around this by adding some "buffer" sequence to the end of each read and then adding that back into the flanking search sequence (so it was trimmed). This was not ideal and I had meant to post a GitHub issue here about it. If you have a small toy dataset which reproduces the issue, we can take a look in more detail.
Cheers, Nadia.
Also pinging @olliecheng about this.
Hi, I've tried flexiplex on ParseBio datasets, which have a similar barcoding structure. It would be great if flexiplex -i true could split chimeric reads and add the barcode sequence to the read ID without trimming sequences, so that the full flanking sequence can be used in both flexiplex runs. Perhaps sequence removal could be added as a separate option so that we can invoke it only in the last flexiplex run:
flexiplex --trim FALSE -x [flank1] -b ????????? -x [full flank2] | flexiplex --trim TRUE -x [full flank2] -u ???????? -b ?????????? -x TTTTTTTTTTTTT
Thanks @nadiadavidson and @yxsee.
I think I can get it done using the following; however, since the upstream flanking sequence is not specified, there is a chance of false positives:
flexiplex-linux -k lig_seq.txt -n Lig -x CTACACGACGCTCTTCCGATCT -b ????????? -x TCAGAGC -u ???????? -f 2 -e 2 test.fastq -p 24 | flexiplex-linux -n rt -x '' -b ?????????? -x TTTTTTTTTT -f 2 -e 2 -k rt_seq.txt
I then tried sed "/[@,+]/! s/^/START/g" | flexiplex-linux -n test_rt -x START; it threw the same error. After that I tried sed "/[@,+]/! s/^/START/g" | flexiplex-linux -n rt -x '' -b ?????????? -x TTTTTTTTTT -f 2 -e 2 -k rt_seq.txt
and got the same error, because I introduced 4 characters at the start of the sequence. To check that the 2nd barcode is not causing any trouble, I ran flexiplex-linux -n test -x TCAGAGC -u ???????? -b ?????????? -x TTTTTTTTTT -f 2 -e 2 -k rt_seq.txt test.fastq
which works fine and identifies the 2nd barcodes. I am not sure what is causing the issue when I run it twice. I will be happy to share a test file.
Thanks
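One thing to watch with the sed approach: the address /[@,+]/! skips every line that contains @, a comma or +, and FASTQ quality lines can contain those characters. For such records the sequence gets the START prefix but the quality string does not, so the two end up with different lengths. Whether that is related to the crash is not known. Below is a minimal Python sketch that pads reads record by record instead, covering both the prefix idea here and the end-of-read "buffer" idea mentioned earlier in the thread. It assumes strict 4-line FASTQ records; pad_fastq and its parameters are hypothetical names, and "I" is an arbitrary placeholder quality character.

```python
def pad_fastq(lines, prefix="", suffix="", pad_qual="I"):
    """Add prefix/suffix to each read and pad its quality string to match.

    Assumption: strict 4-line FASTQ records (header, sequence, '+', quality).
    Records are identified by position, not by their first character,
    so quality lines starting with '@' or '+' are handled correctly.
    """
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, seq, plus, qual = record
            yield header
            yield prefix + seq + suffix
            yield plus
            # Keep sequence and quality the same length after padding.
            yield pad_qual * len(prefix) + qual + pad_qual * len(suffix)
            record = []
    if record:
        raise ValueError("incomplete FASTQ record at end of input")
```

The padded output can then be piped into the second flexiplex run with the prefix (or suffix) included in the corresponding -x flanking sequence, as in the commands above.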
@ashokpatowary Could you check if my branch fixes the out_of_range error? The handling of truncated UMIs is still inconsistent at the moment, but the branch should resolve the error during UMI extraction.
Hi @ChangqingW; unfortunately the branch threw the same error:
Searching for barcodes...
0.1 million reads processed..
0.2 million reads processed..
0.3 million reads processed..
0.4 million reads processed..
0.5 million reads processed..
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::substr
Aborted
Can you use the flexiplex-linux binary from my branch, run ulimit -c unlimited before running flexiplex, and share the core.xxx dump file? It should be pretty small and can be uploaded to the issue comment.
@ChangqingW @ashokpatowary Thanks for your feedback and bug report. I’ve moved the discussion to a separate issue (#43) so it’s easier to find and track; I’ll close this issue now. If you have any more discussion relevant to the original issue, feel free to reopen. Cheers!
Would it be possible to use flexiplex to demultiplex reads with the following pattern, where we have two different barcode files? We want the output to be compatible with FLAMES.
-x [flank1] -b ????????? -x [flank2] -u ???????? -b ?????????? -x TTTTTTTTTTTTT -f 8 -e 2
Thanks