Lexogen-Tools / idemuxcpp

iDemux is an all-in-one command line tool that can be used for both demultiplexing and error correction of FASTQ files. It enables demultiplexing of i1 inline barcodes of Lexogen’s QuantSeq-Pool as well as demultiplexing of i7 and/or i5 indices of any other RNA-Seq library prep. iDemux can also be used for superior error correction of RNA-Seq libraries generated with Lexogen’s UDI 12 nt Unique Dual Indices. This C++ version is faster than the Python version but requires a certain proficiency in command line tool handling.
https://www.lexogen.com/indexing/12nt-dual-indexing-kits/

Possibility of shorter i1 indices (for non-Lexogen pipelines) #2

Closed Caffenicotiak closed 6 months ago

Caffenicotiak commented 7 months ago

Hey, this seems like a very nice tool.

I wanted to use it to demultiplex a paired-end MiSeq run (ITS2 amplicon sequencing). We used custom 8-nt CDIs as i5 and i7 indices, plus an additional 4-nt inline barcode (one of just two) directly downstream of the insert, which is basically the default position in this package. The extra barcode doubles the number of libraries that can be pooled.

Installation and set-up of idemuxCPP worked great with conda (although I ran into the problem with the messed-up location of the index files under conda, as described and solved in the closed issue).

But the package only accepts 6-, 8-, 10-, or 12-nt i1 sequences.

I know that this tool's main purpose is Lexogen workflows, but if custom index lengths would be easy to implement, that would be amazing. So far I haven't found any other tool for NGS data that combines indices from index reads with inline sequences for demultiplexing.

Thanks, Chris

entzian commented 7 months ago

Hi Chris, thanks for the feedback! I will put this on my todo list.

Best regards, Gregor

entzian commented 6 months ago

Hi Chris, it is actually already possible to use codes of arbitrary length. A code that is not a Lexogen barcode will not be error-corrected, but it will still be demultiplexed. You can specify the read (1 or 2) and the position either in the sample definition sheet or as command line parameters: --i1-start=1 (one-based start position within the read sequence) and --i1-read=<1 or 2>. If all the codes within one column are Lexogen codes, iDemux will correct what it can correct. However, you can also actively turn off error correction (--demux-only) and use codes of arbitrary length.

Best regards, Gregor
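
For readers unsure what a one-based start position and demultiplexing without error correction amount to, here is a minimal Python sketch. It is not idemuxCPP's implementation; the function names `extract_inline_barcode` and `assign_sample`, the barcode sequences, and the library names are all hypothetical and serve only to illustrate exact matching of a short inline barcode at a given position.

```python
from typing import Dict, Optional


def extract_inline_barcode(sequence: str, start: int, length: int) -> Optional[str]:
    """Return the barcode at a one-based start position, or None if the read is too short."""
    if start < 1:
        raise ValueError("start is one-based and must be >= 1")
    begin = start - 1          # convert the one-based position to a Python index
    end = begin + length
    if end > len(sequence):
        return None
    return sequence[begin:end]


def assign_sample(sequence: str, barcodes: Dict[str, str], start: int = 1) -> Optional[str]:
    """Assign a read to a sample by exact barcode match (no error correction)."""
    for barcode, sample in barcodes.items():
        if extract_inline_barcode(sequence, start, len(barcode)) == barcode:
            return sample
    return None                # unmatched reads stay unassigned


# Hypothetical 4-nt inline barcodes, as in the two-barcode setup described above.
barcodes = {"ACGT": "library_A", "TGCA": "library_B"}

print(assign_sample("ACGTTTGACC", barcodes))         # barcode at position 1
print(assign_sample("GGTGCAAA", barcodes, start=3))  # barcode at position 3
```

Because matching is exact, a single sequencing error in the barcode leaves the read unassigned, which is the trade-off of skipping error correction for non-Lexogen codes.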