Possibility of shorter i1 indices (for non-Lexogen pipelines)

Caffenicotiak commented 7 months ago

Hey, this seems like a very nice tool.

I wanted to use it to demultiplex a paired-end MiSeq run (ITS2 amplicon sequencing) in which we used custom 8-nt CDIs as i5 and i7, plus an additional 4-nt barcode (just two different ones) directly downstream of the insert (basically the default position in this package), to double the number of libraries that could be pooled.

Installation and set-up of idemuxCPP worked great with conda (although I ran into the problem with the misplaced index files under conda, as described and solved in the closed issue).

But the package only accepts 6, 8, 10, 12-nt i1 sequences.

I know that this tool's main purpose is for Lexogen workflows, but if it would be easy to implement custom index lengths, that would be amazing. At least I haven't found any other tool for NGS data that combines indices from index reads and inline sequences for demultiplexing.

Thanks, Chris

entzian commented 7 months ago

Hi Chris, thanks for the feedback! I will put this on my todo list.

Best regards, Gregor

entzian commented 6 months ago

Hi Chris, actually it is already possible to use codes of arbitrary length. A code that is not a Lexogen barcode will not be error-corrected, but it will still be demultiplexed. You can specify the read (1 or 2) and the position either in the sample definition sheet or as command-line parameters: --i1-start=1 (one-based start position within the read sequence) and --i1-read=<1 or 2>. If all the codes within one column are Lexogen codes, it will correct what it can correct. However, you can actively turn off error correction (--demux-only) and use codes of arbitrary length.

Best regards, Gregor
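
For readers unfamiliar with inline barcodes, the behaviour described above (a one-based start position within the read, exact matching, no error correction) can be illustrated with a minimal Python sketch. This is not idemuxCPP's implementation; the function names, the "undetermined" bin, and the toy reads are all hypothetical and exist only to show the idea.

```python
from collections import defaultdict


def extract_inline_barcode(seq: str, start: int, length: int) -> str:
    """Return `length` bases beginning at the 1-based position `start`."""
    return seq[start - 1 : start - 1 + length]


def demux_exact(reads, barcode_to_sample, start=1):
    """Assign reads to samples by exact inline-barcode match.

    Hypothetical helper: no error correction is attempted, so any read
    whose barcode is not in the table goes to "undetermined".
    """
    lengths = {len(code) for code in barcode_to_sample}
    if len(lengths) != 1:
        raise ValueError("all inline barcodes must have the same length")
    length = lengths.pop()

    bins = defaultdict(list)
    for name, seq in reads:
        code = extract_inline_barcode(seq, start, length)
        sample = barcode_to_sample.get(code, "undetermined")
        bins[sample].append(name)
    return dict(bins)


# Toy data: two 4-nt inline barcodes at position 1 of the read (assumed layout).
reads = [("r1", "ACGTTTGACC"), ("r2", "TGCAGGATCC"), ("r3", "ACGAGGATCC")]
barcodes = {"ACGT": "lib_A", "TGCA": "lib_B"}
result = demux_exact(reads, barcodes, start=1)
print(result)
```

In this sketch the third read carries a single mismatch and is left unassigned, which mirrors the trade-off mentioned above: without error correction, only exact barcode matches are demultiplexed.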

Lexogen-Tools / idemuxcpp
