Develop parsers for different single-cell technologies, which place the cell barcode and UMI adapters at different locations in R1.
A parser should be able to convert this kind of data into Drop-seq-like adapters.
B1 anatomy: BBBBBBBB[BBB]WWWWWWWWWWWWWWWWWWWWWWCCCCCCCCUUUUUUTTTTTTTTTT__
B = Barcode1, can be 8, 9, 10 or 11 bases long.
W = 'W1' sequence, specified below
C = Barcode2, always 8 bases
U = UMI, always 6 bases
T = Beginning of polyT tail.
_ = Either sequencing survives across the polyT tail, or the signal starts dropping off
(and the bases start being anything, likely with poor quality)
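The legend above fixes every field width except Barcode1, so once the Barcode1 length is known the read can be cut by simple offsets. The following is a minimal sketch of that slicing, not the parser itself; `slice_r1`, its dictionary layout, and the example read are hypothetical and only illustrate the anatomy.

```python
# W1 sequence as given in these notes; its length sets the W field width.
W1 = "GAGTGATTGCTTGTGACGCCTT"
W1_LEN = len(W1)
BC2_LEN = 8   # Barcode2, always 8 bases
UMI_LEN = 6   # UMI, always 6 bases


def slice_r1(seq, bc1_len):
    """Split an R1 sequence into its fields, given the Barcode1 length.

    Hypothetical helper for illustration; it does not validate W1 itself.
    """
    if bc1_len not in (8, 9, 10, 11):
        raise ValueError("Barcode1 must be 8, 9, 10 or 11 bases long")
    w1_start = bc1_len
    bc2_start = w1_start + W1_LEN
    umi_start = bc2_start + BC2_LEN
    tail_start = umi_start + UMI_LEN
    return {
        "bc1": seq[:w1_start],
        "w1": seq[w1_start:bc2_start],
        "bc2": seq[bc2_start:umi_start],
        "umi": seq[umi_start:tail_start],
        "tail": seq[tail_start:],  # polyT, then whatever the sequencer reports
    }


# Synthetic read with a 9-base Barcode1 (made up, not real data).
read = "ACGTACGTA" + W1 + "TTGCAAGC" + "CATGCA" + "TTTTTTTTTT"
fields = slice_r1(read, 9)
print(fields["bc1"], fields["bc2"], fields["umi"])
```

Because Barcode1 is the only variable-length field, locating W1 is what determines every other offset.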
minimal_polyT_len_on_R1 = 7
hamming_threshold_for_W1_matching = 3
w1 = "GAGTGATTGCTTGTGACGCCTT"
rev_w1 = "AAGGCGTCACAAGCAATCACTC" # Reverse complement of w1; hard-coded so we don't recompute it on every one of millions of calls
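One way the constants above could be used is sketched here: look for W1 at each allowed Barcode1 length, accept the best offset within the Hamming threshold, then check for a polyT run after the UMI. The helper names `hamming`, `find_w1_start` and `has_polyT` are hypothetical. Treating the threshold as inclusive (`<=`) and requiring an exact run of T's are assumptions; a real parser may tolerate mismatches in the tail.

```python
w1 = "GAGTGATTGCTTGTGACGCCTT"
minimal_polyT_len_on_R1 = 7
hamming_threshold_for_W1_matching = 3


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_w1_start(seq):
    """Return the W1 start (equal to the Barcode1 length), or None.

    Tries an exact match first, then every allowed Barcode1 length (8..11),
    keeping the offset with the fewest mismatches within the threshold.
    """
    pos = seq.find(w1)
    if 8 <= pos <= 11:
        return pos
    best = None
    for start in range(8, 12):
        window = seq[start:start + len(w1)]
        if len(window) < len(w1):
            break  # read too short to contain W1 at this offset
        dist = hamming(window, w1)
        if dist <= hamming_threshold_for_W1_matching and (best is None or dist < best[0]):
            best = (dist, start)
    return None if best is None else best[1]


def has_polyT(seq, w1_start):
    """Check that the bases right after the UMI begin with enough T's."""
    tail_start = w1_start + len(w1) + 8 + 6  # skip W1, Barcode2 (8), UMI (6)
    tail = seq[tail_start:tail_start + minimal_polyT_len_on_R1]
    return tail == "T" * minimal_polyT_len_on_R1


# Synthetic read: 10-base Barcode1 and a W1 copy carrying two mismatches.
noisy_w1 = "CAGTGCTTGCTTGTGACGCCTT"
read = "ACGTACGTAC" + noisy_w1 + "TTGCAAGC" + "CATGCA" + "TTTTTTTTTT"
start = find_w1_start(read)
print(start, has_polyT(read, start))
```

Trying the exact match first keeps the common case cheap; the Hamming scan only runs for reads with sequencing errors in W1.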
Indrop uses variable-length adapters. Adapter specification: indrop UMI description: sequence W1 adapter: AAGGCGTCACAAGCAATCACTC
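The W1 adapter sequence quoted here is the reverse complement of the `w1` string that is matched in R1. A short check makes that relationship explicit; `reverse_complement` is a hypothetical helper written for this illustration.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T/N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


w1 = "GAGTGATTGCTTGTGACGCCTT"
rev_w1 = "AAGGCGTCACAAGCAATCACTC"

# The hard-coded constant agrees with the computed reverse complement.
assert reverse_complement(w1) == rev_w1
assert reverse_complement(rev_w1) == w1
```

Running this once at import time or in a test is enough; the per-read code can keep using the hard-coded value.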