BIMSBbioinfo / pigx_scrnaseq

Pipeline for analysis of Dropseq single cell data
http://bioinformatics.mdc-berlin.de/pigx
Develop indrops parser #20

Open frenkiboy opened 6 years ago

frenkiboy commented 6 years ago

Different single-cell technologies place the cell barcode and UMI adapters at different locations in R1, so we need to develop parsers for them. A parser should be able to convert this kind of data into Drop-seq-like adapters.

inDrop uses variable-length adapters.
Adapter specification: indrop UMI description
Sequence of the W1 adapter: AAGGCGTCACAAGCAATCACTC

B1 anatomy: BBBBBBBB[BBB]WWWWWWWWWWWWWWWWWWWWWWCCCCCCCCUUUUUUTTTTTTTTTT__

B = Barcode1, can be 8, 9, 10 or 11 bases long.
W = 'W1' sequence, specified below
C = Barcode2, always 8 bases
U = UMI, always 6 bases
T = Beginning of polyT tail.
_ = Either sequencing survives across the polyT tail, or signal starts dropping off (and start being anything, likely with poor quality)

minimal_polyT_len_on_R1 = 7
hamming_threshold_for_W1_matching = 3
w1 = "GAGTGATTGCTTGTGACGCCTT"
rev_w1 = "AAGGCGTCACAAGCAATCACTC" #Hard-code so we don't recompute on every one of millions of calls
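To make the layout above concrete, here is a minimal Python sketch of how a read could be split into Barcode1, Barcode2 and UMI. It is only an illustration under stated assumptions, not the pipeline's or the indrops authors' implementation: the names `parse_indrop_read`, `ParsedRead` and `hamming` are hypothetical, the read is assumed to contain W1 in the `w1` orientation (pass `rev_w1` if the reads are on the other strand), the threshold is treated as inclusive, and the polyT check simply requires the first 7 bases after the UMI to be T.

```python
from typing import NamedTuple, Optional

# Constants copied from the specification quoted above.
W1 = "GAGTGATTGCTTGTGACGCCTT"
MIN_POLYT_LEN = 7            # minimal_polyT_len_on_R1
W1_HAMMING_THRESHOLD = 3     # hamming_threshold_for_W1_matching
BC2_LEN = 8                  # Barcode2 is always 8 bases
UMI_LEN = 6                  # UMI is always 6 bases


class ParsedRead(NamedTuple):
    """Hypothetical container for the pieces of one read."""
    barcode1: str
    barcode2: str
    umi: str


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def parse_indrop_read(seq: str, w1: str = W1) -> Optional[ParsedRead]:
    """Return Barcode1, Barcode2 and UMI, or None if the layout does not fit."""
    best = None  # (distance, Barcode1 length) of the best W1 placement so far
    for bc1_len in (8, 9, 10, 11):  # Barcode1 can be 8 to 11 bases long
        candidate = seq[bc1_len:bc1_len + len(w1)]
        if len(candidate) < len(w1):
            break  # read too short to contain W1 at this offset
        dist = hamming(candidate, w1)
        # Assumption: a distance equal to the threshold still counts as a match.
        if dist <= W1_HAMMING_THRESHOLD and (best is None or dist < best[0]):
            best = (dist, bc1_len)
    if best is None:
        return None

    bc1_len = best[1]
    pos = bc1_len + len(w1)
    barcode2 = seq[pos:pos + BC2_LEN]
    umi_start = pos + BC2_LEN
    umi = seq[umi_start:umi_start + UMI_LEN]
    tail = seq[umi_start + UMI_LEN:umi_start + UMI_LEN + MIN_POLYT_LEN]
    # Assumption: require an uninterrupted run of T right after the UMI.
    if len(umi) < UMI_LEN or tail != "T" * MIN_POLYT_LEN:
        return None
    return ParsedRead(seq[:bc1_len], barcode2, umi)


# Usage: a synthetic read with a 10-base Barcode1.
example = "ACGTACGTAC" + W1 + "GGCCAATT" + "ACGTCA" + "T" * 10
print(parse_indrop_read(example))
```

Trying all four Barcode1 lengths and keeping the closest W1 match is one simple way to handle the variable-length prefix. Once the three fields are separated, they can be re-emitted in whatever fixed-width layout the downstream Drop-seq-style tooling expects.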