bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Using UMIs in the bcbio smallRNA pipeline #2347

Closed mxhp75 closed 5 years ago

mxhp75 commented 6 years ago

Hi,

This is somewhat similar to #2070. We have single-end .fastq files with the following format:

@NB500965:105:HC5J5BGX2:1:11108:16467:3587 1:N:0:ATCACG
TTCAAGTAATCCAGGATAGGAACTGTAGGCACCATCAATGACACCGAACGTAGATCGGAAAGCACACGTCTGAACT
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEAAEEE/EE

where ATCACG (at the end of the header line) is the unique sample index and AACTGTAGGCACCATCAAT (within the read) is the 3' adapter.

The 3' adapter is followed by a 12 nt UMI. If I massage the .fastq files so that each record has the format:

@NB500965:105:HC5J5BGX2:1:11108:16467:3587 1:N:0:ATCACG:UMI_GACACCGAACGTAGA
TTCAAGTAATCCAGGATAGGAACTGTAGGCACCATCAATGACACCGAACGTAGATCGGAAAGCACACGTCTGAACT
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEAAEEE/EE

am I then able to add umi_type: fastq_name to the bcbio .yaml config and run through the small RNA pipeline? Is there a better way of doing this?
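For anyone wanting to try the header rewrite described above, here is a minimal Python sketch. It assumes the adapter and UMI length given in this post, keeps the read and quality strings unchanged, and appends `:UMI_<umi>` to the header exactly as in the example. The names `tag_record` and `tag_fastq` are made up for illustration, and whether bcbio accepts this header layout for `umi_type: fastq_name` is not verified here. Note that the 12 nt following the adapter in the example read are GACACCGAACGT, while the header shown above carries 15 nt, so adjust `UMI_LEN` if a different length is intended.

```python
ADAPTER = "AACTGTAGGCACCATCAAT"  # 3' adapter quoted in the post
UMI_LEN = 12                     # UMI length stated in the post (assumption)


def tag_record(header, seq, plus, qual, adapter=ADAPTER, umi_len=UMI_LEN):
    """Return the record with :UMI_<umi> appended to the header.

    Returns None when the adapter is missing or the read ends before
    a full-length UMI, so such reads can be dropped or handled separately.
    """
    start = seq.find(adapter)
    if start == -1:
        return None
    umi_start = start + len(adapter)
    umi = seq[umi_start:umi_start + umi_len]
    if len(umi) < umi_len:
        return None
    return (f"{header}:UMI_{umi}", seq, plus, qual)


def tag_fastq(lines):
    """Yield tagged records from an iterable of FASTQ lines (four per record)."""
    stripped = (line.rstrip("\n") for line in lines)
    # Zipping the same generator four times groups the lines into records.
    for record in zip(stripped, stripped, stripped, stripped):
        tagged = tag_record(*record)
        if tagged is not None:
            yield tagged


example = tag_record(
    "@NB500965:105:HC5J5BGX2:1:11108:16467:3587 1:N:0:ATCACG",
    "TTCAAGTAATCCAGGATAGGAACTGTAGGCACCATCAATGACACCGAACGTAGATCGGAAAGCACACGTCTGAACT",
    "+",
    "AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEAAEEE/EE",
)
print(example[0])
# @NB500965:105:HC5J5BGX2:1:11108:16467:3587 1:N:0:ATCACG:UMI_GACACCGAACGT
```

To process a whole file, pass an open file handle to `tag_fastq` and write each yielded four-tuple back out, one element per line. Reads without the adapter are skipped in this sketch; whether to keep them is a judgment call for the analysis.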

All advice gratefully received.

roryk commented 5 years ago

Looks like this got resolved, feel free to reopen if not!