WIP: DADA2 scripts

Main scripts for the new 16S workflow, which is based on dada2 and its tutorial (available here).

The 6 added scripts are:

parse_cutadapt_logs.py - combines cutadapt logfiles into a single count table (see below).
dada2_filter.R - performs the filtering steps of the dada2 big data tutorial.
dada2_inference.R - runs the main dada2 algorithm, which infers sequence variants.
dada2_chimera_taxa.R - performs dada2 de novo bimera checking and assigns taxonomy.
merge_logfiles.R - merges any tables that have overlapping ids in the first column and share the same delimited format (e.g. tab-delimited). Intended for combining the logfiles produced by the 4 scripts above.
convert_dada2_out.R - converts a dada2 sequence table to a BIOM file and a fasta file. Can optionally read in dada2-assigned taxonomies and output them in tab-delimited format, so they can be added to a BIOM file with the biom add-metadata command.
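The merge step described for merge_logfiles.R can be illustrated with a minimal Python sketch. This is not the R script itself: the function names are hypothetical, and it assumes that "overlapping ids" means keeping only the ids present in every table (an inner join), which the description above does not confirm.

```python
import csv
import io


def read_table(handle, delimiter="\t"):
    """Return (header, {id: [values]}) for a delimited table keyed on column 1."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    rows = {row[0]: row[1:] for row in reader if row}
    return header, rows


def merge_tables(handles, delimiter="\t"):
    """Merge tables on first-column ids, keeping ids shared by all tables.

    Assumption: an inner join; the real script may treat missing ids differently.
    """
    tables = [read_table(h, delimiter) for h in handles]
    first_header, first_rows = tables[0]
    # Preserve the id order of the first table.
    shared = [i for i in first_rows if all(i in rows for _, rows in tables[1:])]
    header = [first_header[0]] + [col for h, _ in tables for col in h[1:]]
    merged = [[i] + [v for _, rows in tables for v in rows[i]] for i in shared]
    return header, merged


if __name__ == "__main__":
    # Hypothetical logfile contents, for illustration only.
    cutadapt = io.StringIO("sample\tinput\nS1\t100\nS2\t200\n")
    filtered = io.StringIO("sample\tfiltered\nS2\t150\nS1\t90\n")
    header, rows = merge_tables([cutadapt, filtered])
    print("\t".join(header))
    for row in rows:
        print("\t".join(row))
```

Keying each table on its first column keeps the merge independent of row order, which matters when the per-step logfiles list samples in different orders.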

Relevant commands

cutadapt can be run in parallel with a command like the one below (after creating the primer_trimmed_fastqs folder). With this command, parse_cutadapt_logs.py expects the cutadapt logfiles to be in primer_trimmed_fastqs.

parallel --eta --link --jobs 60 --noswap \
  'cutadapt \
    --pair-filter any \
    --no-indels \
    --discard-untrimmed \
    -g CCTACGGGNGGCWGCAG \
    -G GACTACHVGGGTATCTAATCC \
    -o primer_trimmed_fastqs/{1/.}.gz \
    -p primer_trimmed_fastqs/{2/.}.gz \
    {1} {2} \
    > primer_trimmed_fastqs/{1/.}_cutadapt_log.txt' \
  ::: ../raw_fastqs/*_R1_*.fastq.gz ::: ../raw_fastqs/*_R2_*.fastq.gz
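As a rough illustration of how the per-sample logs written by the command above could be collapsed into one count table, here is a minimal Python sketch. It is not the parse_cutadapt_logs.py script from this PR: the function names are hypothetical, and the two summary lines it searches for ("Total read pairs processed" and "Pairs written (passing filters)") are assumed from the paired-end cutadapt report format, which may differ between cutadapt versions.

```python
import re
from pathlib import Path

# Assumed summary lines in a paired-end cutadapt report.
PATTERNS = {
    "input_pairs": re.compile(r"^Total read pairs processed:\s+([\d,]+)", re.M),
    "passing_pairs": re.compile(
        r"^Pairs written \(passing filters\):\s+([\d,]+)", re.M
    ),
}


def parse_cutadapt_log(text):
    """Extract read-pair counts from one log; None where a line is missing."""
    counts = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        # Counts are printed with thousands separators, e.g. "10,000".
        counts[name] = int(match.group(1).replace(",", "")) if match else None
    return counts


def build_count_table(log_dir, suffix="_cutadapt_log.txt"):
    """Return one (sample, input_pairs, passing_pairs) row per logfile."""
    rows = []
    for path in sorted(Path(log_dir).glob("*" + suffix)):
        sample = path.name[: -len(suffix)]  # filename minus the log suffix
        counts = parse_cutadapt_log(path.read_text())
        rows.append((sample, counts["input_pairs"], counts["passing_pairs"]))
    return rows


if __name__ == "__main__":
    for row in build_count_table("primer_trimmed_fastqs"):
        print("\t".join(str(value) for value in row))
```

Because the command names each log `{1/.}_cutadapt_log.txt`, the sample label recovered here is the R1 filename without its `.gz` extension; trimming it further is left to the caller.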

Once the output BIOM table and an observation metadata file have been created, they can be combined with the command below (this requires the latest version of biom-format: 2.1.6).

biom add-metadata -i seqtab.biom -o seqtab_tax.biom --observation-metadata-fp taxa_metadata.txt
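For reference, a tab-delimited observation metadata file such as taxa_metadata.txt could be produced along the following lines. This Python sketch is only an illustration, not the output routine of convert_dada2_out.R: the function name, the `#OTUID` and `taxonomy` column headers, the `seq1`-style ids, and the "; " rank separator are all assumptions, and the ids must match the observation ids in the BIOM table.

```python
import csv
import io


def write_taxa_metadata(taxa, handle, id_header="#OTUID"):
    """Write one tab-delimited line per sequence id with its lineage string.

    taxa maps an id to a list of ranks; unassigned ranks (None) are dropped.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow([id_header, "taxonomy"])
    for seq_id, ranks in taxa.items():
        lineage = "; ".join(rank for rank in ranks if rank)
        writer.writerow([seq_id, lineage])


if __name__ == "__main__":
    # Hypothetical assignments, for illustration only.
    example = {
        "seq1": ["Bacteria", "Firmicutes", None],
        "seq2": ["Bacteria", "Proteobacteria", "Gammaproteobacteria"],
    }
    buffer = io.StringIO()
    write_taxa_metadata(example, buffer)
    print(buffer.getvalue(), end="")
```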

LangilleLab / microbiome_helper

WIP: DADA2 scripts #13
