LangilleLab / microbiome_helper

A repository of bioinformatic scripts, SOPs, and tutorials for analyzing microbiome data.
GNU General Public License v3.0

WIP: DADA2 scripts #13

gavinmdouglas closed this 6 years ago

gavinmdouglas commented 6 years ago

Main scripts for the new 16S workflow based on DADA2 (and on the DADA2 tutorial, available here).

The 6 added scripts are:

Relevant commands

cutadapt can be run in parallel with a command like the one below (after creating the primer_trimmed_fastqs folder). In this case, parse_cutadapt_logs.py expects the cutadapt log files to be in primer_trimmed_fastqs.

parallel --eta --link --jobs 60 --noswap \
  'cutadapt \
    --pair-filter any \
    --no-indels \
    --discard-untrimmed \
    -g CCTACGGGNGGCWGCAG \
    -G GACTACHVGGGTATCTAATCC \
    -o primer_trimmed_fastqs/{1/.}.gz \
    -p primer_trimmed_fastqs/{2/.}.gz \
    {1} {2} \
    > primer_trimmed_fastqs/{1/.}_cutadapt_log.txt' \
  ::: ../raw_fastqs/*_R1_*.fastq.gz ::: ../raw_fastqs/*_R2_*.fastq.gz
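
The `{1/.}` replacement string in the command above is GNU parallel's "basename with the last extension removed". As a rough sketch (not part of the added scripts), the plain-bash snippet below mimics that substitution to preview which output and log names the cutadapt command would write for a given forward-read file. The helper names `parallel_basename_noext` and `preview_outputs` and the input file name are hypothetical, used for illustration only.

```shell
#!/usr/bin/env bash
# Sketch: reproduce GNU parallel's {1/.} replacement string in plain bash
# to preview the output and log names the cutadapt command would write.

outdir="primer_trimmed_fastqs"   # folder name taken from the command above

# Mimic {/.}: strip the directory, then strip the last extension only.
parallel_basename_noext() {
    local base="${1##*/}"
    printf '%s\n' "${base%.*}"
}

# Print the trimmed FASTQ path and the log path for one forward-read file.
preview_outputs() {
    local stem
    stem="$(parallel_basename_noext "$1")"
    printf '%s/%s.gz\n' "$outdir" "$stem"
    printf '%s/%s_cutadapt_log.txt\n' "$outdir" "$stem"
}

# Hypothetical input name, for illustration only.
preview_outputs "../raw_fastqs/sampleA_R1_001.fastq.gz"
```

Only the final `.gz` is stripped, so the log name keeps the `.fastq` part (here `sampleA_R1_001.fastq_cutadapt_log.txt`). This may matter when matching log files back to sample names.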

Once an output BIOM table and an observation metadata file have been created, they can be combined with the command below (this requires biom-format 2.1.6, the latest version at the time of writing).

biom add-metadata -i seqtab.biom -o seqtab_tax.biom --observation-metadata-fp taxa_metadata.txt
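
For reference, `biom add-metadata` reads the observation metadata from a tab-separated file whose first column holds the observation IDs used in the BIOM table. The snippet below is a minimal sketch of such a file, not the actual output of the added scripts. The file name `taxa_metadata_example.txt`, the IDs `seq1` and `seq2`, and the taxonomy strings are all made-up placeholders.

```shell
#!/usr/bin/env bash
# Sketch: write a tiny, hypothetical observation metadata file.
# The IDs and taxonomy strings below are placeholders, not real results.

metadata_fp="taxa_metadata_example.txt"   # hypothetical file name

# Tab-separated: first column holds the observation IDs found in the BIOM table.
printf '#OTUID\ttaxonomy\n' > "$metadata_fp"
printf 'seq1\tk__Bacteria; p__Firmicutes\n' >> "$metadata_fp"
printf 'seq2\tk__Bacteria; p__Bacteroidetes\n' >> "$metadata_fp"

# Sanity check: every line should have the same number of tab-separated fields.
field_counts="$(awk -F '\t' '{ print NF }' "$metadata_fp" | sort -u)"
echo "distinct field counts: $field_counts"
```

If the taxonomy column should be stored as a list of ranks rather than a single string, `biom add-metadata` also accepts `--sc-separated taxonomy`, which splits that column on semicolons. Whether that is wanted here depends on the downstream tools.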