epigen / atacseq_pipeline

Ultimate ATAC-seq Data Processing, Quantification and Annotation Snakemake Workflow and MrBiomics Module.
https://epigen.github.io/atacseq_pipeline/
MIT License

address adapter “confusion” #3

Open sreichl opened 1 year ago

sreichl commented 1 year ago
sreichl commented 11 months ago

ATAC-seq: Nextera adapter explanation by FD

Let's focus on the color code, since the orientation of the pieces is quite confusing. I assume you know the steps: the transposase adds the Nextera adapters, and a subsequent PCR extends them with barcode and sequencing-primer information and amplifies the fragments.

The Nextera sequence for trimming is given as*:

Nextera_transposase_adapter_trimming

CTGTCTCTTATACACATCT

Nextera_transposase_adapter_trimming_reverse_complement

AGATGTGTATAAGAGACAG
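The relationship between the two trimming sequences can be checked directly. The following is a minimal Python sketch; the function name `reverse_complement` is my own and not part of the pipeline, and it assumes upper-case sequences containing only A, C, G and T.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


trimming = "CTGTCTCTTATACACATCT"
trimming_rc = "AGATGTGTATAAGAGACAG"

# Both directions hold if the two listed sequences are a matching pair.
assert reverse_complement(trimming) == trimming_rc
assert reverse_complement(trimming_rc) == trimming
```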

What confused me was that only a substring of the trimming sequence (AGATGT, the start of its reverse complement) appears in the adapter sequence used for PCR amplification and indexing. I was looking for the whole trimming sequence and could not find it in the adapter:

Adapter_sequence

CAAGCAGAAGACGGCATACGAGATTCGCCTTAGTCTCGTGGGCTCGGAGATGT

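Segments of the adapter sequence, 5' to 3':

Index 1 PCR primer read: CAAGCAGAAGACGGCATACGAGAT

Index (variable): TCGCCTTA

Transposase adapter specific (Part 1): GTCTCGTGGGCTCGG

Transposase adapter specific (Part 2): AGATGT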
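The search described above can be reproduced with plain string operations. This is a rough sketch, not pipeline code: the helper `suffix_prefix_overlap` is a hypothetical name, and it assumes the shared piece sits at the 3' end of the adapter as listed.

```python
adapter = "CAAGCAGAAGACGGCATACGAGATTCGCCTTAGTCTCGTGGGCTCGGAGATGT"
trimming = "CTGTCTCTTATACACATCT"
trimming_rc = "AGATGTGTATAAGAGACAG"


def suffix_prefix_overlap(left: str, right: str) -> str:
    """Return the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return right[:k]
    return ""


# Neither orientation of the full trimming sequence occurs in the adapter.
whole_found = trimming in adapter or trimming_rc in adapter
# Only the first bases of the reverse complement are present, at the 3' end.
overlap = suffix_prefix_overlap(adapter, trimming_rc)

print(whole_found)  # False
print(overlap)      # AGATGT
```

So the adapter carries only the first six bases of the reverse-complemented trimming sequence, which is why a search for the whole 19-base sequence comes up empty.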

The missing piece of information was that the Nextera transposase adapter sequence used for trimming is not the complete sequence that the Tn5 transposase inserts. The complete sequences look like this:

Nextera_transposase_adapter_read_1

TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG

Nextera_transposase_adapter_read_2

GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG
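The two complete sequences differ at their 5' ends but end in the same 19 bases, which are the reverse complement of the trimming sequence. A small sketch to make that visible; the names `SHARED_CORE`, `COMPLETE` and `split_off_core` are illustrative only, and the sequences are copied from this post.

```python
# Names are illustrative; the sequences are copied from the post.
SHARED_CORE = "AGATGTGTATAAGAGACAG"  # reverse complement of the trimming sequence
COMPLETE = {
    "first": "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG",
    "second": "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG",
}


def split_off_core(full: str, core: str = SHARED_CORE) -> tuple[str, str]:
    """Split a complete adapter into (specific 5' part, shared 3' core)."""
    if not full.endswith(core):
        raise ValueError("sequence does not end with the shared core")
    return full[: -len(core)], core


specific_parts = {name: split_off_core(seq)[0] for name, seq in COMPLETE.items()}
print(specific_parts)
# {'first': 'TCGTCGGCAGCGTC', 'second': 'GTCTCGTGGGCTCGG'}
```

The specific part of the second sequence, GTCTCGTGGGCTCGG, is the same stretch that precedes AGATGT in the PCR adapter sequence listed earlier.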

In conclusion, the Nextera trimming sequence corresponds to only a substring of the whole transposable DNA element that the transposase attaches to the DNA fragments. What I don't understand is why Illumina only uses the substring for trimming.
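To see what trimming with only the short sequence does in practice, here is a toy Python sketch. It is not how the pipeline or any real trimmer works: `trim_3prime` is a hypothetical exact-match helper, the insert is made up, and it assumes that adapter read-through shows up in a read as the reverse complement of a complete adapter sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


TRIMMING = "CTGTCTCTTATACACATCT"
COMPLETE_ADAPTER = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"


def trim_3prime(read: str, adapter: str = TRIMMING, min_overlap: int = 5) -> str:
    """Cut the read at the first exact adapter match, or at a partial
    adapter prefix (at least `min_overlap` bases) at the 3' end."""
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    for k in range(len(adapter) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[: len(read) - k]
    return read


# Assumption: read-through appears as the reverse complement of the adapter.
readthrough = reverse_complement(COMPLETE_ADAPTER)
insert = "ACGTTGCAAGGCTTAACCGG"  # made-up genomic insert
read = insert + readthrough

assert readthrough.startswith(TRIMMING)  # short sequence comes first
assert trim_3prime(read) == insert       # everything after it is cut too
```

Under these assumptions the short sequence is the first adapter-derived stretch in the read, so cutting at it also removes whatever follows. Whether this is Illumina's actual rationale is not something this sketch can confirm.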

*https://support-docs.illumina.com/SHARE/AdapterSeq/illumina-adapter-sequences.pdf

PS: Transposases are such an interesting class of enzymes. One of my favorite papers during my Master's was about viral transposable elements in the human genome, which get shuffled around when the epigenetic marks for repression are lifted during embryogenesis (https://www.nature.com/articles/nrg2072).

sreichl commented 11 months ago

ATACseq_MultiQC_metric_diff.xlsx