deeptools / HiCExplorer

HiCExplorer is a powerful and easy to use set of tools to process, normalize and visualize Hi-C data.
https://hicexplorer.readthedocs.org
GNU General Public License v3.0

Adapter trimming of fastq Hi-C raw reads #663

Closed Drosophilid closed 3 years ago

Drosophilid commented 3 years ago

Hi, very basic question: is adapter trimming of raw Hi-C FASTQ reads needed before mapping with the HiCExplorer mapping parameters? Thanks

joachimwolff commented 3 years ago

HiCExplorer itself does not map raw FASTQ reads; we recommend using a mapper such as BWA-MEM, HISAT2 or Bowtie2. Adapter trimming is always useful, but please inspect your raw data with FastQC first to see whether it is necessary. If you do trim, please make sure that the trimming software does not change the order of the reads.
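One way to check the read-order point after trimming is to compare the read IDs of the two mate files. The following is only a minimal sketch in POSIX shell: `read_ids` and `same_read_order` are hypothetical helper names (not part of HiCExplorer or any trimming tool), and it assumes plain four-line FASTQ records, optionally gzip-compressed, with mates distinguished at most by a trailing `/1` or `/2`.

```shell
# Hypothetical helpers, not part of HiCExplorer.
# Assumes four-line FASTQ records (header, sequence, '+', qualities).

# Print one read ID per record: the first field of each header line,
# without the leading '@' and without a trailing /1 or /2 mate suffix.
read_ids() {
    case $1 in
        *.gz) gzip -dc -- "$1" ;;   # decompress gzipped input on the fly
        *)    cat -- "$1" ;;
    esac | awk 'NR % 4 == 1 { id = $1; sub(/^@/, "", id); sub(/\/[12]$/, "", id); print id }'
}

# Succeed (exit status 0) only if both files list the same IDs in the same order.
same_read_order() {
    ids1=$(mktemp) || return 2
    ids2=$(mktemp) || { rm -f "$ids1"; return 2; }
    read_ids "$1" > "$ids1"
    read_ids "$2" > "$ids2"
    status=0
    cmp -s "$ids1" "$ids2" || status=$?   # cmp -s is silent; 0 means identical
    rm -f "$ids1" "$ids2"
    return "$status"
}
```

It could be called as `same_read_order trimmed.R1.fastq.gz trimmed.R2.fastq.gz && echo "order preserved"`, where the file names are placeholders. A failure would suggest that the trimmer dropped or reordered reads in one file only, which matters when the two mates are mapped separately and paired up again afterwards.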

Best,

Joachim

Drosophilid commented 3 years ago

Thanks Joachim. Sorry, by "HiCExplorer mapping parameters" I actually meant these bwa mem mapping options:

bwa mem mapping options:

-A INT score for a sequence match, which scales options -TdBOELU unless overridden [1]

-B INT penalty for a mismatch [4]

-O INT[,INT] gap open penalties for deletions and insertions [6,6]

-E INT[,INT] gap extension penalty; a gap of size k cost '{-O} + {-E}*k' [1,1]

-L INT[,INT] penalty for 5'- and 3'-end clipping [5,5] # this is set to no penalty.

Thanks again, I'll check.

gtrichard commented 3 years ago

Usual Hi-C mapping parameters with bwa-mem:

bwa mem -A1 -B4  -E50 -L0 -t 15 BWAIndex/genome.fa FASTQ/GLLR10.R1.fastq.gz  | samtools view -Shb - > BWA/GLLR10.R1.bam
bwa mem -A1 -B4  -E50 -L0 -t 15 BWAIndex/genome.fa FASTQ/GLLR10.R2.fastq.gz  | samtools view -Shb - > BWA/GLLR10.R2.bam
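To see what `-E50` does, the gap cost formula quoted from the bwa mem help text, '{-O} + {-E}*k', can be worked through with shell arithmetic. This is only an illustration: `gap_cost` is a hypothetical helper, and it assumes `-O` stays at its default of 6 because the commands above do not set it.

```shell
# Hypothetical helper: gap_cost OPEN EXTEND K
# Prints the cost of a gap of length K as '{-O} + {-E}*k'.
gap_cost() {
    echo $(( $1 + $2 * $3 ))
}

gap_cost 6 1 1    # bwa mem defaults (-O 6, -E 1), 1-base gap:  7
gap_cost 6 50 1   # -E50 as in the commands above, 1-base gap:  56
gap_cost 6 50 3   # -E50, 3-base gap:                           156
```

With a match score of 1 (`-A1`), even a single-base gap then costs as much as 56 matching bases gain, so gapped alignments are strongly discouraged. Since `-L0` makes end clipping free, the aligner will presumably clip a read at a ligation junction rather than bridge it with a gap.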

Then build a Hi-C matrix and get the QC reports with hicBuildMatrix:

hicBuildMatrix -s BWA/GLLR10.R1.bam BWA/GLLR10.R2.bam -bs 1000 --restrictionSequence GATC --danglingSequence GATC --minDistance 150 --maxDistance 1000 --QCfolder HiC_matrices/QCplots/GLLR10_QC --threads 10  -o HiC_matrices/GLLR10_bs.h5

As Joachim pointed out, run FastQC and check the adapter content plots to see whether you need adapter trimming before mapping.
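Besides looking at the plots, the same information can be read from the `summary.txt` file that FastQC writes into each report, which has one tab-separated line per module (status, module name, file name). A minimal sketch, where `adapter_status` is a hypothetical helper and the paths in the comments are placeholders:

```shell
# Hypothetical helper: adapter_status SUMMARY_TXT
# Prints the status (PASS, WARN or FAIL) that FastQC recorded for its
# "Adapter Content" module in the given summary.txt file.
adapter_status() {
    awk -F '\t' '$2 == "Adapter Content" { print $1 }' "$1"
}

# Possible usage (assumed paths; FASTQC/ must already exist):
#   fastqc --extract -o FASTQC FASTQ/GLLR10.R1.fastq.gz
#   adapter_status FASTQC/GLLR10.R1_fastqc/summary.txt
```

A WARN or FAIL here would be a reason to open the report and look at the adapter content plot before deciding on trimming; a PASS suggests trimming may not be needed for that file.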