BauerLab / ngsane

Analysis Framework for Biological Data from High Throughput Experiments
Other
34 stars 22 forks source link

How to deal with Hi-C-seq datasets with three replicates? #49

Closed xflicsu closed 9 years ago

xflicsu commented 9 years ago

Hello, I use ngsane to process Hi-C-seq datasets with two replicates. But the interaction paires are less than total of indivadule. So, could help me figure out some error or provide config file to deal with several replicates.

Thanks!

Basic information as follows:

data directory


0.5.1.0/smokebox/fastq/HiC HiC/ |-- GMall_HindIII2_R1.fastq.gz |-- GMall_HindIII2_R2.fastq.gz |-- GMall_HindIII3_R1.fastq.gz |-- GMall_HindIII3_R2.fastq.gz |-- GMall_HindIII1_R1.fastq.gz |-- GMall_HindIII1_R2.fastq.gz


========================================================= confige file


author: Fabian Buske

date: Dec 2013

######################################

Resources

#####################################

****

Tasks

****

RUNHICUP="" # map HiC data with hicup RUNFITHIC="" # call significant chromatin interactions

****

Paths

****

SOURCE=$(pwd)

which folder to run on (i.e. folders within fastq directory)

declare -a DIR; DIR=( HiC )

folder/task containing the data this task operates on

INPUT_HICLIB="fastq" INPUT_HICUP="fastq" INPUT_FITHIC=$TASK_HICUP

where to write the output

OUT=$SOURCE

where to write the log files

QOUT=$OUT/qout

Summary file name

HTMLOUT="Summary"

****

PARAMETER (mandatory)

****

fastq file suffix

FASTQ="fastq.gz"

read indicator immediately preceding the fastq file suffix

READONE="_R1" READTWO="_R2"

reference as single chromosomes

FASTA_CHROMDIR=$(pwd)/referenceData/

bowtie v2.0 index including basename

FASTA=$(pwd)/referenceData/mm9_HiCcopy.fasta

Suffix of inout files to look for

INPUT_FITHIC_SUFFIX="$ASD.bam"

genome assembly

e.g. hg19

REFERENCE_NAME="mm9"

HICLIB_GAPFILE=$(pwd)/referenceData/mm9.gap.txt

Enzymes, see http://biopython.org/DIST/docs/api/Bio.Restriction-module.html

HICLIB_RENZYMES="HindIII"

restriction enzymes

HICUP_RENZYME1="HindIII" HICUP_RCUTSITE1="A^AGCTT" HICUP_RENZYME2= HICUP_RCUTSITE2=

****

PARAMETER (optional overwriting defaults)

****

HICUPMAPPER_ADDPARAM="--format Sanger"

HICLIB_READLENGTH= HICLIB_CHROMOSOME=16

uncomment to keep intermediate bam files from iterative mapping

HICLIB_KEEPBAM=1

HICORRECTOR_MAXITER=100

FITHICADDPARAM="--lowerbound 20000 --upperbound 5000000 " HIC_RESOLUTION="40000"

Mappability track

MAPPABILITY=$(pwd)/referenceData/wgEncodeCrgMapabilityAlign36mer.bigWig

Mappability threshold to apply in fithic

e.g. 0.5

FITHIC_MAPPABILITYTHRESHOLD=0.5

pattern indicating which chromosomes to use

FITHIC_CHROMOSOMES='^16$'

indicate if contact matrix should be kept

e.g FITHIC_KEEPCONTACTMATRIX="1"

FITHIC_KEEPCONTACTMATRIX="1"

FITHIC_CHROMOSOMES="chr[0-9XY]+" FITHIC_POOLDATA="1" FITHIC_POOLED_SAMPLE_NAME="Liver"

CALL_TAD_CHROMOSOMES="chr[0-9]*"

****

Resources

****

WALLTIME_HICUP=00:30:00 MEMORY_HICUP=8 CPU_HICUP=30 NODES_HICUP="nodes=1:ppn=2"

WALLTIME_FITHIC=0:30:00 MEMORY_FITHIC=2 CPU_FITHIC=30 NODES_FITHIC="nodes=1:ppn=2"


===================================================== My workflow is as follows:


module load ngsane/0.5.1.0 . $NGSANE_BASE/conf/header.sh module load samtools/1.1 ./bin/testRUNHIC.qsub trigger.sh tmp/configHICUP.txt direct module load hicorrection/1.1 module load R/3.0.2 trigger.sh tmp/configFITHIC.txt direct trigger.sh tmp/configHiC.txt html


Gurado commented 9 years ago

Hi xflicsu,

I cam across this issue just the other day too. It looks like the iterative bias correction blows out of proportion for big datasets where the difference between entries is substantial,e.g. the nonmappable regions having 0 doubles-sided (ds) read mapping while within bin ds reads are very high. Imakaev et al thus pruned the contact matrix. I have started writing a fix for that yesterday but it will take me a couple more days to finish that.

xflicsu commented 9 years ago

Thanks Gurado! Please let me know when you fix that. Then, I will re-run my project. Thanks again!

Gurado commented 9 years ago

Should be fixed with release v0.5.2.