Closed xflicsu closed 9 years ago
Hi xflicsu,
I cam across this issue just the other day too. It looks like the iterative bias correction blows out of proportion for big datasets where the difference between entries is substantial,e.g. the nonmappable regions having 0 doubles-sided (ds) read mapping while within bin ds reads are very high. Imakaev et al thus pruned the contact matrix. I have started writing a fix for that yesterday but it will take me a couple more days to finish that.
Thanks Gurado! Please let me know when you fix that. Then, I will re-run my project. Thanks again!
Should be fixed with release v0.5.2.
Hello, I use ngsane to process Hi-C-seq datasets with two replicates. But the interaction paires are less than total of indivadule. So, could help me figure out some error or provide config file to deal with several replicates.
Thanks!
Basic information as follows:
data directory
0.5.1.0/smokebox/fastq/HiC HiC/ |-- GMall_HindIII2_R1.fastq.gz |-- GMall_HindIII2_R2.fastq.gz |-- GMall_HindIII3_R1.fastq.gz |-- GMall_HindIII3_R2.fastq.gz |-- GMall_HindIII1_R1.fastq.gz |-- GMall_HindIII1_R2.fastq.gz
========================================================= confige file
author: Fabian Buske
date: Dec 2013
######################################
Resources
#####################################
****
Tasks
****
RUNHICUP="" # map HiC data with hicup RUNFITHIC="" # call significant chromatin interactions
****
Paths
****
SOURCE=$(pwd)
which folder to run on (i.e. folders within fastq directory)
declare -a DIR; DIR=( HiC )
folder/task containing the data this task operates on
INPUT_HICLIB="fastq" INPUT_HICUP="fastq" INPUT_FITHIC=$TASK_HICUP
where to write the output
OUT=$SOURCE
where to write the log files
QOUT=$OUT/qout
Summary file name
HTMLOUT="Summary"
****
PARAMETER (mandatory)
****
fastq file suffix
FASTQ="fastq.gz"
read indicator immediately preceding the fastq file suffix
READONE="_R1" READTWO="_R2"
reference as single chromosomes
FASTA_CHROMDIR=$(pwd)/referenceData/
bowtie v2.0 index including basename
FASTA=$(pwd)/referenceData/mm9_HiCcopy.fasta
Suffix of inout files to look for
INPUT_FITHIC_SUFFIX="$ASD.bam"
genome assembly
e.g. hg19
REFERENCE_NAME="mm9"
HICLIB_GAPFILE=$(pwd)/referenceData/mm9.gap.txt
Enzymes, see http://biopython.org/DIST/docs/api/Bio.Restriction-module.html
HICLIB_RENZYMES="HindIII"
restriction enzymes
HICUP_RENZYME1="HindIII" HICUP_RCUTSITE1="A^AGCTT" HICUP_RENZYME2= HICUP_RCUTSITE2=
****
PARAMETER (optional overwriting defaults)
****
HICUPMAPPER_ADDPARAM="--format Sanger"
HICLIB_READLENGTH= HICLIB_CHROMOSOME=16
uncomment to keep intermediate bam files from iterative mapping
HICLIB_KEEPBAM=1
HICORRECTOR_MAXITER=100
FITHICADDPARAM="--lowerbound 20000 --upperbound 5000000 " HIC_RESOLUTION="40000"
Mappability track
MAPPABILITY=$(pwd)/referenceData/wgEncodeCrgMapabilityAlign36mer.bigWig
Mappability threshold to apply in fithic
e.g. 0.5
FITHIC_MAPPABILITYTHRESHOLD=0.5
pattern indicating which chromosomes to use
FITHIC_CHROMOSOMES='^16$'
indicate if contact matrix should be kept
e.g FITHIC_KEEPCONTACTMATRIX="1"
FITHIC_KEEPCONTACTMATRIX="1"
FITHIC_CHROMOSOMES="chr[0-9XY]+" FITHIC_POOLDATA="1" FITHIC_POOLED_SAMPLE_NAME="Liver"
CALL_TAD_CHROMOSOMES="chr[0-9]*"
****
Resources
****
WALLTIME_HICUP=00:30:00 MEMORY_HICUP=8 CPU_HICUP=30 NODES_HICUP="nodes=1:ppn=2"
WALLTIME_FITHIC=0:30:00 MEMORY_FITHIC=2 CPU_FITHIC=30 NODES_FITHIC="nodes=1:ppn=2"
===================================================== My workflow is as follows:
module load ngsane/0.5.1.0 . $NGSANE_BASE/conf/header.sh module load samtools/1.1 ./bin/testRUNHIC.qsub trigger.sh tmp/configHICUP.txt direct module load hicorrection/1.1 module load R/3.0.2 trigger.sh tmp/configFITHIC.txt direct trigger.sh tmp/configHiC.txt html