ENCODE-DCC / chip-seq-pipeline2

ENCODE ChIP-seq pipeline
MIT License

regionPeak files generated in tf mode #192

Closed asmariyaz23 closed 3 years ago

asmariyaz23 commented 3 years ago

Hello,

I ran the ENCODE ChIP-seq pipeline in "tf" mode on paired-end datasets (3 biological replicates and 1 input sample). Here is the input JSON:

{
    "chip.title" : "BZW1 (paired-end)",
    "chip.description" : "BZW1",

    "chip.pipeline_type" : "tf",
    "chip.aligner" : "bowtie2",
    "chip.align_only" : false,
    "chip.true_rep_only" : false,

    "chip.genome_tsv" : "/cluster/tools/data/commondata/ENCODE/chip-seq/hg38/hg38.tsv",

    "chip.paired_end" : true,
    "chip.ctl_paired_end" : true,
    "chip.always_use_pooled_ctl" : true,

    "chip.align_bowtie2_mem_factor": 0.50,
    "chip.align_cpu": 10,
    "chip.filter_cpu": 6,
    "chip.filter_mem_factor": 0.6,
    "chip.bam2ta_cpu": 4,
    "chip.bam2ta_mem_factor": 0.5,
    "chip.spr_mem_factor":5.5,
    "chip.jsd_cpu":6,
    "chip.jsd_mem_factor":0.3,
    "chip.xcor_cpu":4,
    "chip.xcor_mem_factor":2.0,
    "chip.call_peak_cpu":14,
    "chip.call_peak_spp_mem_factor":30,

    "chip.fastqs_rep1_R1" : [ "BZW1-1_S9_L001_R1_001.fastq.gz" ],
    "chip.fastqs_rep1_R2" : [ "BZW1-1_S9_L001_R2_001.fastq.gz" ],
    "chip.fastqs_rep2_R1" : [ "BZW1-2_S10_L001_R1_001.fastq.gz" ],
    "chip.fastqs_rep2_R2" : [ "BZW1-2_S10_L001_R2_001.fastq.gz" ],
    "chip.fastqs_rep3_R1" : [ "BZW1-3_S11_L001_R1_001.fastq.gz" ],
    "chip.fastqs_rep3_R2" : [ "BZW1-3_S11_L001_R2_001.fastq.gz" ],

    "chip.ctl_fastqs_rep1_R1" : [ "Input-BZW1_S12_L001_R1_001.fastq.gz"],
    "chip.ctl_fastqs_rep1_R2" : [ "Input-BZW1_S12_L001_R2_001.fastq.gz"]
}

Could you please help me understand why I got regionPeak files rather than narrowPeak files, as generated in histone mode? According to the format documentation for regionPeak or broadPeak, the resulting BED file is supposed to have 9 columns. However, the file I see in the "call-reproducibility_idr" directory, idr.conservative_peak.regionPeak.gz, has 10 columns (idr.conservative_peak10.regionPeak.gz). I am confused as to whether what I currently have is a regionPeak or a narrowPeak file.

Thank you for your help! Asma

leepc12 commented 3 years ago

"Regions" in that documentation does not mean regionPeak. The regionPeak file actually uses the same format as narrowPeak.
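For anyone who wants to confirm this on their own output, here is a minimal sketch that counts the tab-separated columns in a gzipped peak file. It assumes the standard narrowPeak layout (BED6+4, 10 columns) and the standard broadPeak layout (BED6+3, 9 columns). The helper names `column_counts` and `looks_like_narrowpeak` are hypothetical and not part of the pipeline.

```python
import gzip

# Standard narrowPeak (BED6+4) column names; broadPeak (BED6+3) lacks the last one.
NARROWPEAK_COLUMNS = [
    "chrom", "chromStart", "chromEnd", "name", "score",
    "strand", "signalValue", "pValue", "qValue", "peak",
]


def column_counts(path, max_lines=1000):
    """Return the set of column counts seen in the first max_lines lines."""
    counts = set()
    with gzip.open(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i >= max_lines:
                break
            line = line.rstrip("\n")
            # Skip blank lines and optional header lines.
            if not line or line.startswith(("#", "track", "browser")):
                continue
            counts.add(len(line.split("\t")))
    return counts


def looks_like_narrowpeak(path):
    """True if every inspected row has exactly the 10 narrowPeak columns."""
    return column_counts(path) == {len(NARROWPEAK_COLUMNS)}
```

Calling `looks_like_narrowpeak("idr.conservative_peak.regionPeak.gz")` on a file like the one described above would be expected to return `True` if every row has 10 columns. This only checks the column count, not the contents of each column.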