GuipengLi / ChIA-PET2

a versatile and flexible pipeline for analysing different variants of ChIA-PET data
GNU General Public License v3.0
34 stars 19 forks source link

Error: Don't have enough confident interactions to learn the model #3

Closed titosand closed 8 years ago

titosand commented 8 years ago

Hi, I'm running into the above error for step 7 in the pipeline. Can someone better explain what the issue is (i.e. what makes a confident interaction, how many do I need), and if there are any methods for getting around this? Thank you for all your help!

This is my dataset: https://www.encodeproject.org/experiments/ENCSR752QCX/ (Using the 2x14G paired ends) Below is the terminal output for the last few steps:

EDIT: It seems that the value of confidentN=0. I've also attached the output of QCplot, which shows a value of NaN for chromosome interactions. I find it strange that not a single interaction passes the threshold. out.QCplot.pdf

`'[10-12 01:24:13] Running Step 5: Detect Interactions ...

extendpeak output/out_peaks.narrowPeak 500 output/out_peaks.slopPeak Running extendpeak... bedtools slop -i output/out_peaks.narrowPeak -g genome/chrom_hg19_ebv.sizes -b 500 | bedtools merge -d 256 > output/outpeaks.slopPeak awk 'BEGIN{OFS="\t";i=1}{print $1,$2,$3,"peak"i;i=i+1}' output/out_peaks.slopPeak > tmp.bed; mv tmp.bed output/out_peaks.slopPeak peak_depth2 output/out.rmdup.bedpe.tag 100 output/out_peaks.slopPeak output/out.peaks.depth Running peakdepth... psort output/out.rmdup.bedpe.tag output/out.rmdup.bedpe.tag.sorted > /dev/null 2>&1 psort output/out_peaks.slopPeak output/out_peaks.slopPeak.sorted > /dev/null 2>&1 bedtools coverage -sorted -b output/out.rmdup.bedpe.tag.sorted -a output/out_peaks.slopPeak.sorted -counts > output/out.peaks.depth build_interaction output/out.rmdup.bedpe output/out.peaks.depth output/out.interactions output/out.bedpe.stat Running build_interaction... pairToBed -a output/out.rmdup.bedpe -b output/out.peaks.depth -type both > output/out.rmdup.bedpe.tmp bedpe2Interaction output/out.rmdup.bedpe.tmp output/out.interactions output/out.bedpe.stat Running bedpe2Interaction... PETs in the same peak: 700 Intra PETs bewteen two peaks: 106 Inter PETs bewteen two peaks: 717 Intra loops: 106 Inter loops: 717

[10-12 01:24:20] Running Step 6: QCplots ...

Rscript /mnt/data/scu/bin/ChIA-PET2_0.9.2/bin/QCplots.R output out Loading required package: ggplot2 Loading required package: scales null device 1

[10-12 01:24:21] Running Step 7: MICC ...

Rscript /mnt/data/scu/bin/ChIA-PET2_0.9.2/bin/MICC2.R output/out.interactions.intra.bedpe output/out.interactions.inter.bedpe output/out.interactions.MICC 1 Loading required package: VGAM Loading required package: methods Loading required package: stats4 Loading required package: splines Warning message: replacing previous import by ‘splines::splineDesign’ when loading ‘VGAM’ Running MICC... Loading intra data... Loading inter data... Cacluating... Error: Don't have enough confident interactions to learn the model. `

GuipengLi commented 8 years ago

Hi titosand, From the QCplot, we can find that more than 99% of reads were unmapped, which is weird. I think the sequence adapters or linkers were not trimmed properly. What's the command you used? If the linkers are trimmed properly, you will get much more interactions.

titosand commented 8 years ago

I see, thank you for responding! I used the following command:

ChIA-PET2 -g genome/hg19_ebv.fa -b genome/chrom_hg19_ebv.sizes -f SRR1514669_1.fastq -r SRR1514669_2.fastq -o output -t 32

Taken from the experiment protocol:

The two Linkers have the same nucleotide sequences, except four nucleotides in the middle are different (Linker A with TAAG; Linker B with ATGT)

^^This sounds like the default

GuipengLi commented 8 years ago

You may check what linker A and linker B were used for the dataset. The default parameters (-A, -B) may not always be the right one.

titosand commented 8 years ago

What's a good way to check this?

GuipengLi commented 8 years ago

This information usually can be found in the supplementary of the published paper.

GuipengLi commented 8 years ago

If the linker A and linker B are the right one, the problem may be caused by 5' sequence adapters. Is the fastq file adapter-free? Maybe you need to use cutadapt before ChIA-PET2.

titosand commented 8 years ago

Okay, I am going to do some QA using Trim Galore, I will let you know how it goes.

titosand commented 8 years ago

Still no success :( . I regenerated the index for bwa and used fastqc/cutadapt to clean the fastq. Not sure how to proceed.

GuipengLi commented 8 years ago

Could you post the terminal output ? What's the mapping rate now?

titosand commented 8 years ago

The new QCPlot is basically the same as the previous one. I just tried mapping using bowtie and here is the output for aligning one of the 2 fastq files:

bowtie /mnt/data/sajid/Virus/Genome_ebwt/hg19_ebv output/out_1.valid.fastq -S -n 2 -l 50 -k 1 --chunkmbs 500 --sam-nohead --mapq 40 -m 1 --best -p 16 > out_1.valid.sam reads processed: 112087171 reads with at least one reported alignment: 688574 (0.61%) reads that failed to align: 26621 (0.02%) reads with alignments suppressed due to -m: 111371976 (99.36%) Reported 688574 alignments to 1 output stream(s)

EDIT: I think I should be specifying the shortreads option and using bwa aln instead. This is probably the issue. If this ends up being the case, I might suggest making -d=1 the default value, since the original ChIA-PET protocol gives read lengths around 20bp.

GuipengLi commented 8 years ago

Yes, titosand. Try with "-d 1" option for short reads.

titosand commented 8 years ago

Mapping % is improved, but still too low and I get

Error: Don't have enough confident interactions to learn the model.

I have attached the output below.

out.QCplot.pdf

GuipengLi commented 8 years ago

This makes sense now. But for short reads, you may choose some loose threshold of MAPQ (-Q), e.g. -Q 20. Only reads exceed the MAPQ threshold will be kept.

titosand commented 8 years ago

Yes, it worked! Thank you for all your help :+1:

bioinfo89 commented 4 years ago

Hi GuipengLi,

I am currently getting the same error:

Running MICC... Intra data: ChIAPET_rawdata/rep1_ChIAPET.interactions.intra.bedpe Inter data: ChIAPET_rawdata/rep1_ChIAPET.interactions.inter.bedpe Output file: ChIAPET_rawdata/rep1_ChIAPET.interactions.MICC PET count cutoff: 2 Minimun Confident PET count: 5 reltol: 1e-08 Loading intra data... Loading inter data... Cacluating... Intra: 252 Inter: 59 Error: Don't have enough confident interactions to learn the model. Execution halted

I checked for adaptor contamination using FastQC and also confirmed the Linker sequences as mentioned in your above suggestions to solve the error. But I still get the same error.

I am using the following command:

ChIA-PET_tools/ChIA-PET2-0.9.3/bin/ChIA-PET2 -t 4 -d 1 -Q 20 -g /hg19/hg19.fa -b /hg19/hg19.chrom.sizes -f rep1_R1_combined.fastq.gz -r rep1_R2_combined.fastq.gz -A GTTGGATAAG -B GTTGGAATGT -o ChIAPET_rawdata -n rep1_ChIAPET

Attached is my QC_plot file: rep1_ChIAPET.QCplot.pdf

And I am using the following data for analysis (Replicate 1): https://www.encodeproject.org/experiments/ENCSR672RHL/

I am not sure what exactly in my case is causing this error. Any help from your end would be appreciated. Thank you!