Normalized contact value?

ay-lab / FitHiChIP

Statistically Significant loops from HiChIP data

MIT License


Normalized contact value? #80

Open yanchunzhang opened 2 years ago

yanchunzhang commented 2 years ago

Hi, do you calculate a normalized contact value in the final bed files, like the normalized value in a Hi-C contact map? I checked the file but didn't see one.

Also, what is your recommended cutoff for the q-value: 0.01 or 0.05?

Thanks, Yanchun

souryacs commented 2 years ago

Hi @yanchunzhang We do not explicitly compute a normalized contact count value. However, if you are using HiC-Pro to align the input fastq files, you may use the HiC-Pro output contact matrices generated by ICE normalization. I'd recommend using a q-value cutoff of 0.01. You can customize the q-value in the configuration file.
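A minimal sketch of the lookup suggested above, for readers who want a normalized value next to each significant loop. It assumes the usual HiC-Pro sparse layout (a bin BED file with `chrom, start, end, bin_index` and an ICE matrix with `bin_i, bin_j, value`), that both tools were run at the same bin size, and that the loop table is tab-delimited with columns named `chr1`, `s1`, `chr2`, `s2` and `Q-Value_Bias`. The column names, the helper functions and the toy data are assumptions for illustration, so check them against your own files.

```python
import csv
import io


def load_bins(bed_handle):
    """Map (chrom, start) -> bin index from a HiC-Pro style bin BED file."""
    bins = {}
    for line in bed_handle:
        if not line.strip():
            continue
        chrom, start, _end, idx = line.split()[:4]
        bins[(chrom, int(start))] = int(idx)
    return bins


def load_matrix(matrix_handle):
    """Map (bin_i, bin_j) with bin_i <= bin_j -> normalized value."""
    values = {}
    for line in matrix_handle:
        if not line.strip():
            continue
        i, j, value = line.split()[:3]
        i, j = int(i), int(j)
        values[(min(i, j), max(i, j))] = float(value)
    return values


def annotate_loops(loops_handle, bins, values, q_cutoff=0.01,
                   q_column="Q-Value_Bias"):  # column name is an assumption
    """Keep loops with q <= q_cutoff and attach the normalized value (or None)."""
    annotated = []
    for row in csv.DictReader(loops_handle, delimiter="\t"):
        if float(row[q_column]) > q_cutoff:
            continue
        i = bins.get((row["chr1"], int(row["s1"])))
        j = bins.get((row["chr2"], int(row["s2"])))
        norm = None
        if i is not None and j is not None:
            norm = values.get((min(i, j), max(i, j)))
        annotated.append(
            (row["chr1"], int(row["s1"]), row["chr2"], int(row["s2"]), norm)
        )
    return annotated


# Toy, made-up inputs at 5 kb resolution (not real data).
toy_bed = "chr1\t0\t5000\t1\nchr1\t5000\t10000\t2\nchr1\t10000\t15000\t3\n"
toy_matrix = "1\t3\t7.5\n2\t3\t2.25\n"
toy_loops = (
    "chr1\ts1\te1\tchr2\ts2\te2\tcc\tQ-Value_Bias\n"
    "chr1\t0\t5000\tchr1\t10000\t15000\t9\t0.001\n"
    "chr1\t5000\t10000\tchr1\t10000\t15000\t3\t0.2\n"
)

annotated = annotate_loops(
    io.StringIO(toy_loops),
    load_bins(io.StringIO(toy_bed)),
    load_matrix(io.StringIO(toy_matrix)),
)
print(annotated)
```

Only the first toy loop passes the 0.01 cutoff, so it is the only one annotated. With real files, replace the `io.StringIO` objects with open file handles.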

yanchunzhang commented 2 years ago

Thanks a lot! I used a q-value of 0.01 on a mouse sample and only got around 10k-12k significant loops, which is far fewer than I expected. In your experience, does that mean the sample preparation was of poor quality or failed?

Thanks, Yanchun

ay-lab commented 2 years ago

Hi @yanchunzhang Which model did you use to call the loops: the stringent (P2P=1) or the loose (P2P=0) background model? And which bias correction did you use (ICE or coverage)? I'd recommend using the coverage bias and testing with both the loose and stringent background models. That said, 10k-12k significant loops is quite OK, especially if it comes from the P2P=1 model. What is the total sequencing depth of your library? Regarding QC, you can check the HiC-Pro output logs for the number of duplicate reads, the number and fraction of CIS reads, the fraction of CIS reads > 10 Kb, etc.

YichaoOU commented 2 years ago

Hi @ay-lab ,

I have a FitHiChIP result with only 300+ merged peak-to-peak loops (q-value <= 0.05) using the loose (P2P=0) model and the coverage bias setting. My QC is good according to https://hichip.readthedocs.io/en/latest/library_qc.html. Total sequencing depth is 200M+, the number of PCR duplicate reads is 40% (88M), and No-Dup Cis Read Pairs < 1kb is 71M.

My total number of peaks provided to the fithichip program is ~40K.

Do you know why I only got 300+ loops?

Thanks, Yichao

ay-lab commented 2 years ago

Hi, your QC results show that you are effectively left with 11.5M or so valid reads that are > 10 kb and useful, so you are working with quite sparse data. You haven't mentioned the resolution you used, but you may want to consider 20 kb or a lower (coarser) resolution for the analysis.
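To illustrate why a coarser resolution helps with sparse data, here is a small sketch that re-bins contact counts from 5 kb to 20 kb bins: reads spread thinly over neighbouring 5 kb bin pairs are pooled into one 20 kb bin pair. The `rebin_contacts` helper and the toy counts are made up for illustration and are not part of FitHiChIP.

```python
from collections import Counter


def rebin_contacts(contacts, new_bin_size):
    """Sum contact counts after snapping both anchors to coarser bins.

    `contacts` maps (start_a, start_b) -> read count for one chromosome.
    """
    coarse = Counter()
    for (start_a, start_b), count in contacts.items():
        a = (start_a // new_bin_size) * new_bin_size
        b = (start_b // new_bin_size) * new_bin_size
        coarse[(a, b)] += count
    return dict(coarse)


# Toy 5 kb contacts (made-up numbers): several bin pairs with 1-2 reads each.
contacts_5kb = {
    (0, 100000): 1,
    (5000, 105000): 2,
    (10000, 110000): 1,
    (15000, 115000): 1,
    (40000, 200000): 3,
}

contacts_20kb = rebin_contacts(contacts_5kb, 20000)
print(contacts_20kb)
```

In this toy case the four low-count 5 kb pairs collapse into a single 20 kb pair with 5 reads, which gives a statistical model more signal per bin pair at the cost of positional precision.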

YichaoOU commented 2 years ago

@ay-lab Thank you!

YichaoOU commented 2 years ago

Hi @ay-lab

I tried different bin sizes and q-value/p-value cutoffs for peak-to-all loop calling. Here is a screenshot at our ROI:

The results look very different from each other. Smaller bins (2.5 kb, 5 kb) likely have lower read counts, and therefore more variance and lower confidence, which could explain part of the difference. But I'm not sure how to explain the 20 kb significant loops versus the 5 kb significant loops (q-value 0.05): there is no overlap at all. Even if we relax the cutoff to a q-value of 0.5 at the 5 kb bin size, the two short interactions in the 20 kb track do not show up in the 5 kb track.

We have micro-capture-C data at this ROI, and based on it we think the loops from the 5 kb bins at a q-value of 0.5 currently look better, but a q-value of 0.5 is far from significant.
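One way to make the comparison between resolutions concrete, rather than judging it by eye, is to project the 5 kb loops onto the 20 kb grid and intersect the two sets. The sketch below assumes each loop is reduced to `(chrom, anchor1_start, anchor2_start)`; the helper names and the toy coordinates are hypothetical.

```python
def to_coarse(loop, bin_size):
    """Snap both anchors of a loop to the start of their coarse bins."""
    chrom, a, b = loop
    a = (a // bin_size) * bin_size
    b = (b // bin_size) * bin_size
    return (chrom, min(a, b), max(a, b))


def shared_loops(fine_loops, coarse_loops, coarse_bin_size):
    """Return coarse-grid loops supported by both call sets."""
    projected = {to_coarse(loop, coarse_bin_size) for loop in fine_loops}
    coarse = {to_coarse(loop, coarse_bin_size) for loop in coarse_loops}
    return projected & coarse


# Toy loop calls (made-up coordinates, not taken from the screenshot).
loops_5kb = [("chr1", 105000, 185000), ("chr1", 40000, 95000)]
loops_20kb = [("chr1", 100000, 180000), ("chr1", 20000, 60000)]

common = shared_loops(loops_5kb, loops_20kb, 20000)
print(common)
```

This exact-bin match is strict. Allowing one bin of slack on each anchor may be more appropriate, since a 5 kb anchor near a 20 kb bin boundary can land in the neighbouring bin.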

I'm wondering if you have any thoughts.

Thanks, Yichao