aryeelab / hichipper

A preprocessing and QC pipeline for HiChIP data
MIT License

About the pre processing / RM_DUP = 0 #56

Closed vidaletal closed 4 years ago

vidaletal commented 6 years ago

Dear Caleb,

I've started working with HiChIP quite recently, and I was wondering whether it would make more sense to keep the duplicates in HiChIP samples during the HiC-Pro pre-processing. HiChIP reads are expected to overlap, which decreases library complexity, as is expected in ChIP-seq, and even more so here because of the additional biotin pulldown. In this sense, wouldn't these reads represent exactly what we are looking for: an enrichment?

In code terms, shouldn't we set RM_DUP = 0 in the HiC-Pro config, as follows?

#######################################################################
## Hi-C processing
#######################################################################
MIN_CIS_DIST =
GET_ALL_INTERACTION_CLASSES = 1
GET_PROCESS_SAM = 1
RM_SINGLETON = 1
RM_MULTI = 1
RM_DUP = 0

Please correct me if I'm wrong.

Best,

Raphael

caleblareau commented 6 years ago

Hi Raphael,

I definitely agree with your point. When I compared the _allValidPairs file to the .validPairs file (which should amount to comparing with and without PCR duplicate removal), I saw that 90% of the data was retained after whatever HiC-Pro does to de-duplicate reads. For the datasets I had looked at, my inclination was to exclude the duplicates, since they were evidently a small fraction of the data. I wouldn't disagree with your approach, though.
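For anyone who wants to repeat this comparison on their own data, a minimal sketch follows. It assumes that each non-empty line in these files is one valid pair, that the .validPairs files hold the pairs before de-duplication, and that the _allValidPairs file holds the merged pairs after it. The function names and the file paths in the usage note are hypothetical and are not part of hichipper or HiC-Pro.

```python
def count_pairs(path):
    """Count non-empty lines, assuming one valid pair per line."""
    with open(path) as handle:
        return sum(1 for line in handle if line.strip())


def retained_fraction(before_paths, after_path):
    """Fraction of pairs that remain after de-duplication.

    before_paths: files assumed to hold pairs BEFORE duplicate removal
    after_path:   file assumed to hold pairs AFTER duplicate removal
    """
    before = sum(count_pairs(p) for p in before_paths)
    after = count_pairs(after_path)
    if before == 0:
        raise ValueError("no pairs found in the pre-deduplication files")
    return after / before
```

For example, `retained_fraction(["sample.validPairs"], "sample_allValidPairs")` would return a value near 0.9 for a dataset like the ones described above. If the run was made with RM_DUP = 0, the two counts should be equal and the fraction should be 1.0.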

vidaletal commented 6 years ago

Hi Caleb,

Many thanks for your feedback. I'm observing this in all my datasets.
Would you mind providing the .validPairs file for the sample dSRR3467177 in hichipper/tests/hicpro/hic_results/data/dSRR3467177/? I'd like to compare it with my own dataset after keeping the duplicates.

Many thanks,

Raphael

caleblareau commented 6 years ago

Do these files work?

https://github.com/aryeelab/hichipper/tree/master/tests/hicpro/hic_results/data/dSRR3467177

vidaletal commented 6 years ago

Many thanks,

Just to double-check, did you keep the duplicates in these files during the analysis?
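One way to answer this directly from a file, independent of the config that produced it, is to count how many pairs share the same coordinates. The sketch below assumes a tab-separated file in which columns 2, 3, 5 and 6 hold chr1, pos1, chr2 and pos2; that layout is an assumption and should be checked against the output of your HiC-Pro version. The function name `duplicate_stats` is hypothetical.

```python
from collections import Counter


def duplicate_stats(path, key_columns=(1, 2, 4, 5)):
    """Return (total_pairs, unique_pairs) for a tab-separated pairs file.

    key_columns are 0-based indices, assumed here to be
    chr1, pos1, chr2, pos2 (an assumption about the file layout).
    """
    counts = Counter()
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) <= max(key_columns):
                continue  # skip blank or malformed lines
            counts[tuple(fields[i] for i in key_columns)] += 1
    total = sum(counts.values())
    return total, len(counts)
```

If `total` equals `unique`, the file contains no coordinate-level duplicates under this definition, which would suggest they were removed upstream. A large gap between the two suggests the duplicates were kept.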