aryeelab / hichipper

A preprocessing and QC pipeline for HiChIP data
MIT License

Double counting validPairs #68

Open WeiqiangZhou opened 5 years ago

WeiqiangZhou commented 5 years ago

Hi Caleb,

I think hichipper is double counting the validPairs. In the HiC-Pro output folder /hic_results/data/sample/, there are a number of "*.validPairs" files and an "allValidPairs" file. The "allValidPairs" file should be the same as the merged "*.validPairs" files. I found that hichipper searches for all "*Pairs" files in the folder, which means it counts the valid pairs twice. I think this affects a number of steps in the hichipper pipeline, including peak calling and counting reads in peak regions. I used some tricks to work around it, but it may be good for you to know about this bug.

Ken
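
For readers following along, here is a minimal sketch of the overlap described above and one possible way to avoid it. It is not hichipper's actual code: the file names and the `select_pairs_files` helper are hypothetical, and the only assumption taken from the report is that a broad "*Pairs" pattern matches both the per-chunk "*.validPairs" files and the merged "allValidPairs" file.

```python
import fnmatch


def select_pairs_files(filenames):
    """Return inputs so that each valid pair is read only once.

    Hypothetical helper: prefer the merged allValidPairs file when it
    exists, otherwise fall back to the per-chunk *.validPairs files.
    """
    merged = [f for f in filenames if f.endswith("allValidPairs")]
    if merged:
        return sorted(merged)
    return sorted(f for f in filenames if fnmatch.fnmatch(f, "*.validPairs"))


# Hypothetical directory listing for one sample.
files = ["s_1.validPairs", "s_2.validPairs", "s.allValidPairs"]

# A broad pattern picks up the chunks AND the merged file.
broad = fnmatch.filter(files, "*Pairs")

# The stricter selection keeps only one representation of the data.
selected = select_pairs_files(files)

print(broad)
print(selected)
```

Preferring the merged file is one design choice; reading only the per-chunk files would work equally well, as long as the two are never combined.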

caleblareau commented 5 years ago

This is a good point. Thanks for catching it.

My sense is that double-counted fragments would have little impact because of MACS2 duplicate removal, but I will keep this in mind. Thanks Ken.
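
The reasoning in this reply can be illustrated with a toy example. This is not MACS2 itself and says nothing about how hichipper invokes it. It assumes a filter that keeps a single fragment per identical position, which is what the MACS2 `--keep-dup` option does at its default value of 1. The `dedup` function and the fragment tuples are made up for illustration.

```python
def dedup(fragments):
    """Keep one fragment per identical (chrom, start, end, strand) tuple."""
    return sorted(set(fragments))


# Made-up fragments; the first two are already positional duplicates.
once = [
    ("chr1", 100, 150, "+"),
    ("chr1", 100, 150, "+"),
    ("chr2", 500, 550, "-"),
]

# Reading every fragment twice, as in the reported double counting.
twice = once + once

print(dedup(once))
print(dedup(twice))
```

Under that assumption the deduplicated sets are identical, so double counting would only matter for steps that run without such a filter or with a different duplicate setting.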

WeiqiangZhou commented 5 years ago

> This is a good point. Thanks for catching it. My sense is that it would have little impact if fragments were double counted based on Macs2 duplicate removal, but I will keep this in mind. Thanks Ken.

Thanks Caleb. In my experience, hichipper generates significantly more peaks without correcting for this bug (e.g., N=268,065) than with the correction (e.g., N=187,489). This is based on the following peak calling setting: peaks: