Open WeiqiangZhou opened 5 years ago
This is a good point. Thanks for catching it.
My sense is that double-counted fragments would have little impact because of MACS2 duplicate removal, but I will keep this in mind. Thanks Ken.
On May 17, 2019, at 12:46 PM, Weiqiang Zhou notifications@github.com wrote:
Hi Caleb,
I think hichipper is double counting the validPairs. In the HiC-Pro output folder /hic_results/data/sample/, there are a number of "*.validPairs" files and an "allValidPairs" file. The "allValidPairs" file should be the same as the merged "*.validPairs" files. I found that hichipper searches for all "*Pairs" files in the folder, which means it counts the valid pairs twice. I think this affects a number of steps in the hichipper pipeline, including peak calling and counting reads in peak regions. I used some tricks to work around it, but it may be good for you to know about this bug.
Ken
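To illustrate the glob overlap described above, here is a minimal Python sketch. The file names are hypothetical stand-ins for a HiC-Pro output folder, and `choose_pairs_files` is an illustrative helper, not hichipper's actual code. It only shows why a "*Pairs" pattern picks up both the per-chunk files and the merged file, and one possible way to select a single source.

```python
from fnmatch import fnmatchcase

# Hypothetical file names modeled on the folder layout described above.
files = [
    "sample_chunk1.validPairs",
    "sample_chunk2.validPairs",
    "sample_allValidPairs",
]

def match(names, pattern):
    """Return the names that match a shell-style pattern (case-sensitive)."""
    return [n for n in names if fnmatchcase(n, pattern)]

# The broad pattern matches the per-chunk files AND the merged file,
# so every valid pair would be read twice.
broad = match(files, "*Pairs")

# Narrower patterns separate the two kinds of file.
per_chunk = match(files, "*.validPairs")
merged = match(files, "*allValidPairs")

def choose_pairs_files(names):
    """Illustrative helper: prefer the merged file, else fall back to chunks."""
    merged_files = match(names, "*allValidPairs")
    if merged_files:
        return merged_files
    return match(names, "*.validPairs")

selected = choose_pairs_files(files)
print(len(broad), len(per_chunk), len(merged), selected)
```

Whether to read the merged file or the per-chunk files is a design choice; the point is only that one of the two should be used, not both.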
Thanks Caleb. In my experience, hichipper generates significantly more peaks without correcting for this bug (e.g., N=268,065) than with the correction (e.g., N=187,489). This is based on the following peak calling setting: peaks: