deeptools / HiCExplorer

HiCExplorer is a powerful and easy to use set of tools to process, normalize and visualize Hi-C data.
https://hicexplorer.readthedocs.org
GNU General Public License v3.0

hicBuildMatrix filters most pairs as "One mate not close to rest site" #502

Closed chesi closed 4 years ago

chesi commented 4 years ago

Hi, I'd like to use hicBuildMatrix to convert BAM files to matrix format. The data is Capture C preprocessed with HiCUP. I use the deduplicated, filtered BAM files from HiCUP as input for hicBuildMatrix. Most of my read pairs are filtered as "One mate not close to rest site". This is my command:

hicBuildMatrix -s rHMC3_1_R1.bam rHMC3_1_R2.bam --outFileName rHMC3_1_chr22.matrix --QCfolder QC_chr22 --binSize 1000 --region chr22 --threads 8 --skipDuplicationCheck --restrictionSequence GATC

I have also tried without the --restrictionSequence GATC option, with the same result. My average library fragment size is 300 bp. What am I doing wrong? Do I need to change some default parameter? Is there a way to avoid filtering completely (since it has already been done in HiCUP)? Thanks, Alessandra

joachimwolff commented 4 years ago

Hi Alessandra,

What does "most of the reads" mean? Is it possible to send me a small subset of your data? Ideally the raw FASTQ files plus the filtered and mapped BAM files.

Thanks a lot,

Joachim

chesi commented 4 years ago

Hi Joachim,

99%. However, this seems to happen only when I use the --region option (I tried chr22 and chr21). When I run without that option, the results look OK. I attach a subsample of my data (BAM, mapped and filtered). Thanks! BAM-files.zip

chesi commented 4 years ago

Also, it did not write any output matrix file, only the QC folder, and it did not give any error message.

LeilyR commented 4 years ago

Give the output file name a suffix so that the matrix file is generated, for example .h5 or .cool. This should solve your issue.
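As a rough sketch of that suggestion (not HiCExplorer code), the command from the original post could be assembled with a small check on the output name. The helper names `with_matrix_suffix` and `build_command` are hypothetical, the suffix list contains only the two suffixes named in this thread, and the flags are copied from the original post. Appending `.h5` by default is an assumption for illustration.

```python
from pathlib import Path

# Suffixes named in this thread; hicBuildMatrix may accept others.
SUFFIXES_FROM_THREAD = (".h5", ".cool")


def with_matrix_suffix(out_name, default=".h5"):
    """Hypothetical helper: keep out_name if it already ends in a suffix
    named in the thread, otherwise append `default`."""
    if Path(out_name).suffix in SUFFIXES_FROM_THREAD:
        return out_name
    return out_name + default


def build_command(bam_r1, bam_r2, out_name, qc_folder, region=None):
    """Hypothetical helper: build the argument list using only the flags
    that appear in the original post."""
    cmd = [
        "hicBuildMatrix",
        "-s", bam_r1, bam_r2,
        "--outFileName", with_matrix_suffix(out_name),
        "--QCfolder", qc_folder,
        "--binSize", "1000",
        "--threads", "8",
        "--skipDuplicationCheck",
        "--restrictionSequence", "GATC",
    ]
    # --region is optional here; the poster saw the misleading QC label
    # only when it was set.
    if region is not None:
        cmd += ["--region", region]
    return cmd


cmd = build_command(
    "rHMC3_1_R1.bam", "rHMC3_1_R2.bam", "rHMC3_1_chr22.matrix", "QC_chr22"
)
# Print the command instead of running it, so it can be inspected first.
print(" ".join(cmd))
```

The list form can be passed to `subprocess.run(cmd, check=True)` once the printed command looks right.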

joachimwolff commented 4 years ago

Hi Alessandra,

after considering your shared data, I think the following is happening:

The interpretation as 'not close to restriction site' is technically not wrong; however, it does not make sense to report reads outside of the defined region under this label. We have to improve this QC report in the next release.

Thanks a lot for reporting this issue!

Best,

Joachim

LeilyR commented 4 years ago

@chesi I assume your problem is solved, so I am closing this issue. Please get back to us if you have any other questions.