Closed chesi closed 4 years ago
Hi Alessandra,
what does "most of the reads" mean? Could you send me a small subset of your data? Ideally the raw FASTQ files, plus the filtered and mapped BAM files.
Thanks a lot,
Joachim
Hi Joachim,
99%. But it seems this only happens when I use the --region option (I tried chr22 and chr21). When I run without that option it seems OK. I attach a subsample of my data (BAM, mapped and filtered). Thanks! BAM-files.zip
Also, it did not write any output matrix file, only the QC folder, and it did not give any error message.
Give the output file name a suffix so that the matrix file is generated, for example .h5 or .cool. This should solve your issue.
Hi Alessandra,
after looking at your shared data, I think the following is happening:
What Leily wrote is correct: the file extension of the matrix is never matrix, but h5 or cool. If you don't specify one of these, you will get no matrix. We should add an error message for this in a future release.
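Until such an error exists, the suffix can be checked before the run. The following is only a minimal sketch: check_matrix_suffix is a hypothetical helper, not part of HiCExplorer, and the command is the one from the original post with only the output suffix changed to .h5. The command is printed with echo rather than executed; remove the echo to actually run it.

```shell
#!/bin/sh
# Hypothetical helper (not part of HiCExplorer): accept only output
# names ending in .h5 or .cool, the two suffixes named in this thread.
check_matrix_suffix() {
    case "$1" in
        *.h5|*.cool) return 0 ;;
        *) echo "error: '$1' must end in .h5 or .cool" >&2
           return 1 ;;
    esac
}

# Output name from the original post, with .matrix replaced by .h5.
out="rHMC3_1_chr22.h5"

if check_matrix_suffix "$out"; then
    # Printed only; drop the leading 'echo' to execute the command.
    echo hicBuildMatrix -s rHMC3_1_R1.bam rHMC3_1_R2.bam \
        --outFileName "$out" --QCfolder QC_chr22 --binSize 1000 \
        --region chr22 --threads 8 --skipDuplicationCheck \
        --restrictionSequence GATC
fi
```

With the original name rHMC3_1_chr22.matrix the helper would print an error and the command would not be shown.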
More important are your error statistics. It seems to me that the creation of the matrix works correctly, with or without the --region parameter. What is wrong is the error report: because --region is set, we classify every read outside that region as unassigned (https://github.com/deeptools/HiCExplorer/blob/master/hicexplorer/hicBuildMatrix.py#L865), and a read that is considered unassigned is also assumed not to be close to a restriction site (https://github.com/deeptools/HiCExplorer/blob/master/hicexplorer/hicBuildMatrix.py#L883). This is why you get such a high number in the statistics: it is nothing other than all the reads that are not on chromosome 22.
The interpretation as 'not close to restriction site' is technically not wrong; however, it doesn't make sense to report reads outside of the defined region this way. We have to improve this QC report in the next release.
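The effect on the QC numbers can be illustrated with a toy count. This is a simplified sketch of the behaviour described above, not the actual hicBuildMatrix logic; the chromosome list is made-up example data, and the variable names are chosen only for illustration.

```shell
#!/bin/sh
# Toy input: the chromosome of one mate per read pair (made-up values).
mates="chr22
chr1
chr22
chr5
chrX"

# Region as passed via --region in the original command.
region="chr22"

# Count all pairs and those whose mate lies on the selected region.
total=$(printf '%s\n' "$mates" | wc -l)
in_region=$(printf '%s\n' "$mates" | grep -c -x "$region")

# Per the explanation above, everything outside the region ends up in
# the 'not close to restriction site' category of the QC report.
outside=$((total - in_region))

echo "pairs on $region: $in_region"
echo "pairs reported as not close to rest site: $outside"
```

When only one chromosome is selected from a whole-genome library, the second number dominates, which would match the roughly 99% reported in this thread.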
Thanks a lot for reporting this issue!
Best,
Joachim
@chesi I assume your problem is solved, so I am closing this issue. Please get back to us if you have any other questions.
Hi, I'd like to use hicBuildMatrix to convert BAM files to matrix format. The data is Capture C, preprocessed with HiCUP. I use the deduplicated, filtered BAM files from HiCUP as input for hicBuildMatrix. Most of my read pairs are filtered as "One mate not close to rest site". This is my command:
hicBuildMatrix -s rHMC3_1_R1.bam rHMC3_1_R2.bam --outFileName rHMC3_1_chr22.matrix --QCfolder QC_chr22 --binSize 1000 --region chr22 --threads 8 --skipDuplicationCheck --restrictionSequence GATC
I have also tried without the --restrictionSequence GATC option with the same problem. My average library fragment size is 300bp. What am I doing wrong? Do I need to change some default parameter? Is there a way to avoid filtering completely (since it has already been done in HiCUP)? Thanks, Alessandra