deeptools / HiCExplorer

HiCExplorer is a powerful and easy to use set of tools to process, normalize and visualize Hi-C data.
https://hicexplorer.readthedocs.org
GNU General Public License v3.0
233 stars 70 forks source link

Issue merging reads #700

Closed StefanoLonardi closed 3 years ago

StefanoLonardi commented 3 years ago

Welcome to the HiCExplorer GitHub repository! Before opening the issue please check that the following requirements are met :

Retry your command, is it solved now? If not please continue with the following:

==================

Thanks for your great tool. HiCexplorer was installed with bioconda.

stelo@H4:~$ hicInfo --version
hicInfo 3.6
stelo@H4:~$ python --version
Python 3.7.10
stelo@H4:~$ samtools --version
samtools 1.11 Using htslib 1.11 Copyright (C) 2020 Genome Research Ltd.

I have four Hi-C Illumina reads (F,G,L,M) that I want to merge before creating the contact map. I am mapping them as suggested

bwa mem -A1 -B4 -E50 -L0 $REF F_R1.fastq.gz | samtools view -b -o F1.bam -
bwa mem -A1 -B4 -E50 -L0 $REF F_R2.fastq.gz | samtools view -b -o F2.bam -

I can generate a contact map for each set individually, no problem. However if I merge the dataset like this (BAM files are NOT sorted)

samtools merge all1.bam F1.bam G1.bam L1.bam M1.bam
samtools merge all2.bam F2.bam G2.bam L2.bam M2.bam

and I get

[W::bam_merge_core2] No @HD tag found.
[W::bam_merge_core2] No @HD tag found.

then run

hicBuildMatrix --samFiles all1.bam all2.bam --binSize 1000 --restrictionSequence GATC --danglingSequence GATC --restrictionCutFile ref.positions.bed --threads 4 --inputBufferSize 100000 --outBam ALL.hic.bam -o ALL.hic_matrix.h5 --QCfolder ./hicQC

I get the following error

INFO:hicexplorer.hicBuildMatrix:reading /home/stelo/babesia/all1.bam and /home/stelo/babesia/all2.bam to build hic_matrix

[E::idx_find_and_load] Could not retrieve index file for '/home/stelo/babesia/all1.bam'
[E::idx_find_and_load] Could not retrieve index file for '/home/stelo/babesia/all2.bam'
INFO:hicexplorer.hicBuildMatrix:dangling sequences to check are {'GATC': {'pat_forw': 'GATC', 'pat_rev': 'GATC'}}

Traceback (most recent call last):
  File "/home/stelo/miniconda3/bin/hicBuildMatrix", line 7, in <module>
    main()
  File "/home/stelo/miniconda3/lib/python3.7/site-packages/hicexplorer/hicBuildMatrix.py", line 1296, in main
    pMinMappingQuality=args.minMappingQuality
  File "/home/stelo/miniconda3/lib/python3.7/site-packages/hicexplorer/hicBuildMatrix.py", line 713, in readBamFiles
    "the --reorder option".format(mate1.qname, mate2.qname)
AssertionError: FATAL ERROR A00953:243:HYK77DSXY:4:1101:27507:1000 A00953:243:HYK77DSXY:4:1101
:6063:1000 Be sure that the sam files have the same read order If using Bowtie2 or Hisat2 add the --reorder option

Last week I was able individually map, then merge the bam files and generate a contact map from the merged bam using HiCExplorer on another assembly of the same genome, so I am pretty sure that the pipeline is correct.

As I said, I can generate maps for each individual set, but not on the merged set anymore. Maybe there is a different way to merge the contact maps (as a workaround, if merging BAM files is not an option)?

Thanks! Stefano

LeilyR commented 3 years ago

Hi,

  1. your bam files seem to not have a header , that is where your first issue comes from when merging them. Please update your samtools to the last version. I think they take care of it in their more recent version.
  2. In general it is not a good prectice to merge your replicates on the mapped level. I suggest you to build matrices individually, if necessary with a lower resolution and check for the correlation between the replicates before merging them. You can always merge the matices as well instead of merging bams, if it helps.
  3. The error you get from hicexplorer suggests that your R1 and R2 files are having different orders of reads, if that is the case you need to take care of reordering them too. I suspect that happened while merging them.
StefanoLonardi commented 3 years ago

thanks for your quick reply. Here is my answers

  1. I have now installed samtools 1.12, I remapped the reads, converted the SAMs to BAMs, and I still get the error message No @HD tag found
  2. thanks for the suggestion; what hicexplorer function you recommend to merge the matrices?
  3. I tried to use samtools sort -n and sorted the two bam files by read name, and now hicexplorer is happy; you should add this to the documentation in case people have problems with this

thanks for your help, Stefano

LeilyR commented 3 years ago
  1. I would suggest you to commiunicate that with the samtools developers.
  2. Sure, you can use hicSumMatrices, check the doc here
  3. Thanks for the suggestion , We will consider updating the documentation for that.