Closed: mdozmorov closed this issue 4 years ago
Hi @mdozmorov, could you please generate indices for your BAM files? I will update the documentation if needed.
I tried. Samtools complains that the bwa-aligned files are unsorted. Sorting them allows indexing, but then hicBuildMatrix complains that the files are not paired. So I'm not sure how to break out of this circular dependency.
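For anyone caught in the same loop: a quick way to see whether a BAM has already been position-sorted is to read the SO: tag on the @HD header line, which samtools sort sets to "coordinate". Below is a minimal sketch, assuming samtools is on the PATH; `sort_order_from_header` is a made-up helper name, not part of samtools or HiCExplorer.

```shell
# Read a SAM header on stdin and print the declared sort order.
# Prints "undeclared" when there is no @HD line or no SO: tag.
# NOTE: sort_order_from_header is a hypothetical helper name.
sort_order_from_header() {
    awk -F '\t' '
        $1 == "@HD" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SO:/) { print substr($i, 4); found = 1; exit }
            }
        }
        END { if (!found) print "undeclared" }
    '
}

# Usage (needs samtools and a real BAM file):
#   samtools view -H sample_R1.bam | sort_order_from_header
```

If this prints "coordinate" for a file passed to hicBuildMatrix, that file has been position-sorted and is probably no longer in the order bwa wrote it.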
Actually, I have checked our pipeline: we apparently never generated the index files, and it always worked. Which version are you using? I can test here and see how it performs.
I use the latest, hicexplorer 3.5.1. It was easy to install into a Conda environment.
It seems the problem is that the two BAM files contain different numbers of reads. Something may have gone wrong during the alignment, when file permissions/owners were changed. I sorted the files, then ran flagstat, and see for one file:
644982312 + 0 in total (QC-passed reads + QC-failed reads)
and for another
652025938 + 0 in total (QC-passed reads + QC-failed reads)
hicBuildMatrix starts running when the reads are apparently paired, but then stalls without throwing an error. Only by looking in the output file can I see [E::idx_find_and_load] Could not retrieve index file.
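To check whether the output file holds anything besides the index message, the log can be filtered so that only other lines remain. This is a rough sketch; `filter_index_warnings` and the log file name in the usage comment are placeholders I made up.

```shell
# Drop htslib "missing index" messages from a log read on stdin,
# so that any other messages stand out.
# NOTE: filter_index_warnings is a hypothetical helper name.
filter_index_warnings() {
    # grep -v exits with status 1 when nothing is left; treat that as success.
    grep -v -F '[E::idx_find_and_load] Could not retrieve index file' || true
}

# Usage (the log file name is a placeholder):
#   filter_index_warnings < hicBuildMatrix.log
```

An empty result means the index message was the only thing logged.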
I'm currently rerunning the alignment, and will then try hicBuildMatrix and close the issue if all works. Checking that both BAM files contain the same number of reads may be a good built-in feature, or at least the documentation could state that users should check this.
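Such a check could look roughly like the sketch below. One caveat, as far as I understand it: the flagstat "in total" line counts alignment records, including secondary and supplementary ones, so the two totals can differ even when both files hold the same reads. The sketch therefore counts only primary records (flags 0x100 and 0x800 excluded). It assumes samtools is on the PATH, and the function names are made up.

```shell
# Count primary alignment records, skipping secondary (0x100) and
# supplementary (0x800) ones, so that each read is counted once.
count_primary_records() {
    samtools view -c -F 0x900 "$1"
}

# Succeed only when the two integer counts are equal.
counts_match() {
    [ "$1" -eq "$2" ]
}

# Compare two mate BAM files (hypothetical wrapper).
# Returns 0 on a match, 1 on a mismatch, 2 if counting failed.
check_mate_bams() {
    local n1 n2
    n1=$(count_primary_records "$1") || return 2
    n2=$(count_primary_records "$2") || return 2
    if counts_match "$n1" "$n2"; then
        echo "OK: both files hold $n1 primary records"
    else
        echo "MISMATCH: $1 has $n1, $2 has $n2" >&2
        return 1
    fi
}

# Usage (needs samtools and real BAM files):
#   check_mate_bams sample_R1.bam sample_R2.bam
```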
@mdozmorov Thanks for the suggestion, we will work on it. I hope your issue has been solved after remapping.
The runs are persistently failing. I have a subset of the data (10000 FASTQ lines) and a folder with all scripts and output to reproduce the issue. Would it be possible to look at it and see what I am doing wrong? The compressed files are small. Please get in touch at mikhail.dozmorov@gmail.com, and I'll share them. Thanks!
It turned out that the reported message is only a warning and has no effect on the output.
I would suggest silencing this warning, though. It may lead users to sort their BAM files by position in order to generate indices, which in turn would break the synchronization between the R1 and R2 BAMs.
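To see whether the two files are still in sync, one option is to compare the leading read names of both. This is only a sketch: it assumes samtools is on the PATH and that mates carry identical read names in both files, and the helper names are made up.

```shell
# Print the names of the first N primary records of a BAM file
# (N defaults to 100000). Secondary and supplementary records are skipped.
first_read_names() {
    samtools view -F 0x900 "$1" | head -n "${2:-100000}" | cut -f 1
}

# Succeed when two files of read names are identical line by line.
names_in_sync() {
    cmp -s "$1" "$2"
}

# Usage (needs samtools and real BAM files):
#   first_read_names sample_R1.bam > R1.names
#   first_read_names sample_R2.bam > R2.names
#   names_in_sync R1.names R2.names && echo "in sync" || echo "out of sync"
```

After position-sorting, the two name lists would be expected to diverge within the first few lines.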
Following the documentation, I map raw data using the standard bwa commands, like:
bwa mem -A 1 -B 4 -E 50 -L 0 -t 16 $fastaindx $Rfiles | samtools view -Shb - > $fname.bam
When building a matrix, I'm getting errors:
[E::idx_find_and_load] Could not retrieve index file for 'sample_R1.bam'
[E::idx_find_and_load] Could not retrieve index file for 'sample_R2.bam'
The hicBuildMatrix command is:
hicBuildMatrix -s $BAMR1 $BAMR2 \
    --restrictionSequence GATC \
    --danglingSequence GATC \
    --QCfolder ${SAMPLE}_binSizeQCs \
    --threads 16 \
    --outBam ${SAMPLE}_binSize.bam \
    --outFileName ${SAMPLE}_binSize.h5 \
    --binSize 10000 \
    -rs $restrfile \
    --inputBufferSize 400000
What may be wrong?
@mdozmorov I have the same issue. How did you solve the problem? Could you please share?
@theshowmustgolangon Have you read this thread? Please confirm that you cannot build the matrix. As the thread suggests, the message is only a warning and does not affect the output.