Closed: abhisheksinghnl closed this issue 5 years ago
Hi,
the help text should be:
hicBuildMatrix --help
usage: hicBuildMatrix --samFiles two sam files two sam files --outFileName
FILENAME --QCfolder FOLDER [--outBam bam file]
(--binSize BINSIZE [BINSIZE ...] | --restrictionCutFile BED file)
[--minDistance MINDISTANCE] [--maxDistance MAXDISTANCE]
[--maxLibraryInsertSize MAXLIBRARYINSERTSIZE]
[--restrictionSequence RESTRICTIONSEQUENCE]
[--danglingSequence DANGLINGSEQUENCE]
[--region CHR:START-END] [--keepSelfCircles]
[--minMappingQuality MINMAPPINGQUALITY]
[--threads THREADS] [--inputBufferSize INPUTBUFFERSIZE]
[--doTestRun] [--skipDuplicationCheck] [--help]
[--version]
Using an alignment from a program that supports local alignment (eg. Bowtie2)
where both PE reads are mapped using the --local option, this program reads
such file and creates a matrix of interactions.
Required arguments:
--samFiles two sam files two sam files, -s two sam files two sam files
The two PE alignment sam files to process (default:
None)
--outFileName FILENAME, -o FILENAME
Output file name for the Hi-C matrix. (default: None)
--QCfolder FOLDER Path of folder to save the quality control data for
the matrix. The log files produced this way can be
loaded into `hicQC` in order to compare the quality of
multiple Hi-C libraries. (default: None)
Optional arguments:
--outBam bam file, -b bam file
Output bam file to process. Optional parameter. A bam
file containing all valid Hi-C reads can be created
using this option. This bam file could be useful to
inspect the distribution of valid Hi-C reads pairs or
for other downstream analyses, but is not used by any
HiCExplorer tool. Computation will be significantly
longer if this option is set. (default: None)
--binSize BINSIZE [BINSIZE ...], -bs BINSIZE [BINSIZE ...]
Size in bp for the bins. The bin size depends on the
depth of sequencing. Use a larger bin size for
libraries sequenced with lower depth. Alternatively,
the location of the restriction sites can be given
(see --restrictionCutFile). Optional for mcool file
format: Define multiple resolutions which are all a
multiple of the first value. Example: --binSize 10000
20000 50000 will create a mcool file formate
containing the three defined resolutions. (default:
10000)
--restrictionCutFile BED file, -rs BED file
BED file with all restriction cut places (output of
"findRestSite" command). Should contain only mappable
restriction sites. If given, the bins are set to match
the restriction fragments (i.e. the region between one
restriction site and the next). (default: None)
--minDistance MINDISTANCE
Minimum distance between restriction sites.
Restriction sites that are closer than this distance
are merged into one. This option only applies if
--restrictionCutFile is given. (default: 300)
--maxDistance MAXDISTANCE
This parameter is now obsolete. Use
--maxLibraryInsertSize instead (default: None)
--maxLibraryInsertSize MAXLIBRARYINSERTSIZE
The maximum library insert size defines different cut
offs based on the maximum expected library size. *This
is not the average fragment size* but the higher end
of the the fragment size distribution (obtained using
for example a Fragment Analyzer or a Bioanalyzer)
which usually is between 800 to 1500 bp. If this value
if not known use the default of 1000. The insert value
is used to decide if two mates belong to the same
fragment (by checking if they are within this max
insert size) and to decide if a mate is too far away
from the nearest restriction site. (default: 1000)
--restrictionSequence RESTRICTIONSEQUENCE, -seq RESTRICTIONSEQUENCE
Sequence of the restriction site. (default: None)
--danglingSequence DANGLINGSEQUENCE
Sequence left by the restriction enzyme after cutting.
Each restriction enzyme recognizes a different DNA
sequence and, after cutting, they leave behind a
specific "sticky" end or dangling end sequence. For
example, for HindIII the restriction site is AAGCTT
and the dangling end is AGCT. For DpnII, the
restriction site and dangling end sequence are the
same: GATC. This information is easily found on the
description of the restriction enzyme. The dangling
sequence is used to classify and report reads whose 5'
end starts with such sequence as dangling-end reads. A
significant portion of dangling-end reads in a sample
are indicative of a problem with the re-ligation step
of the protocol. (default: None)
--region CHR:START-END, -r CHR:START-END
Region of the genome to limit the operation to. The
format is chr:start-end. It is also possible to just
specify a chromosome, for example --region chr10
(default: None)
--keepSelfCircles If set, outward facing reads without any restriction
fragment (self circles) are kept. They will be counted
and shown in the QC plots. (default: False)
--minMappingQuality MINMAPPINGQUALITY
minimum mapping quality for reads to be accepted.
Because the restriction enzyme site could be located
on top of the read, this may reduce the reported
quality of the read. Thus, this parameter may be
adusted if too many low quality (but otherwise
perfectly valid Hi-C reads) are found. A good strategy
is to make a test run (using the --doTestRun), then
checking the results to see if too many low quality
reads are present and then using the bam file
generated to check if those low quality reads are
caused by the read not being mapped entirely.
(default: 15)
--threads THREADS Number of threads. Using the python multiprocessing
module. One master process which is used to read the
input file into the buffer and one process which is
merging the output bam files of the processes into one
output bam file. All other threads do the actual
computation. Minimum value for the '--thread'
parameter is 2. The usage of 8 threads is optimal if
you have an HDD. A higher number of threads is only
useful if you have a fast SSD. Have in mind that the
performance of hicBuildMatrix is influenced by the
number of threads, the speed of your hard drive and
the inputBufferSize. To clearify: the peformance with
a higher thread number is not negative influenced but
not positiv too. With a slow HDD and a high number of
threads many threads will do nothing most of the time.
(default: 4)
--inputBufferSize INPUTBUFFERSIZE
Size of the input buffer of each thread. 400,000 read
pairs per input file per thread is the default value.
Reduce this value to decrease memory usage. (default:
400000)
--doTestRun A test run is useful to test the quality of a Hi-C
experiment quickly. It works by testing only 1,000,000
reads. This option is useful to get an idea of quality
control values like inter-chromosomal interactions,
duplication rates etc. (default: False)
--skipDuplicationCheck
Identification of duplicated read pairs is memory
consuming. Thus, in case of memory errors this check
can be skipped. However, consider running a
`--doTestRun` first to get an estimation of the
duplicated reads. (default: False)
--help, -h show this help message and exit
--version show program's version number and exit
I tested this with HiCExplorer version 2.2.1 and python 3.6.
Have you installed HiCExplorer in its own environment? Is it possible that you have multiple HiCExplorer versions installed? What is the output of `which hicBuildMatrix` and `whereis hicBuildMatrix`?
Best,
Joachim
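The check suggested above can also be scripted. The following is only a minimal sketch, assuming the stray copies are reachable through `PATH`; the helper name `find_all_on_path` is made up for this example. Note that `whereis` also looks in some standard locations outside `PATH`, so its output may differ.

```python
import os
from pathlib import Path


def find_all_on_path(name, path=None):
    """Return every executable called `name` found on PATH, in lookup order.

    The first entry is the one the shell would run (what `which` reports).
    Hypothetical helper for illustration; not part of HiCExplorer.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    seen = set()
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = Path(directory) / name
        if candidate.is_file() and os.access(candidate, os.X_OK):
            # Resolve symlinks so the same file is not reported twice.
            resolved = candidate.resolve()
            if resolved not in seen:
                seen.add(resolved)
                hits.append(str(candidate))
    return hits


if __name__ == "__main__":
    for hit in find_all_on_path("hicBuildMatrix"):
        print(hit)
```

If this prints more than one line, several installations are visible at once, and the first line is the one that actually runs.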
Hi,
Thank you for your reply. Here are the outputs.
$ which hicBuildMatrix
/tools/eb/software/Miniconda3/4.4.10/envs/hicexplorer/bin/hicBuildMatrix
$ whereis hicBuildMatrix
hicBuildMatrix: /gpfs/gssgpfs1/biogrid/tools/eb/software/Miniconda3/4.4.10/envs/hicexplorer/bin/hicBuildMatrix /gpfs/gssgpfs1/biogrid/tools/eb/software/Miniconda3/4.4.10/bin/hicBuildMatrix
I see the problem, but how should I fix it?
Remove all HiCExplorer versions and install it again. First run
conda remove hicexplorer
to make conda happy, then run
pip uninstall hicexplorer
repeatedly until no version is installed anymore. Make sure no HiCExplorer is left, and then install HiCExplorer again with conda.
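To confirm that nothing is left before reinstalling, one option is to ask Python's package metadata directly. This is a rough sketch under two assumptions: the interpreter is Python 3.8 or newer (for `importlib.metadata`), and the distribution is registered under the name `hicexplorer`. The function name `installed_version` is invented here. It only inspects the interpreter that runs it, so it has to be run once inside each environment.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of `dist_name`, or None if absent.

    Hypothetical helper; only inspects the interpreter that executes it.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "hicexplorer" is the assumed distribution name.
    version = installed_version("hicexplorer")
    if version is None:
        print("hicexplorer is not installed in this environment")
    else:
        print("hicexplorer", version, "is still installed")
```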
Hi,
I uninstalled hicexplorer from all the places and reinstalled it:
$ conda create -c bioconda --name hicexplorer hicexplorer
Then I checked it:
$ which hicBuildMatrix
/tools/eb/software/Miniconda3/4.4.10/envs/hicexplorer/bin/hicBuildMatrix
$ whereis hicBuildMatrix
hicBuildMatrix: /gpfs/gssgpfs1/biogrid/tools/eb/software/Miniconda3/4.4.10/envs/hicexplorer/bin/hicBuildMatrix
However, the version that is getting installed is 1.3.
$ hicBuildMatrix --version
hicBuildMatrix 1.3
:(
An older version is being installed. How can I bypass this?
Pin the version explicitly:
conda create -c bioconda --name hicexplorer_new hicexplorer=2.2.1 python=3.6
$ hicBuildMatrix --version
hicBuildMatrix 2.2.1
That should be the expected outcome. I don't get how 1.3 can be installed from:
conda create -c bioconda --name hicexplorer hicexplorer
@abhisheksinghnl can you please paste the output of `conda create` here, so we can see which versions are fetched and installed? Thanks!
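The version check in this exchange can be automated along the following lines. This is just a sketch: it assumes the tool prints a line of the form `hicBuildMatrix 2.2.1` for `--version`, as shown in this thread, and the helper names `parse_version_output` and `tool_version` are made up for illustration.

```python
import subprocess


def parse_version_output(text):
    """Split a line like 'hicBuildMatrix 2.2.1' into (name, (2, 2, 1))."""
    program, version = text.strip().rsplit(None, 1)
    return program, tuple(int(part) for part in version.split("."))


def tool_version(command):
    """Run `command --version` and return the parsed version tuple."""
    result = subprocess.run(
        [command, "--version"], capture_output=True, text=True, check=True
    )
    # Some argparse versions print the version to stderr instead of stdout.
    return parse_version_output(result.stdout or result.stderr)[1]


# Example use (requires hicBuildMatrix on PATH):
#   if tool_version("hicBuildMatrix") < (2, 2, 1):
#       print("an older HiCExplorer is being picked up")
```

Comparing tuples rather than strings keeps the ordering numeric, so `(1, 3)` sorts before `(2, 2, 1)`.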
Hi,
I have used this command and it seems that all is fine now.
conda create -c bioconda -c conda-forge --name hicexplorer_new hicexplorer=2.2.1 python=3.6
thank you for your help.
Cool. If you have time, please post the output of your previous command; I'm curious.
Hi,
I have installed HiCExplorer using conda. The installed version is hicexplorer 2.2.1.
However, when I look at the functionality of hicBuildMatrix using --help, I see the following options.
Could anyone please point out what is going wrong here?
thank you.