Hi!
Great job! These are really useful tools for scHi-C data, and I'd like to use them in my research. Here are my questions:
The RWR step seems to take too much memory. Is there any way to optimise it? For example, I only have 2 nodes with 96 threads and about 400 GB of memory, and it takes forever to process thousands of cells.
When I set
parallelism="threaded"
number_of_processors=10
snapHiC still takes all 96 of my threads, and I can hardly do anything else on my machine. Any idea why?
Since this pipeline takes so long, would it be possible to release a Snakemake version of step one instead of the existing mpirun version? Snakemake has really nice features, and you don't need to rerun everything after an error.
This is just a suggestion: I'm using dip-c + hickit + hg19 for my pipeline, and my pairs file (shown further below) comes from hickit.
Maybe consider adding support for hickit pairs files (skip lines that start with # and accept chromosome names like 1 instead of chr1). That said, it's not difficult to convert them.
We are working on optimizing the random walk step, or possibly approximating it, but unfortunately we don't have a clear timeline for when this will be done.
Setting "number_of_processors" to 10 makes snapHiC distribute the data among 10 processes to run the random walk. Each process uses NumPy to compute the RWR, and depending on how NumPy was installed, its BLAS backend may automatically try to use all cores on your system. Please see this QA. You should be able to prevent this behaviour with:
export OPENBLAS_MAIN_FREE=1 or export OPENBLAS_NUM_THREADS=1
Set either variable in the shell before launching snapHiC.
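To make the oversubscription concrete, here is a minimal Python sketch, not part of snapHiC. It assumes the job is launched from a wrapper script; `limited_blas_env` and `total_threads` are hypothetical helper names, and the child command is a stand-in for the real launch line. `OMP_NUM_THREADS` and `MKL_NUM_THREADS` are included on the assumption that your NumPy may be linked against a backend other than OpenBLAS.

```python
import os
import subprocess
import sys

# Environment variables read by common BLAS/OpenMP backends at start-up.
THREAD_VARS = ("OPENBLAS_NUM_THREADS", "OMP_NUM_THREADS", "MKL_NUM_THREADS")


def limited_blas_env(threads_per_process=1, base_env=None):
    """Return a copy of the environment with the BLAS thread pools capped."""
    env = dict(os.environ if base_env is None else base_env)
    for name in THREAD_VARS:
        env[name] = str(threads_per_process)
    return env


def total_threads(number_of_processors, threads_per_process):
    """Worst-case number of busy threads across all worker processes."""
    return number_of_processors * threads_per_process


if __name__ == "__main__":
    # If each of 10 processes sizes its pool to all 96 hardware threads,
    # up to 10 * 96 threads compete for the same cores.
    print(total_threads(10, 96))
    env = limited_blas_env(1)
    # Stand-in child command (assumption); replace with the real launch line.
    child = [sys.executable, "-c",
             "import os; print(os.environ['OPENBLAS_NUM_THREADS'])"]
    subprocess.run(child, env=env, check=True)
```

The variables have to be in place before NumPy is first imported in each worker, which is why the sketch sets them on the child's environment rather than inside the running process.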
We will consider releasing a Snakemake version, but the current version already supports the feature you mention in the RWR step. If your job terminates for some reason (because of an error, or because you stop it) and you restart the run, previously computed random walk values will not be computed again, and the random walk step will continue where it left off.
Thank you for this suggestion. This is very helpful; we will add support for additional input file formats in future releases.
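For readers curious about the general idea behind resuming, the pattern can be sketched as follows. This is not snapHiC's actual code: the marker-file naming (`.rwr.done`), `pending_cells`, and `run_all` are all hypothetical, and the sketch only illustrates skipping work whose result already exists.

```python
from pathlib import Path

DONE_SUFFIX = ".rwr.done"  # hypothetical marker name, not a snapHiC convention


def pending_cells(cell_ids, out_dir):
    """Return the cells whose marker file does not exist yet."""
    out_dir = Path(out_dir)
    return [c for c in cell_ids if not (out_dir / f"{c}{DONE_SUFFIX}").exists()]


def run_all(cell_ids, out_dir, compute):
    """Compute only the missing cells; write a marker after each success."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    finished = []
    for cell in pending_cells(cell_ids, out_dir):
        compute(cell)  # an exception here leaves no marker for this cell
        (out_dir / f"{cell}{DONE_SUFFIX}").touch()
        finished.append(cell)
    return finished
```

Because the marker is written only after `compute` returns, a crash mid-cell means that cell is simply retried on the next run.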
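In the meantime, the conversion described in the question (drop the lines that start with # and add the chr prefix) can be sketched in a few lines of Python. This is a sketch under assumptions: `convert_hickit_pairs` is a hypothetical helper, the chromosome fields are taken to be the second and fourth columns as in the hickit `columns:` header, and the output is assumed to be a headerless tab-separated file. Check the column layout your snapHiC configuration expects before relying on it.

```python
def convert_hickit_pairs(lines, chrom_columns=(1, 3)):
    """Yield tab-separated records without header lines and with 'chr' prefixes.

    chrom_columns holds the 0-based indices of the chromosome fields
    (assumed here to be chr1 and chr2 in a hickit pairs file).
    """
    for line in lines:
        # Skip pairs-format header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split()
        for idx in chrom_columns:
            if not fields[idx].startswith("chr"):
                fields[idx] = "chr" + fields[idx]
        yield "\t".join(fields)
```

For example, passing an open file handle as `lines` and writing each yielded record followed by a newline gives a converted copy without loading the whole file into memory.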
This is just a suggestion: I'm using dip-c + hickit + hg19 for my pipeline, and my pairs file looks like this:
## pairs format v1.0
#sorted: chr1-chr2-pos1-pos2
#shape: upper triangle
#chromosome: 1 249250621
#chromosome: 2 243199373
#chromosome: 3 198022430
#chromosome: 4 191154276
#chromosome: 5 180915260
#chromosome: 6 171115067
#chromosome: 7 159138663
#chromosome: 8 146364022
#chromosome: 9 141213431
#chromosome: 10 135534747
#chromosome: 11 135006516
#chromosome: 12 133851895
#chromosome: 13 115169878
#chromosome: 14 107349540
#chromosome: 15 102531392
#chromosome: 16 90354753
#chromosome: 17 81195210
#chromosome: 18 78077248
#chromosome: 19 59128983
#chromosome: 20 63025520
#chromosome: 21 48129895
#chromosome: 22 51304566
#chromosome: X 155270560
#columns: readID chr1 pos1 chr2 pos2 strand1 strand2 phase0 phase1
. 1 724716 1 224200208 + + . .
. 1 725830 1 224203159 + + . .
. 1 726868 1 1491588 + + . .
. 1 758626 1 794330 + + . .
. 1 818997 1 1487155 + + . .
. 1 843085 1 849002 + + . .
. 1 847442 1 193311696 + + . .
Maybe consider adding support for hickit pairs files (skip lines that start with # and accept chromosome names like 1 instead of chr1). That said, it's not difficult to convert them.
Thanks!