aidenlab / juicer

A One-Click System for Analyzing Loop-Resolution Hi-C Experiments
http://aidenlab.org
MIT License

pre does not support multithreaded processing of 4DN-DCIC pairs file #170

Closed dmalzl closed 4 years ago

dmalzl commented 4 years ago

Are you sure this is a bug? I asked on the forum and was told this had been fixed in the new version, after which the entry was deleted. However, I am already using the newest version and this is definitely not fixed.

Describe the bug

I am trying to use juicer pre to generate a .hic file from a 4DN-DCIC pairs file. This works fine with a single thread. However, when I try to speed it up with multiple threads, pre crashes, reporting that it cannot find files it expects to exist. In particular, I gave it the whole pairs file, named somepairs.pairs.gz, and it tried to open somepairs.pairs.gz_chrN_chrN, which of course did not exist. I then generated these files manually to see whether that would work. Unfortunately, pre then skips all entries and crashes again, reporting that there are no pairs to process. All pairs files include the header as described here.
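The report says every pairs file carries the 4DN header. A minimal Python sketch for checking that, assuming the rules of the 4DN pairs format specification: the first line is `## pairs format v1.0` and, if a `#columns:` line is present, its first seven names are readID chr1 pos1 chr2 pos2 strand1 strand2. The function names are hypothetical, and the check only compares against the specification, not against whatever juicer pre itself validates.

```python
import gzip
from typing import List

# Assumption: column order required by the 4DN pairs format specification
# for the first seven columns.
REQUIRED_COLUMNS = ["readID", "chr1", "pos1", "chr2", "pos2", "strand1", "strand2"]


def read_pairs_header(path: str) -> List[str]:
    """Return the '#'-prefixed header lines of a plain or gzipped .pairs file."""
    opener = gzip.open if path.endswith(".gz") else open
    header = []
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.startswith("#"):
                break  # first data row reached: the header is over
            header.append(line.rstrip("\n"))
    return header


def check_pairs_header(path: str) -> List[str]:
    """Return the problems found in the header; an empty list means it looks valid."""
    header = read_pairs_header(path)
    problems = []
    if not header or not header[0].startswith("## pairs format v1.0"):
        problems.append("first line is not '## pairs format v1.0'")
    columns = [h for h in header if h.startswith("#columns:")]
    if columns:  # the column check is skipped when no '#columns:' line exists
        names = columns[0][len("#columns:"):].split()
        if names[:7] != REQUIRED_COLUMNS:
            problems.append("unexpected leading columns: %s" % names[:7])
    return problems
```

For example, `check_pairs_header("pairs/somepairs.pairs.gz")` returns an empty list when the header matches these assumptions.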

To Reproduce

Steps to reproduce the behavior with the complete file:

java -jar juicer_tools_1.22.01.jar pre \
       -r 5000,10000,25000,50000,100000,250000,500000,1000000 \
       -k KR,GW_KR \
       -j 4 \
       pairs/somepairs.pairs.gz \
       somepairs.hic \
       mm9

(The same happens with --threads 4 in place of -j 4.)


With split pairs:

java -jar juicer_tools_1.22.01.jar pre \
       -r 5000,10000,25000,50000,100000,250000,500000,1000000 \
       -k KR,GW_KR \
       -j 4 \
       pairs/somepairs.pairs.gz_* \
       somepairs.hic \
       mm9

(The same happens with --threads 4 in place of -j 4.)


I have also included the files used in these examples as a separate zip archive: pairs.zip

Expected behavior

I expected pre to simply take the whole file and process it with multiple threads.


sa501428 commented 4 years ago

Hey @dmalzl, the usage here is incorrect. As I mentioned, you can use the default pre usage; multithreading is built in. That is, use the command:

java -jar ../juicer_tools_1.22.01.jar pre -r 5000,10000,25000,50000,100000,250000,500000,1000000 -k KR,GW_KR --threads 4 somepairs.pairs.gz somepairs.hic mm9

It only needs the original gzipped pairs file.
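When scripting this run, a minimal Python sketch can assemble the same invocation as an argument list. The helper name build_pre_command is hypothetical; it uses only the -r, -k and --threads options that appear in this thread. Passing a list to a process launcher bypasses shell parsing, so a stray comment or trailing space after a backslash cannot break the command.

```python
from typing import List, Sequence


def build_pre_command(jar: str, pairs_gz: str, out_hic: str, genome: str,
                      resolutions: Sequence[int], norms: Sequence[str],
                      threads: int) -> List[str]:
    """Assemble the pre invocation: options first, then the single gzipped
    pairs file, the output .hic path and the genome ID (no manual splitting)."""
    return [
        "java", "-jar", jar, "pre",
        "-r", ",".join(str(r) for r in resolutions),  # e.g. 5000,10000,...
        "-k", ",".join(norms),                         # e.g. KR,GW_KR
        "--threads", str(threads),
        pairs_gz, out_hic, genome,
    ]
```

To run it, pass the returned list to `subprocess.run(cmd, check=True)`; this assumes Java is on the PATH and the jar path is correct.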

sa501428 commented 4 years ago

Lmk if this works

dmalzl commented 4 years ago

That's my problem. This was the first thing I tried (see the expected behavior and the first example), but it doesn't work: it says it can't open pairs/somepairs.pairs.gz_chrN_chrN, where N is 1, 2, 3, ..., 19, X, Y, M.
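To see which of these files the older build was looking for, a small Python sketch can list the intra-chromosomal names quoted above for mm9 and report the ones missing on disk. It only reproduces the `<input>_chrN_chrN` pattern from the error message; whether that build also expected inter-chromosomal files is not stated in the thread, and the helper names are hypothetical. It is a diagnostic only, since creating such files by hand did not help in the report.

```python
import os
from typing import List


def mm9_chromosomes() -> List[str]:
    """chr1..chr19, chrX, chrY and chrM, matching the N values in the report."""
    return ["chr%d" % n for n in range(1, 20)] + ["chrX", "chrY", "chrM"]


def expected_split_names(pairs_path: str, chromosomes: List[str]) -> List[str]:
    """Build '<input>_chrN_chrN' names, the pattern quoted in the error message."""
    return ["%s_%s_%s" % (pairs_path, c, c) for c in chromosomes]


def missing_files(names: List[str]) -> List[str]:
    """Return the names that do not exist on disk."""
    return [n for n in names if not os.path.exists(n)]
```

For example, `missing_files(expected_split_names("pairs/somepairs.pairs.gz", mm9_chromosomes()))` lists every per-chromosome file that pre would fail to open.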

sa501428 commented 4 years ago

Can you try re-downloading the jar? Here's what I get when I run the command:

java -jar juicer_tools_1.22.01.jar pre -j 4 test.txt test.hic hg19
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARN [2020-06-09T06:37:00,013]  [Globals.java:138] [main]  Development mode is enabled
Using 4 CPU thread(s)
Not including fragment map
Start preprocess
Writing header
Writing body
......................................................................................................................................................................................................................................................................................................................................
Writing footer

Finished preprocess

Calculating norms for zoom BP_2500000
Calculating norms for zoom BP_1000000
Calculating norms for zoom BP_500000
Calculating norms for zoom BP_250000
Calculating norms for zoom BP_100000
Calculating norms for zoom BP_50000
Calculating norms for zoom BP_25000
Calculating norms for zoom BP_10000
Calculating norms for zoom BP_5000
Calculating norms for zoom BP_1000
Finished writing norms
sa501428 commented 4 years ago

Sorry, this is what I get when I run your command:

java -jar ../juicer_tools_1.22.01.jar pre -r 5000,10000,25000,50000,100000,250000,500000,1000000 -k KR,GW_KR --threads 4 somepairs.pairs.gz somepairs.hic mm9
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARN [2020-06-09T06:41:29,569]  [Globals.java:138] [main]  Development mode is enabled
Using 4 CPU thread(s)
Not including fragment map
Start preprocess
Writing header
Writing body
...........................................................................................................................................................................................................................................
Writing footer

Finished preprocess

Calculating norms for zoom BP_1000000

Calculating norms for zoom BP_500000

Calculating norms for zoom BP_250000

Calculating norms for zoom BP_100000

Calculating norms for zoom BP_50000

Calculating norms for zoom BP_25000

Calculating norms for zoom BP_10000

Calculating norms for zoom BP_5000
Finished writing norms
dmalzl commented 4 years ago

Okay, so there has been an update since I last downloaded it. It works now. Thanks for having a look at this.