aidenlab / Juicebox

Visualization and analysis software for Hi-C data -
https://aidenlab.org/juicebox
MIT License

[BUG] juicer_tools pre throws NumberFormatException for assembly size 2147483648 or larger (2**31) #1031

Closed jbh-cas closed 4 months ago

jbh-cas commented 4 months ago

I tried the pre command with juicer_tools_1.19.02.jar, juicer_tools.2.04.06.jar, juicer_tools.2.18.00.jar, and juicer_tools.2.20.00.jar on a genome whose size, 2192568369, is a little larger than 2**31. They all throw a java.lang.NumberFormatException at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65).

I thought issue #129, which is the same as this one, had been fixed, and the 1.19.02 release comment makes it seem this was addressed then, but that does not appear to be the case.

This did not work: java -Xmx36G -jar juicer_tools.2.20.00.jar pre for_JBAT.txt for_JBAT.hic <(echo "assembly 2147483648")

This did: java -Xmx36G -jar juicer_tools.2.20.00.jar pre for_JBAT.txt for_JBAT.hic <(echo "assembly 2147483647")

The only difference between the two tests is that the first number, 2147483648, is 1 greater than the second, 2147483647, which is 2**31 - 1.

I don't know whether this can be addressed by making the variable a long, but if possible it would be helpful.

Thanks, Jim Henderson

jbh-cas commented 4 months ago

Interpreting this input as an unsigned int gives a maximum of about 4.294 billion instead of 2.147 billion and would help a lot. For the particular line throwing the error, this would mean using Integer.parseUnsignedInt instead of Integer.parseInt. The number is still stored in 4 bytes, just like the signed version.
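To make the difference between these parsers concrete, here is a minimal sketch of how the standard Java calls treat a size of exactly 2**31. The class and helper names are hypothetical and are not taken from the Juicebox source. One caveat it shows: Integer.parseUnsignedInt still returns an int, so the result reads as negative unless every later use treats it as unsigned, whereas Long.parseLong avoids that reinterpretation.

```java
// Sketch only: AssemblySizeParsing and fitsSignedInt are illustrative names,
// not part of Juicebox or juicer_tools.
class AssemblySizeParsing {

    /** Returns true if Integer.parseInt accepts the text, false if it throws. */
    static boolean fitsSignedInt(String text) {
        try {
            Integer.parseInt(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String size = "2147483648"; // 2**31, one past Integer.MAX_VALUE

        System.out.println(fitsSignedInt("2147483647")); // true
        System.out.println(fitsSignedInt(size));         // false

        // parseUnsignedInt accepts values up to 2**32 - 1, but the result is
        // still an int, so it prints as a negative number.
        int raw = Integer.parseUnsignedInt(size);
        System.out.println(raw);                         // -2147483648
        System.out.println(Integer.toUnsignedLong(raw)); // 2147483648

        // parseLong keeps the value positive without any reinterpretation.
        System.out.println(Long.parseLong(size));        // 2147483648
    }
}
```

Note that none of these parsers accepts scientific notation, so a string such as "4.27196e+09" fails regardless of which one is used.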

jbh-cas commented 4 months ago

I see the workaround for this is to divide the assembly size by a scale factor so that the value passed to juicer_tools pre is under 2**31, and then to set that same scale factor in Juicebox.

Closing this

yongjiam commented 4 months ago

Hi jbh-cas, I am experiencing a similar issue with a barley genome assembly of size 4271963214. Would you please elaborate a little on how to work around this? Thank you very much.

asm_size=$(awk '{s+=$2} END{print s}' contigs.fa.fai)
java -Xmx36G -jar /scratch/pawsey0399/yjia/WBT/juicer_tools_1.22.01.jar pre out_JBAT.txt out_JBAT.hic <(echo "assembly ${asm_size}")

Command exit status: 56

Command output:
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARN [2024-05-09T02:10:26,821] [Globals.java:138] [main] Development mode is enabled
Using 1 CPU thread(s)
Not including fragment map
Start preprocess
Writing header
Writing body

Command error:
[I::main_pre] make juicer pre input from BIN file yahs.out.bin
[I::make_juicer_pre_file_from_bin] 385879578 read pairs processed
[I::main_pre] genome size: 4271963213
[I::main_pre] scale factor: 2
[I::main_pre] chromosome sizes for juicer_tools pre - PRE_C_SIZE: assembly 2135981606
[I::main_pre] JUICER_PRE CMD: java -Xmx36G -jar ${juicer_tools} pre out_JBAT.txt out_JBAT.hic <(echo "assembly 2135981606")
[I::main_pre] Version: 1.1
[I::main_pre] CMD: juicer pre -a -o out_JBAT yahs.out.bin yahs.out_scaffolds_final.agp contigs.fa.fai
[I::main_pre] Real time: 96.795 sec; CPU: 94.898 sec; Peak RSS: 0.003 GB
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARN [2024-05-09T02:10:26,821] [Globals.java:138] [main] Development mode is enabled
java.lang.NumberFormatException: For input string: "4.27196e+09"
    at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.base/java.lang.Integer.parseInt(Integer.java:652)
    at java.base/java.lang.Integer.parseInt(Integer.java:770)
    at juicebox.data.HiCFileTools.loadChromosomes(HiCFileTools.java:157)
    at juicebox.tools.clt.old.PreProcessing.readArguments(PreProcessing.java:93)
    at juicebox.tools.HiCTools.main(HiCTools.java:93)

jbh-cas commented 3 months ago

You need to divide the assembly size by a factor that brings it under 2**31, and then use this factor in Set Scale in JBAT.

Your value just fits with a scaling factor of 2, since 4271963214 / 2 = 2135981607 and 2,135,981,607 < 2,147,483,648, which is 2**31. Set Scale to 2.0 in the JBAT Assembly menu, changing it from its default of 1.0.
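The arithmetic above can be sketched as follows. This is only an illustration that assumes the smallest integer factor is wanted; the class and method names are hypothetical, and tools that compute the factor for you may choose it differently.

```java
// Sketch only: ScaleFactorSketch, scaleFactor and scaledSize are illustrative
// names and do not come from juicer_tools or any other tool in this thread.
class ScaleFactorSketch {

    /** Smallest integer factor that brings genomeSize to at most 2**31 - 1. */
    static long scaleFactor(long genomeSize) {
        long max = Integer.MAX_VALUE;             // 2147483647
        long factor = (genomeSize + max - 1) / max; // ceiling division
        return Math.max(1, factor);               // never scale by less than 1
    }

    /** Size to pass to juicer_tools pre after integer division by the factor. */
    static long scaledSize(long genomeSize) {
        return genomeSize / scaleFactor(genomeSize);
    }

    public static void main(String[] args) {
        long genomeSize = 4271963214L; // barley assembly size from this thread

        System.out.println(scaleFactor(genomeSize)); // 2
        System.out.println(scaledSize(genomeSize));  // 2135981607
    }
}
```

The printed factor is the value to enter in Set Scale, and the scaled size is what goes after "assembly" in the chromosome sizes input.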

dudcha commented 3 months ago

Please use the 3D-DNA assembly visualizer, which handles this automatically on the input/fasta output: https://github.com/aidenlab/3d-dna/blob/phasing/visualize/run-assembly-visualizer.sh

When using JBAT, please cite appropriately: https://www.biorxiv.org/content/10.1101/254797v1