aidenlab / juicer

A One-Click System for Analyzing Loop-Resolution Hi-C Experiments
http://aidenlab.org
MIT License

CPU mega.sh issues (quotation in sort / merge-stats / cryptic message) #262

Closed rdacemel closed 2 years ago

rdacemel commented 2 years ago

Are you sure this is an issue? Pretty much sure.

I've been playing around with the new Juicer/mega and overall I think it is a big improvement! I really like how intermediate files are managed now. However, some things needed fixing, at least on my system.

  1. First off, merge-stats and sorting didn't work on my system because the variables ${merged_names} and ${inter_names} were wrapped in double quotes. The quotes propagated to the bash call, so the list of filenames was passed as a single argument instead of being recognized as separate files. Removing the quotes in mega.sh fixed it for me (lines 198 and 223, for instance):

    java -Xmx2g -jar "${juiceDir}"/scripts/common/merge-stats.jar "$outputDir"/inter "${inter_names}"

  changed to

    java -Xmx2g -jar "${juiceDir}"/scripts/common/merge-stats.jar "$outputDir"/inter ${inter_names}

  and

    sort --parallel=40 -T "${tmpdir}" -m -k2,2d -k6,6d "${merged_names}" > "${outputDir}"/merged1.txt

  changed to

    sort --parallel=40 -T "${tmpdir}" -m -k2,2d -k6,6d ${merged_names} > "${outputDir}"/merged1.txt

  2. Stats are now calculated, but I get this message:

java.lang.NumberFormatException: For input string: "50% - 50%"
        at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.base/java.lang.Long.parseLong(Long.java:692)
        at java.base/java.lang.Long.parseLong(Long.java:817)
        at sfh.merger.StatsMerger.processLine(StatsMerger.java:41)
        at sfh.merger.StatsMerger.parse(StatsMerger.java:16)
        at sfh.StatsUtils.merge(StatsUtils.java:10)
        at sfh.MergeStats.main(MergeStats.java:50)

  3. I also get this message:

    Unable to find All-All or All-All
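The quoting problem in the first point comes down to bash word splitting: a quoted `"${inter_names}"` reaches `java` or `sort` as one argument that contains every path, while the unquoted form is split on whitespace into one argument per file. A minimal sketch with placeholder file names and a hypothetical `count_args` helper; the array at the end is a common alternative that also keeps paths containing spaces intact.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print how many arguments the command actually received.
count_args() { echo "$#"; }

# Space-separated list, similar in spirit to ${inter_names} (placeholder paths).
inter_names="lib1/inter.txt lib2/inter.txt lib3/inter.txt"

count_args "${inter_names}"     # prints 1: quoted, all paths arrive as one argument
count_args ${inter_names}       # prints 3: unquoted, split on whitespace

# Alternative: keep the names in an array and expand it quoted.
inter_files=(lib1/inter.txt lib2/inter.txt lib3/inter.txt)
count_args "${inter_files[@]}"  # prints 3, and a path with a space would stay whole
```

Note that an unquoted expansion is also subject to glob expansion, which is another reason the array form is usually preferred in scripts.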

Are these last two messages expected, or should I worry?

Best,
Rafa.

sa501428 commented 2 years ago

Hi @rdacemel !

I've pushed a commit that hopefully resolves the first bug - please let me know if I missed any other lines that didn't work on your system.

Regarding the "50% - 50%" parse error: how were your individual libraries processed? Typically the stats include a breakdown of pair types that comes out closer to 25%-25%-25%-25% rather than 50%-50%. It didn't find that expected breakdown, hence the warning.
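The traces in this thread show `StatsMerger.processLine` calling `Long.parseLong` on strings such as `50% - 50%`; `parseLong` only accepts an integer, so the `3' Bias (Long Range)` value cannot be parsed. A minimal sketch of a more tolerant merge, written as a hypothetical `merge_counts` shell function (not the merge-stats.jar implementation): it sums integer values by label across stats files and skips anything that is not a plain integer. It assumes one `Label: value` entry per line and treats commas as thousands separators. It does not know which fields are meaningful to add; summing `L-I-O-R Convergence`, for example, is not.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not the merge-stats.jar implementation): sum integer
# values by label across several stats files. Usage: merge_counts FILE...
# Assumes one "Label: value ..." entry per line.
merge_counts() {
  awk '
    # Split at the last ": " so labels such as "Short Range (<20Kb): <500BP" stay whole.
    !match($0, /.*: /) { next }
    {
      label = substr($0, 1, RLENGTH - 2)
      rest  = substr($0, RLENGTH + 1)
      split(rest, parts, " ")
      value = parts[1]
      gsub(/,/, "", value)               # "10,552,950" -> "10552950"
      if (value !~ /^[0-9]+$/) next      # skip "50% - 50%", "N/A", "309.70", ...
      if (!(label in total)) order[++n] = label
      total[label] += value
    }
    END { for (i = 1; i <= n; i++) printf "%s: %.0f\n", order[i], total[order[i]] }
  ' "$@"
}
```

`%.0f` is used instead of `%d` so large totals print correctly in awk implementations whose `%d` is limited to 32-bit integers.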

The missing All-by-All message can be ignored; I thought we had suppressed that printout. Which juicer_tools.jar version are you using?

Both of those messages are warnings, but the files should be fine.

Thanks so much for catching the bug and the feedback!

rdacemel commented 2 years ago

Hi! That was fast ;)

Hmm, I processed them with Juicer using an early exit, so that I could run mega afterwards. I'm not sure what you mean by the breakdown of pairs, but here is an example of an individual stats file in case it helps:

    Read type: Paired End
    Sequenced Read Pairs: 15058574
    No chimera found: 402463 (2.67%)
    One or both reads unmapped: 402463 (2.67%)
    2 alignments: 14245238 (94.60%)
    2 alignments (A...B): 13327908 (88.51%)
    2 alignments (A1...A2B; A1B2...B1A2): 917330 (6.09%)
    3 or more alignments: 410873 (2.73%)
    Ligation Motif Present: N/A
    Average insert size: 309.70
    Total Unique: 14071075 (98.78%, 93.44%)
    Total Duplicates: 174163 (1.22%, 1.16%)
    Library Complexity Estimate*: 577,819,105
    Intra-fragment Reads: N/A
    Below MAPQ Threshold: 3,518,125 (23.36% / 25.00%)
    Hi-C Contacts: 10,552,950 (70.08% / 75.00%)
    3' Bias (Long Range): 50% - 50%
    Pair Type %(L-I-O-R): 25% - 25% - 25% - 25%
    L-I-O-R Convergence: 1233
    Inter-chromosomal: 3,396,090 (22.55% / 24.14%)
    Intra-chromosomal: 7,156,860 (47.53% / 50.86%)
    Short Range (<20Kb):
      <500BP: 1,064,865 (7.07% / 7.57%)
      500BP-5kB: 426,444 (2.83% / 3.03%)
      5kB-20kB: 600,444 (3.99% / 4.27%)
    Long Range (>20Kb): 5,065,107 (33.64% / 36.00%)

I downloaded the latest jar I could find (2.13.07).

sa501428 commented 2 years ago

Ah, thanks for sharing this! Let me look at the merge script and investigate this further. Did it still manage to build a merged inter.txt/inter_30.txt, or were those missing from the created mega folder?

Also, apologies: could you resubmit your request to join the Google Group? We've been having trouble separating real from fake users; if the reason field is empty, the account request is assumed to be spam.

rdacemel commented 2 years ago

It was indeed created:

    Read type: Paired End
    Sequenced Read Pairs: 63338629
    No chimera found: 1676126 (2.65%)
    One or both reads unmapped: 1676126 (2.65%)
    2 alignments: 59935373 (94.63%)
    2 alignments (A...B): 56054582 (88.50%)
    2 alignments (A1...A2B; A1B2...B1A2): 3880791 (6.13%)
    3 or more alignments: 1727130 (2.73%)
    Total Unique: 59191165 (93.45% / 98.76%)
    Total Duplicates: 744208 (1.17% / 1.24%)
    Below MAPQ Threshold: 20158790 (31.83% / 34.06%)
    Hi-C Contacts: 39032375 (61.62% / 65.94%)
    Pair Type %(L-I-O-R): 25% - 25% - 25% - 25%
    L-I-O-R Convergence: 1520
    Inter-chromosomal: 11717002 (18.50% / 19.80%)
    Intra-chromosomal: 27315373 (43.13% / 46.15%)
    Short Range (<20Kb):
      <500BP: 4141117 (6.54% / 7.00%)
      500BP-5kB: 1624538 (2.56% / 2.74%)
      5kB-20kB: 2286728 (3.61% / 3.86%)
    Long Range (>20Kb): 19262990 (30.41% / 32.54%)

Yes, I will resubmit; I read the note about the bots just after clicking. No worries at all.

pna059 commented 2 years ago

Hi there, I am getting the same error using mega:

java.lang.NumberFormatException: For input string: "65% - 35%"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:589)
        at java.lang.Long.parseLong(Long.java:631)
        at sfh.merger.StatsMerger.processLine(StatsMerger.java:41)
        at sfh.merger.StatsMerger.parse(StatsMerger.java:16)
        at sfh.StatsUtils.merge(StatsUtils.java:10)
        at sfh.MergeStats.main(MergeStats.java:50)
java.lang.NumberFormatException: For input string: "60% - 40%"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:589)
        at java.lang.Long.parseLong(Long.java:631)
        at sfh.merger.StatsMerger.processLine(StatsMerger.java:41)
        at sfh.merger.StatsMerger.parse(StatsMerger.java:16)
        at sfh.StatsUtils.merge(StatsUtils.java:10)
        at sfh.MergeStats.main(MergeStats.java:50)
java.lang.NumberFormatException: For input string: "64% - 36%"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:589)
        at java.lang.Long.parseLong(Long.java:631)
        at sfh.merger.StatsMerger.processLine(StatsMerger.java:41)
        at sfh.merger.StatsMerger.parse(StatsMerger.java:16)
        at sfh.StatsUtils.merge(StatsUtils.java:10)
        at sfh.MergeStats.main(MergeStats.java:50)
java.lang.NumberFormatException: For input string: "60% - 40%"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:589)
        at java.lang.Long.parseLong(Long.java:631)
        at sfh.merger.StatsMerger.processLine(StatsMerger.java:41)
        at sfh.merger.StatsMerger.parse(StatsMerger.java:16)
        at sfh.StatsUtils.merge(StatsUtils.java:10)
        at sfh.MergeStats.main(MergeStats.java:50)
(-: Finished creating top stats files.
sort: extra operand '/storage/brno2/home/pavlan/Barley_leaf_HiC/Barley_leaf_HiCrep1/aligned/merged1.txt.gz)' not allowed with -c

After that, the process stays in the S (sleeping) state and does not seem to progress.

My replicates were processed with an older Juicer release, so I had to rename merged_nodups.txt to merged1.txt.
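The `sort: extra operand '...merged1.txt.gz)' not allowed with -c` message is consistent with a list of process substitutions such as `<(gunzip -c merged1.txt.gz)` being stored in a variable and expanded unquoted: bash then splits it into the words `<(gunzip`, `-c` and `merged1.txt.gz)`, and `sort` reads `-c` as its own check option. That is an inference from the error text, not something confirmed in this thread. A minimal sketch of one way to build such a merge, using a hypothetical `merge_sorted` function: each path is escaped with `printf %q`, and the assembled command is run with `eval`, so the `<(...)` is interpreted as real process substitution rather than plain text.

```bash
#!/usr/bin/env bash
# Hypothetical helper: merge already-sorted inputs with sort -m, decompressing
# *.gz files on the fly. Usage: merge_sorted OUTPUT INPUT...
# Uses the same key options as the mega.sh sort call quoted earlier in this
# thread (--parallel and -T are omitted for brevity).
merge_sorted() {
  local out=$1
  shift
  local cmd="sort -m -k2,2d -k6,6d"
  local f q
  for f in "$@"; do
    printf -v q '%q' "$f"          # escape the path so eval sees it as one word
    if [[ $f == *.gz ]]; then
      cmd+=" <(gunzip -c -- $q)"   # becomes real process substitution under eval
    else
      cmd+=" $q"
    fi
  done
  eval "$cmd" > "$out"
}
```

For example, `merge_sorted merged1.txt rep1/aligned/merged1.txt.gz rep2/aligned/merged1.txt` would merge one compressed and one plain input (the paths are placeholders). `eval` is used deliberately here; escaping every path with `printf %q` first keeps unusual characters in file names from being interpreted by the shell.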

kellyliyichen commented 2 years ago

Hi,

I am also getting the same message when running mega.sh:

java.lang.NumberFormatException: For input string: "50% - 50%"
        at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.base/java.lang.Long.parseLong(Long.java:692)
        at java.base/java.lang.Long.parseLong(Long.java:817)
        at sfh.merger.StatsMerger.processLine(StatsMerger.java:41)
        at sfh.merger.StatsMerger.parse(StatsMerger.java:16)
        at sfh.StatsUtils.merge(StatsUtils.java:10)
        at sfh.MergeStats.main(MergeStats.java:50)
(-: Finished creating top stats files.
(-: Finished sorting all files into a single merge.

Will it affect the subsequent steps, i.e. sorting the individual merged1.txt or merged30.txt files and creating the combined .hic file?

Many thanks!!

sa501428 commented 2 years ago

The .hic file should still build. The 3' bias line may be missing from the merged stats; we will work on a fix for this. Otherwise, the .hic file should work without issue.