LiuLabUB / HMMRATAC

HMMRATAC peak caller for ATAC-seq data
GNU General Public License v3.0

BR: java.lang.NumberFormatException #52

Open MikeWLloyd opened 4 years ago

MikeWLloyd commented 4 years ago

I am running HMMRATAC version 1.2.10-0, installed from conda into a Singularity image with a conda3 base. I am trying to call peaks on mm10-based data.

Here is the command that is being run:

singularity run hmmratac.sif java -Xms512m -Xmx10g -jar /opt/conda/share/hmmratac-1.2.10-0/HMMRATAC.jar -b input.bam -i input.bam.bai -g chrom2.info -o testing -e mm10-blacklist.v2.bed --window 1250000 --score all

The run proceeds to the step where testing_peaks.gappedPeak and testing_summits.bed are generated. The following error is then reported, and the run ends with both the gappedPeak and summits files empty:

Exception in thread "main" java.lang.NumberFormatException: For input string: "1.0_0.740740740740741_0.0_0.04420339549616825_1.4173579467268351"
    at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
    at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
    at java.lang.Double.parseDouble(Double.java:538)
    at HMMR_ATAC.Main_HMMR_Driver.main(Main_HMMR_Driver.java:660)

Based on the error, I suspect that --score all is to blame.
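The exception at Main_HMMR_Driver.java:660 can be reproduced outside HMMRATAC. Double.parseDouble accepts a single decimal number, so it rejects an underscore-joined list of scores. The following is a minimal sketch that uses the exact string from the trace. The class ScoreParseRepro and its method are hypothetical and not part of HMMRATAC.

```java
public class ScoreParseRepro {

    // Composite score string copied verbatim from the stack trace above.
    static final String ALL_SCORES =
            "1.0_0.740740740740741_0.0_0.04420339549616825_1.4173579467268351";

    /** Returns true if Double.parseDouble rejects the given string. */
    static boolean failsToParse(String s) {
        try {
            Double.parseDouble(s);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // A single numeric score parses fine.
        System.out.println(failsToParse("1.0"));      // false
        // The underscore-joined --score all string is rejected,
        // matching the NumberFormatException in the report.
        System.out.println(failsToParse(ALL_SCORES)); // true
    }
}
```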

Run Log:

Fragment Expectation Maximum Done
Mean    50.0    StdDevs 20.0
Mean    187.2568319263678   StdDevs 52.04863123120787
Mean    376.3211422102428   StdDevs 48.231939080159705
Mean    599.1807715834468   StdDevs 96.32246076484671
ScalingFactor   1.087837
Training Regions found and Zscore regions for exclusion found
Training Fragment Pileup completed
Kmeans Model:
HMM with 3 state(s)

State 0
  Pi: 0.3333333333333333
  Aij: 0.333 0.333 0.333
  Opdf: Multi-variate Gaussian distribution --- Mean: [ 0.016 0.017 0.153 0.133 ]

State 1
  Pi: 0.3333333333333333
  Aij: 0.333 0.333 0.333
  Opdf: Multi-variate Gaussian distribution --- Mean: [ 0.455 0.839 0.284 0.152 ]

State 2
  Pi: 0.3333333333333333
  Aij: 0.333 0.333 0.333
  Opdf: Multi-variate Gaussian distribution --- Mean: [ 0.736 1.409 1.274 0.948 ]

Model created and refined. See testing.model
Model:
HMM with 3 state(s)

State 0
  Pi: 0.3333333333333333
  Aij: 0.967 0.028 0.005
  Opdf: Multi-variate Gaussian distribution --- Mean: [ 0 0 0 0 ]

State 1
  Pi: 0.3333333333333333
  Aij: 0.05 0.941 0.009
  Opdf: Multi-variate Gaussian distribution --- Mean: [ 0.443 0.81 0.196 0.027 ]

State 2
  Pi: 0.3333333333333333
  Aij: 0.008 0.007 0.984
  Opdf: Multi-variate Gaussian distribution --- Mean: [ 0.342 0.626 0.822 0.692 ]

Genome split and subtracted masked regions
50 round viterbi done
100 round viterbi done
150 round viterbi done
200 round viterbi done
250 round viterbi done
300 round viterbi done
349 round viterbi done

MikeWLloyd commented 4 years ago

I re-ran without --score all and the run completed without issue.

Can you advise: is this a bug, or did I enter the command incorrectly?

EvanTarbell commented 4 years ago

This is a bug. I recently added a function to filter the peaks based on --threshold, but the --score all option creates a string containing every available scoring system (max, mean, fold-change, etc.), and the filter won't work on that string. I'll fix that ASAP and re-release it. After the fix, you won't be able to filter the peaks with --score all; instead, ALL of them will be reported, and if filtering is needed, you'll have to do it afterwards. This will only apply to --score all. In the meantime, if you choose a different scoring option, there will be no problem.
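The fix described above can be sketched as follows. This minimal sketch makes three assumptions:

* Each peak's score is held as a String.
* --score all joins the individual scores with underscores, as the string in the stack trace suggests.
* Peaks scoring at or above the threshold are kept.

ScoreFilter, passesThreshold, and component are hypothetical names, not HMMRATAC code. The order of the values inside the composite string is not stated in this thread, so the caller chooses the component index for post-hoc filtering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ScoreFilter {

    /**
     * Threshold check that tolerates the --score all format (assumed
     * underscore-joined): a composite score is never filtered, so every
     * such peak is reported. A single numeric score is compared against
     * the threshold (assumed ">=" semantics).
     */
    static boolean passesThreshold(String score, double threshold) {
        if (score.contains("_")) {
            return true; // composite score: report the peak, filter afterwards
        }
        return Double.parseDouble(score) >= threshold;
    }

    /** Extracts one numeric component from a composite score for post-hoc filtering. */
    static double component(String score, int index) {
        String[] parts = score.split("_");
        if (index < 0 || index >= parts.length) {
            throw new IllegalArgumentException(
                    "index " + index + " out of range for " + parts.length + " scores");
        }
        return Double.parseDouble(parts[index]);
    }

    public static void main(String[] args) {
        String all = "1.0_0.740740740740741_0.0_0.04420339549616825_1.4173579467268351";
        System.out.println(passesThreshold(all, 10.0));   // true: composite, not filtered
        System.out.println(passesThreshold("5.0", 10.0)); // false: below threshold

        // Post-hoc filtering: keep peaks whose chosen component meets a cutoff.
        List<String> kept = new ArrayList<>();
        for (String s : Arrays.asList(all, "5.0_2.0_1.0_0.5_3.0")) {
            if (component(s, 0) >= 2.0) {
                kept.add(s);
            }
        }
        System.out.println(kept.size()); // 1
    }
}
```

The sketch uses Arrays.asList rather than List.of so that it also compiles on Java 8, which the sun.misc.FloatingDecimal frames in the trace suggest.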

dimurali93 commented 3 years ago

Is this fixed?