c-zhou / polyGembler

GNU General Public License v3.0

java.lang.ArrayIndexOutOfBoundsException: 1 #10

Closed littletiger311 closed 2 years ago

littletiger311 commented 3 years ago

Dear Dr. Zhou,

I got the error "java.lang.ArrayIndexOutOfBoundsException: 1" when I ran gembler or haplotyper with the '-G' option. The error message is below:

[INFO ] 2021-10-14 16:35:13.373 [main] Haplotyper - Random seed - 1831775920809742
dataprepare/populations.snps.recode.zip
[INFO ] 2021-10-14 16:35:13.412 [main] Haplotyper - => STAGE I. training emission model with no transitions allowed.
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
    at cz1.hmm.model.EmissionModel.makeObUnits(EmissionModel.java:238)
    at cz1.hmm.model.EmissionModel.initialise(EmissionModel.java:193)
    at cz1.hmm.model.EmissionModel.<init>(EmissionModel.java:92)
    at cz1.hmm.model.ModelTrainer.<init>(ModelTrainer.java:26)
    at cz1.hmm.tools.Haplotyper.run(Haplotyper.java:249)
    at cz1.appl.PolyGembler.main(PolyGembler.java:50)

I'd really appreciate any help with this.

The command running gembler is "java -jar dist/polyGembler-1.1-jar-with-dependencies.jar gembler -i populationout/populations.snps.vcf -l 10 -f 0.1 -m 0.5 -G -a scaf/PA.fasta -o PAoutGT -p 4 -parent Sample1:Sample2 -t 8"

The command running haplotyper is "java -jar dist/polyGembler-1.1-jar-with-dependencies.jar haplotyper -i dataprepare/populations.snps.recode.zip -o haplo -G -c ctg000020 -ex test --parent Sample1:Sample2"

I have also attached a few lines of the VCF file below.

##fileformat=VCFv4.2
##fileDate=20211001
##source="Stacks v2.59"
##INFO=
##INFO=
##INFO=
##INFO=
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
##INFO=
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1 Sample46 Sample42 Sample2 Sample22 Sample44 Sample56 Sample19 Sample39 Sample57 Sample15 Sample8 Sample55 Sample21 Sample52 Sample36 Sample61 Sample24 Sample18 Sample60 Sample14 Sample5 Sample58 Sample4 Sample35 Sample10 Sample28 Sample20 Sample23 Sample25 Sample11 Sample43 Sample59 Sample16 Sample50 Sample27 Sample53 Sample40 Sample12 Sample45 Sample7 Sample3 Sample49 Sample32 Sample26 Sample9 Sample30 Sample6 Sample37 Sample54 Sample29 Sample38 Sample51 Sample48 Sample34 Sample13 Sample62 Sample33 Sample17 Sample31 Sample47 Sample41

ctg000000 78710 48:85:+ C G . PASS NS=46;AF=0.304 GT:DP:AD:GQ:GL 0/0:57:57,0:40:0.00,-17.76,-244.33 1/1:119:0,119:40:-508.89,-36.03,-0.00 0/0:84:84,0:40:0.00,-25.89,-359.88 1/1:14:0,14:40:-59.53,-4.43,-0.00 0/0:34:34,0:40:-0.00,-10.84,-145.90 0/1:87:41,46:40:-170.07,0.00,-149.06 0/0:45:45,0:40:-0.00,-14.15,-192.97 0/0:29:29,0:40:-0.00,-9.33,-124.50 0/0:8:8,0:37:-0.00,-3.01,-34.63 0/0:15:15,0:40:-0.00,-5.12,-64.58 0/0:23:23,0:40:-0.00,-7.53,-98.82 0/0:38:38,0:40:-0.00,-12.04,-163.02 0/0:1:1,0:13:-0.05,-0.96,-4.72 1/1:26:0,26:40:-110.88,-8.04,-0.00 ./. 1/1:20:0,20:40:-85.20,-6.23,-0.00 0/0:22:22,0:40:-0.00,-7.23,-94.54 0/0:19:19,0:40:-0.00,-6.32,-81.70 1/1:19:1,18:22:-72.37,-1.66,-0.01 0/0:17:17,0:40:-0.00,-5.72,-73.14 1/1:23:0,23:40:-98.04,-7.14,-0.00 0/1:34:17,17:40:-61.92,0.00,-62.31 ./. 0/0:17:17,0:40:-0.00,-5.72,-73.14 0/0:7:7,0:33:-0.00,-2.71,-30.35 0/0:13:13,0:40:-0.00,-4.52,-56.02 0/0:9:9,0:40:-0.00,-3.31,-38.91 1/1:11:0,11:40:-46.69,-3.53,-0.00 1/1:11:0,11:40:-46.69,-3.53,-0.00 1/1:14:0,14:40:-59.53,-4.43,-0.00 0/0:9:9,0:40:-0.00,-3.31,-38.91 ./. 0/1:22:8,14:40:-52.69,0.00,-27.40 ./. ./. 0/0:7:7,0:33:-0.00,-2.71,-30.35 ./. 0/1:12:10,2:40:-4.34,-0.00,-38.97 0/0:8:8,0:37:-0.00,-3.01,-34.63 ./. ./. ./. ./. 0/0:8:8,0:37:-0.00,-3.01,-34.63 0/0:5:5,0:27:-0.00,-2.11,-21.79 0/1:19:7,12:40:-45.03,0.00,-24.02 0/0:8:8,0:37:-0.00,-3.01,-34.63 ./. 1/1:8:0,8:32:-33.85,-2.62,-0.00 ./. 0/0:1:1,0:13:-0.05,-0.96,-4.72 ./. 0/0:3:3,0:20:-0.01,-1.52,-13.24 0/1:8:7,1:18:-1.29,-0.02,-27.36 1/1:3:0,3:16:-12.48,-1.15,-0.03 0/0:10:10,0:40:-0.00,-3.61,-43.19 ./. 0/0:1:1,0:13:-0.05,-0.96,-4.72 0/0:3:3,0:20:-0.01,-1.52,-13.24 0/0:1:1,0:13:-0.05,-0.96,-4.72 ./. ./.

ctg000000 78711 48:86:+ A G . PASS NS=46;AF=0.304 GT:DP:AD:GQ:GL 0/0:57:57,0:40:0.00,-17.76,-244.33 1/1:119:0,119:40:-508.89,-36.03,-0.00 0/0:84:84,0:40:0.00,-25.89,-359.88 1/1:14:0,14:40:-59.53,-4.43,-0.00 0/0:34:34,0:40:-0.00,-10.84,-145.90 0/1:87:41,46:40:-170.07,0.00,-149.06 0/0:45:45,0:40:-0.00,-14.15,-192.97 0/0:29:29,0:40:-0.00,-9.33,-124.50 0/0:8:8,0:37:-0.00,-3.01,-34.63 0/0:15:15,0:40:-0.00,-5.12,-64.58 0/0:23:23,0:40:-0.00,-7.53,-98.82 0/0:38:38,0:40:-0.00,-12.04,-163.02 0/0:1:1,0:13:-0.05,-0.96,-4.72 1/1:26:0,26:40:-110.88,-8.04,-0.00 ./. 1/1:20:0,20:40:-85.20,-6.23,-0.00 0/0:22:22,0:40:-0.00,-7.23,-94.54 0/0:19:19,0:40:-0.00,-6.32,-81.70 1/1:19:1,18:22:-72.37,-1.66,-0.01 0/0:17:17,0:40:-0.00,-5.72,-73.14 1/1:23:0,23:40:-98.04,-7.14,-0.00 0/1:34:17,17:40:-61.92,0.00,-62.31 ./. 0/0:17:17,0:40:-0.00,-5.72,-73.14 0/0:7:7,0:33:-0.00,-2.71,-30.35 0/0:13:13,0:40:-0.00,-4.52,-56.02 0/0:9:9,0:40:-0.00,-3.31,-38.91 1/1:11:0,11:40:-46.69,-3.53,-0.00 1/1:11:0,11:40:-46.69,-3.53,-0.00 1/1:14:0,14:40:-59.53,-4.43,-0.00 0/0:9:9,0:40:-0.00,-3.31,-38.91 ./. 0/1:22:8,14:40:-52.69,0.00,-27.40 ./. ./. 0/0:7:7,0:33:-0.00,-2.71,-30.35 ./. 0/1:12:10,2:40:-4.34,-0.00,-38.97 0/0:8:8,0:37:-0.00,-3.01,-34.63 ./. ./. ./. ./. 0/0:8:8,0:37:-0.00,-3.01,-34.63 0/0:5:5,0:27:-0.00,-2.11,-21.79 0/1:19:7,12:40:-45.03,0.00,-24.02 0/0:8:8,0:37:-0.00,-3.01,-34.63 ./. 1/1:8:0,8:32:-33.85,-2.62,-0.00 ./. 0/0:1:1,0:13:-0.05,-0.96,-4.72 ./. 0/0:3:3,0:20:-0.01,-1.52,-13.24 0/1:8:7,1:18:-1.29,-0.02,-27.36 1/1:3:0,3:16:-12.48,-1.15,-0.03 0/0:10:10,0:40:-0.00,-3.61,-43.19 ./. 0/0:1:1,0:13:-0.05,-0.96,-4.72 0/0:3:3,0:20:-0.01,-1.52,-13.24 0/0:1:1,0:13:-0.05,-0.96,-4.72 ./. ./.

ctg000000 123700 84:94:- T C . PASS NS=43;AF=0.105 GT:DP:AD:GQ:GL 0/0:56:56,0:40:0.00,-17.32,-228.08 0/1:51:40,11:40:-28.65,0.00,-147.58 0/0:43:43,0:40:-0.00,-13.41,-175.53 ./. 0/0:13:13,0:40:-0.00,-4.38,-54.24 ./. 0/0:32:32,0:40:-0.00,-10.10,-131.05 0/0:15:15,0:40:-0.00,-4.98,-62.32 0/0:15:14,1:14:-0.05,-0.99,-54.28 0/0:23:23,0:40:-0.00,-7.39,-94.67 0/0:13:13,0:40:-0.00,-4.38,-54.24 0/0:26:26,0:40:-0.00,-8.29,-106.80 0/0:2:2,0:15:-0.04,-1.11,-9.80 0/1:14:8,6:40:-19.57,0.00,-29.34 ./. 0/1:7:3,4:40:-13.60,-0.00,-11.23 0/0:7:7,0:32:-0.00,-2.58,-29.98 0/0:4:4,0:22:-0.01,-1.68,-17.86 ./. 0/0:21:21,0:40:-0.00,-6.79,-86.58 0/1:7:4,3:40:-9.55,-0.00,-15.28 0/1:9:7,2:40:-4.91,-0.00,-26.80 0/0:4:4,0:22:-0.01,-1.68,-17.86 0/0:6:6,0:29:-0.00,-2.28,-25.94 0/0:12:12,0:40:-0.00,-4.08,-50.19 0/0:5:5,0:25:-0.00,-1.98,-21.90 0/0:12:12,0:40:-0.00,-4.08,-50.19 0/1:4:2,2:40:-6.41,-0.00,-8.09 0/0:3:3,0:19:-0.02,-1.39,-13.83 0/1:3:1,2:40:-6.71,-0.00,-4.35 0/0:5:5,0:25:-0.00,-1.98,-21.90 ./. 0/0:6:6,0:29:-0.00,-2.28,-25.94 0/0:3:3,0:19:-0.02,-1.39,-13.83 ./. 0/0:2:2,0:15:-0.04,-1.11,-9.80 ./. 0/1:6:4,2:40:-5.81,-0.00,-15.58 0/0:3:3,0:19:-0.02,-1.39,-13.83 0/0:6:6,0:29:-0.00,-2.28,-25.94 ./. ./. ./. ./. 0/0:4:4,0:22:-0.01,-1.68,-17.86 0/1:11:8,3:40:-8.35,-0.00,-30.24 0/0:10:10,0:40:-0.00,-3.48,-42.11 ./. ./. ./. 0/0:4:4,0:22:-0.01,-1.68,-17.86 ./. 0/0:2:2,0:15:-0.04,-1.11,-9.80 0/0:4:4,0:22:-0.01,-1.68,-17.86 ./. 0/0:3:3,0:19:-0.02,-1.39,-13.83 ./. ./. 0/0:5:5,0:25:-0.00,-1.98,-21.90 0/0:2:2,0:15:-0.04,-1.11,-9.80 ./. 0/0:2:2,0:15:-0.04,-1.11,-9.80

c-zhou commented 3 years ago

Hello littletiger311,

The genotypes in your VCF file are diploid, so you need to change the parameter "-p 4" to "-p 2".
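A quick way to confirm the ploidy of the calls before choosing "-p" is to count the alleles in each GT field. The sketch below is only an illustration and is not part of polyGembler; the function name `genotype_ploidies` and the shortened three-sample record are assumptions made for the example. It splits on whitespace so that it also works on records pasted into a thread like this one, where tabs have been lost.

```python
import re
from collections import Counter

# One record from the VCF excerpt in this thread, cut down to three samples
# (illustrative only; the real file has 62 samples).
RECORD = (
    "ctg000000 78710 48:85:+ C G . PASS NS=46;AF=0.304 GT:DP:AD:GQ:GL "
    "0/0:57:57,0:40:0.00,-17.76,-244.33 "
    "0/1:87:41,46:40:-170.07,0.00,-149.06 "
    "./."
)


def genotype_ploidies(vcf_lines):
    """Count called genotypes by ploidy (number of alleles in the GT field)."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split()
        if len(fields) < 10:
            continue  # no sample columns
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_index = fmt.index("GT")
        for sample in fields[9:]:
            parts = sample.split(":")
            if gt_index >= len(parts):
                continue
            # Alleles are separated by "/" (unphased) or "|" (phased).
            alleles = re.split(r"[/|]", parts[gt_index])
            if all(a == "." for a in alleles):
                continue  # missing genotype, e.g. "./."
            counts[len(alleles)] += 1
    return counts


if __name__ == "__main__":
    print(genotype_ploidies([RECORD]))  # Counter({2: 2})
```

If every called genotype has two alleles, as in the excerpt above, the file holds diploid calls.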

Is this still the Z. japonica data? If so, it is an allotetraploid, so it is fine to run it in diploid mode, which is what we did in the paper. If you want to run it in tetraploid mode, you need to re-call the genotypes as tetraploid using the allele depths. One option is the R package "updog" by Gerard et al.
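Re-calling genotypes from allele depths starts with pulling the per-sample reference and alternate read counts out of the AD field. The following is a minimal sketch of that extraction step only; it does not call updog or any other genotype caller. The function name `allele_depths` and the shortened record are assumptions for illustration, and the code assumes biallelic sites with AD written as "ref,alt".

```python
# One record from the VCF excerpt in this thread, cut down to three samples
# (illustrative only).
RECORD = (
    "ctg000000 78710 48:85:+ C G . PASS NS=46;AF=0.304 GT:DP:AD:GQ:GL "
    "0/0:57:57,0:40:0.00,-17.76,-244.33 "
    "0/1:87:41,46:40:-170.07,0.00,-149.06 "
    "./."
)


def allele_depths(vcf_line):
    """Return one (ref_reads, alt_reads) tuple per sample, or None if missing."""
    fields = vcf_line.split()
    fmt = fields[8].split(":")
    if "AD" not in fmt:
        raise ValueError("record has no AD field in its FORMAT column")
    ad_index = fmt.index("AD")
    depths = []
    for sample in fields[9:]:
        parts = sample.split(":")
        try:
            # Assumes a biallelic site: exactly two comma-separated counts.
            ref, alt = (int(x) for x in parts[ad_index].split(","))
        except (IndexError, ValueError):
            depths.append(None)  # missing call such as "./." or malformed AD
            continue
        depths.append((ref, alt))
    return depths


if __name__ == "__main__":
    print(allele_depths(RECORD))  # [(57, 0), (41, 46), None]
```

The resulting counts can then be written out in whatever layout the chosen genotype caller expects.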

Best, Chenxi

littletiger311 commented 3 years ago

Dear Dr. Zhou, thank you for your timely reply. Yes, it is still the allotetraploid Z. japonica, so I changed the ploidy to diploid (-p 2). The ArrayIndexOutOfBoundsException persists (see below). Is there anything wrong with my VCF file, which was produced by Stacks 2.59?

The run log also said "[WARN ] 2021-10-15 11:25:18.536 [main] Executor - No DP field in VCF file. Filtering by SNP allele depth disabled.". I don't quite understand this, as the VCF file provides "GT:DP:AD:GQ:GL" for every sample.

Thank you for your time and guidance.

[INFO ] 2021-10-15 11:25:18.531 [main] Executor - STEP 01 prepare data
[WARN ] 2021-10-15 11:25:18.536 [main] Executor - No DP field in VCF file. Filtering by SNP allele depth disabled.
[INFO ] 2021-10-15 11:25:23.408 [main] Executor - #Filtered by Multi-allelic: 0
[INFO ] 2021-10-15 11:25:23.408 [main] Executor - #Filtered by Quality : 0
[INFO ] 2021-10-15 11:25:23.408 [main] Executor - #Filtered by MAF : 4344
[INFO ] 2021-10-15 11:25:23.408 [main] Executor - #Filtered by Allele Depth : 0
[INFO ] 2021-10-15 11:25:23.409 [main] Executor - #Filtered by Missing : 0
[INFO ] 2021-10-15 11:25:23.409 [main] Executor - ---------------------------
[INFO ] 2021-10-15 11:25:23.409 [main] Executor - #Filtered Total : 4344
[INFO ] 2021-10-15 11:25:30.648 [main] Executor - STEP 02 infer single-point haplotypes
[INFO ] 2021-10-15 11:25:30.670 [pool-2-thread-1] Haplotyper - Random seed - 1899593212805299
[INFO ] 2021-10-15 11:25:30.670 [pool-2-thread-4] Haplotyper - Random seed - 1899593212805299
[INFO ] 2021-10-15 11:25:30.670 [pool-2-thread-7] Haplotyper - Random seed - 1899593212805299
[INFO ] 2021-10-15 11:25:30.670 [pool-2-thread-2] Haplotyper - Random seed - 1899593212805299
[INFO ] 2021-10-15 11:25:30.670 [pool-2-thread-6] Haplotyper - Random seed - 1899593212805299
LYoutGT3/out1.zip
LYoutGT3/out1.zip
LYoutGT3/out1.zip
LYoutGT3/out1.zip
[INFO ] 2021-10-15 11:25:30.670 [pool-2-thread-3] Haplotyper - Random seed - 1899593212805299
LYoutGT3/out1.zip
LYoutGT3/out1.zip
[INFO ] 2021-10-15 11:25:30.670 [pool-2-thread-5] Haplotyper - Random seed - 1899593212805299
[INFO ] 2021-10-15 11:25:30.670 [pool-2-thread-8] Haplotyper - Random seed - 1899593212805299
LYoutGT3/out1.zip
LYoutGT3/out1.zip
[INFO ] 2021-10-15 11:25:30.828 [pool-2-thread-7] Haplotyper - => STAGE I. training emission model with no transitions allowed.
[INFO ] 2021-10-15 11:25:30.829 [pool-2-thread-6] Haplotyper - => STAGE I. training emission model with no transitions allowed.
[INFO ] 2021-10-15 11:25:30.831 [pool-2-thread-3] Haplotyper - => STAGE I. training emission model with no transitions allowed.
[INFO ] 2021-10-15 11:25:30.834 [pool-2-thread-2] Haplotyper - => STAGE I. training emission model with no transitions allowed.
[INFO ] 2021-10-15 11:25:30.837 [pool-2-thread-1] Haplotyper - => STAGE I. training emission model with no transitions allowed.
Exception in thread "pool-2-thread-7" Exception in thread "pool-2-thread-2" Exception in thread "pool-2-thread-3" Exception in thread "pool-2-thread-6" Exception in thread "pool-2-thread-1" java.lang.ArrayIndexOutOfBoundsException: 1
[INFO ] 2021-10-15 11:25:30.838 [pool-2-thread-5] Haplotyper - => STAGE I. training emission model with no transitions allowed.
    at cz1.hmm.model.EmissionModel.makeObUnits(EmissionModel.java:238)

c-zhou commented 3 years ago

Hello littletiger311,

The warning message "No DP field in VCF file" appears because the program expects a "DP" field in the INFO column. This is fine: the filtering by total allele depth is simply skipped.
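The distinction is between a site-level DP key in the INFO column and the per-sample DP values in the FORMAT column. The sketch below is an illustration only and is not polyGembler's code; the helper names `info_keys` and `summed_sample_depth` and the shortened record are assumptions. On a record from the excerpt above it shows that the INFO column carries only NS and AF, while a total depth could still be obtained by summing the per-sample DP values.

```python
# One record from the VCF excerpt in this thread, cut down to three samples
# (illustrative only).
RECORD = (
    "ctg000000 78710 48:85:+ C G . PASS NS=46;AF=0.304 GT:DP:AD:GQ:GL "
    "0/0:57:57,0:40:0.00,-17.76,-244.33 "
    "0/1:87:41,46:40:-170.07,0.00,-149.06 "
    "./."
)


def info_keys(vcf_line):
    """Return the set of keys present in the INFO column of one record."""
    info = vcf_line.split()[7]
    if info == ".":
        return set()  # "." means the INFO column is empty
    return {entry.split("=", 1)[0] for entry in info.split(";") if entry}


def summed_sample_depth(vcf_line):
    """Sum the per-sample DP values of one record; None if FORMAT has no DP."""
    fields = vcf_line.split()
    fmt = fields[8].split(":")
    if "DP" not in fmt:
        return None
    dp_index = fmt.index("DP")
    total = 0
    for sample in fields[9:]:
        parts = sample.split(":")
        # Missing calls such as "./." have no DP entry and are skipped.
        if dp_index < len(parts) and parts[dp_index].isdigit():
            total += int(parts[dp_index])
    return total


if __name__ == "__main__":
    print("DP" in info_keys(RECORD))    # False
    print(summed_sample_depth(RECORD))  # 144
```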

As for the error, I am not sure what went wrong. I am happy to take a look if you could share the output files, either in this thread or by email at chnx.zhou@gmail.com.

Chenxi

littletiger311 commented 3 years ago

Dr. Zhou, I have sent you the files by email. Thank you very much for your help. LT