bioinformatics-centre / BayesTyper

A method for variant graph genotyping based on exact alignment of k-mers

Genotyping error on one cluster #20

Closed: jjfarrell closed this issue 4 years ago

jjfarrell commented 4 years ago

When running bayesTyper genotype on the 9 clusters, the 7th cluster generates an error. Any suggestions? This ran successfully before I dropped -ci1 from the kmc command line to minimize storage requirements.

seq 1 $N_UNITS | xargs -I {} -n1 $BAYESTYPER/bin/bayesTyper genotype -v bayestyper_unit_{}/variant_clusters.bin -c bayestyper_cluster_data -s sample.tsv -g $CANON -d $DECOY -o bayestyper_unit_{}/bayestyper -z -p $NSLOTS

This is the error

bayesTyper: /isdata/kroghgrp/jasi/bayesTyper/code/releases/v1.5_static/BayesTyper-1.5/src/bayesTyper/GenotypeWriter.cpp:415: uint GenotypeWriter::finalise(const string&, const Chromosomes&, const string&, const OptionsContainer&, const Filters&): Assertion `getline(tmp_infile_fstream, genotyped_variants_it->second.back().genotypes, '\n')' failed.

This is the full output.

[29/12/2019 20:45:39] You are using BayesTyper (v1.5)

[29/12/2019 20:45:39] Seeding pseudo-random number generator with 1577670339 ...
[29/12/2019 20:45:39] Setting the kmer size to 55 ...

[29/12/2019 20:45:39] Parsed information for 1 sample(s)

[29/12/2019 20:45:39] Parsing reference genome ...
[29/12/2019 20:45:47] Parsed 65 reference genome chromosomes(s) (3095211400 nucleotides)

[29/12/2019 20:45:47] Parsing decoy sequence(s) ...
[29/12/2019 20:45:47] Parsed 2515 decoy sequence(s) (10503663 nucleotides)

[29/12/2019 20:45:54] Maximum resident set size: 3.28256 Gb

[29/12/2019 20:45:54] Parsing variant clusters ...
[29/12/2019 20:46:28] Parsed 2004766 variant clusters (5509182 variants)

[29/12/2019 20:46:39] Parsing parameter kmers ...
[29/12/2019 20:46:42] Parsed 1000000 kmers

[29/12/2019 20:46:42] Maximum resident set size: 27.0824 Gb

[29/12/2019 20:46:42] Counting kmers in variant cluster paths ...
[29/12/2019 20:54:17] Counting kmers in inter-cluster regions and decoy sequence(s) ...

[29/12/2019 20:56:38] Parsing KMC table containing 3141728545 kmers for sample A-ADC-AD008276-BL-NCR-13AD63452 ...

[29/12/2019 21:04:25] Classifying kmers in variant cluster paths ...
[29/12/2019 21:06:16] Out of 231207366 kmers:

        - 187516823 have a match to a single variant cluster
        - 30409265 have a match to single variant cluster group and multiple variant clusters

        - 224078 have match to at least one variant cluster and has match to a decoy sequence (not used for inference)
        - 20713 have match to at least one variant cluster and has a maximum haploid multiplicity higher than 127 (not used for inference)
        - 11835346 have matches to multiple variant cluster groups within or across inference units (not used for inference)

        - 1201141 have no match to a variant cluster (includes parameter kmers)

[29/12/2019 21:06:16] Maximum resident set size: 28.9024 Gb

[29/12/2019 21:06:16] Estimating genomic haploid kmer count distribution(s) from parameter kmers ...

[29/12/2019 21:06:16] Estimated negative binomial (mean = 12.5918, var = 25.1094) for sample A-ADC-AD008276-BL-NCR-13AD63452 using 882855 parameter kmers (multiplicity = 2)

[29/12/2019 21:06:16] Wrote genomic parameters to bayestyper_unit_7/bayestyper_genomic_parameters.txt

[29/12/2019 21:06:16] Maximum resident set size: 28.9024 Gb

[29/12/2019 21:06:16] Estimating noise model parameters using 20 independent gibbs sampling chains each with 350 iterations (100 burn-in) ...
[29/12/2019 21:11:45] Calculated final noise model parameters by averaging 5000 parameter estimates (250 per gibbs sampling chain)

[29/12/2019 21:11:45] Wrote noise parameters to bayestyper_unit_7/bayestyper_noise_parameters.txt

[29/12/2019 21:11:45] Maximum resident set size: 29.648 Gb

[29/12/2019 21:11:45] Estimating genotypes using 20 independent gibbs sampling chains each with 350 iterations (100 burn-in) ...

[29/12/2019 21:15:27] Genotyped 100000 variants
[29/12/2019 21:17:38] Genotyped 200000 variants
[29/12/2019 21:19:48] Genotyped 300000 variants
[29/12/2019 21:21:55] Genotyped 400000 variants
[29/12/2019 21:23:55] Genotyped 500000 variants
[29/12/2019 21:25:51] Genotyped 600000 variants
[29/12/2019 21:27:28] Genotyped 700000 variants
[29/12/2019 21:28:58] Genotyped 800000 variants
[29/12/2019 21:30:15] Genotyped 900000 variants
[29/12/2019 21:33:04] Genotyped 1000000 variants
[29/12/2019 21:34:05] Genotyped 1100000 variants
bayesTyper: /isdata/kroghgrp/jasi/bayesTyper/code/releases/v1.5_static/BayesTyper-1.5/src/bayesTyper/GenotypeWriter.cpp:415: uint GenotypeWriter::finalise(const string&, const Chromosomes&, const string&, const OptionsContainer&, const Filters&): Assertion `getline(tmp_infile_fstream, genotyped_variants_it->second.back().genotypes, '\n')' failed.

jjfarrell commented 4 years ago

I reran genotyping on the 7th cluster and this time it worked. It must have been some transient issue.

[30/12/2019 16:36:30] You are using BayesTyper (v1.5)

[30/12/2019 16:36:30] Seeding pseudo-random number generator with 1577741790 ...
[30/12/2019 16:36:30] Setting the kmer size to 55 ...

[30/12/2019 16:36:30] Parsed information for 1 sample(s)

[30/12/2019 16:36:30] Parsing reference genome ...
[30/12/2019 16:36:56] Parsed 65 reference genome chromosomes(s) (3095211400 nucleotides)

[30/12/2019 16:36:56] Parsing decoy sequence(s) ...
[30/12/2019 16:36:56] Parsed 2515 decoy sequence(s) (10503663 nucleotides)

[30/12/2019 16:37:07] Maximum resident set size: 3.28256 Gb

[30/12/2019 16:37:07] Parsing variant clusters ...
[30/12/2019 16:37:54] Parsed 2004766 variant clusters (5509182 variants)

[30/12/2019 16:38:05] Parsing parameter kmers ...
[30/12/2019 16:38:09] Parsed 1000000 kmers

[30/12/2019 16:38:09] Maximum resident set size: 27.0824 Gb

[30/12/2019 16:38:09] Counting kmers in variant cluster paths ...
[30/12/2019 16:49:41] Counting kmers in inter-cluster regions and decoy sequence(s) ...

[30/12/2019 16:52:48] Parsing KMC table containing 3141728545 kmers for sample A-ADC-AD008276-BL-NCR-13AD63452 ...

[30/12/2019 17:02:18] Classifying kmers in variant cluster paths ...
[30/12/2019 17:04:33] Out of 231207366 kmers:

        - 187516823 have a match to a single variant cluster
        - 30409265 have a match to single variant cluster group and multiple variant clusters

        - 224078 have match to at least one variant cluster and has match to a decoy sequence (not used for inference)
        - 20713 have match to at least one variant cluster and has a maximum haploid multiplicity higher than 127 (not used for inference)
        - 11835346 have matches to multiple variant cluster groups within or across inference units (not used for inference)

        - 1201141 have no match to a variant cluster (includes parameter kmers)

[30/12/2019 17:04:33] Maximum resident set size: 28.9024 Gb

[30/12/2019 17:04:34] Estimating genomic haploid kmer count distribution(s) from parameter kmers ...

[30/12/2019 17:04:34] Estimated negative binomial (mean = 12.5918, var = 25.1094) for sample A-ADC-AD008276-BL-NCR-13AD63452 using 882855 parameter kmers (multiplicity = 2)

[30/12/2019 17:04:34] Wrote genomic parameters to bayestyper_unit_7/bayestyper_genomic_parameters.txt

[30/12/2019 17:04:34] Maximum resident set size: 28.9024 Gb

[30/12/2019 17:04:34] Estimating noise model parameters using 20 independent gibbs sampling chains each with 350 iterations (100 burn-in) ...
[30/12/2019 17:10:25] Calculated final noise model parameters by averaging 5000 parameter estimates (250 per gibbs sampling chain)

[30/12/2019 17:10:25] Wrote noise parameters to bayestyper_unit_7/bayestyper_noise_parameters.txt

[30/12/2019 17:10:25] Maximum resident set size: 29.6418 Gb

[30/12/2019 17:10:25] Estimating genotypes using 20 independent gibbs sampling chains each with 350 iterations (100 burn-in) ...

[30/12/2019 17:15:35] Genotyped 100000 variants
[30/12/2019 17:18:47] Genotyped 200000 variants
[30/12/2019 17:21:38] Genotyped 300000 variants
[30/12/2019 17:24:50] Genotyped 400000 variants
[30/12/2019 17:28:12] Genotyped 500000 variants
[30/12/2019 17:31:21] Genotyped 600000 variants
[30/12/2019 17:34:11] Genotyped 700000 variants
[30/12/2019 17:37:02] Genotyped 800000 variants
[30/12/2019 17:39:20] Genotyped 900000 variants
[30/12/2019 17:41:05] Genotyped 1000000 variants
[30/12/2019 17:42:24] Genotyped 1100000 variants
[30/12/2019 17:43:41] Genotyped 1200000 variants
[30/12/2019 17:44:49] Genotyped 1300000 variants
[30/12/2019 17:45:42] Genotyped 1400000 variants
[30/12/2019 17:46:20] Genotyped 1500000 variants
[30/12/2019 17:46:52] Genotyped 1600000 variants
[30/12/2019 17:47:22] Genotyped 1700000 variants
[30/12/2019 17:47:52] Genotyped 1800000 variants
[30/12/2019 17:48:21] Genotyped 1900000 variants
[30/12/2019 17:48:50] Genotyped 2000000 variants
[30/12/2019 17:49:20] Genotyped 2100000 variants
[30/12/2019 17:49:49] Genotyped 2200000 variants
[30/12/2019 17:50:20] Genotyped 2300000 variants
[30/12/2019 17:50:51] Genotyped 2400000 variants
[30/12/2019 17:51:22] Genotyped 2500000 variants
[30/12/2019 17:51:53] Genotyped 2600000 variants
[30/12/2019 17:52:24] Genotyped 2700000 variants
[30/12/2019 17:52:58] Genotyped 2800000 variants
[30/12/2019 17:53:30] Genotyped 2900000 variants
[30/12/2019 17:54:02] Genotyped 3000000 variants
[30/12/2019 17:54:36] Genotyped 3100000 variants
[30/12/2019 17:55:12] Genotyped 3200000 variants
[30/12/2019 17:55:46] Genotyped 3300000 variants
[30/12/2019 17:56:21] Genotyped 3400000 variants
[30/12/2019 17:56:56] Genotyped 3500000 variants
[30/12/2019 17:57:33] Genotyped 3600000 variants
[30/12/2019 17:58:14] Genotyped 3700000 variants
[30/12/2019 17:58:53] Genotyped 3800000 variants
[30/12/2019 17:59:34] Genotyped 3900000 variants
[30/12/2019 18:00:14] Genotyped 4000000 variants
[30/12/2019 18:00:53] Genotyped 4100000 variants
[30/12/2019 18:01:35] Genotyped 4200000 variants
[30/12/2019 18:02:29] Genotyped 4300000 variants
[30/12/2019 18:03:20] Genotyped 4400000 variants
[30/12/2019 18:04:11] Genotyped 4500000 variants
[30/12/2019 18:05:02] Genotyped 4600000 variants
[30/12/2019 18:05:54] Genotyped 4700000 variants
[30/12/2019 18:06:46] Genotyped 4800000 variants
[30/12/2019 18:07:41] Genotyped 4900000 variants
[30/12/2019 18:09:09] Genotyped 5000000 variants
[30/12/2019 18:10:37] Genotyped 5100000 variants
[30/12/2019 18:12:05] Genotyped 5200000 variants
[30/12/2019 18:13:33] Genotyped 5300000 variants
[30/12/2019 18:15:00] Genotyped 5400000 variants
[30/12/2019 18:16:30] Genotyped 5500000 variants

[30/12/2019 18:16:43] Sorting genotyped variants ...
[30/12/2019 18:17:44] Wrote genotyped variants to bayestyper_unit_7/bayestyper.vcf.gz

[30/12/2019 18:17:45] Out of 5509182 variants:
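
For comparing the two runs, a small log scanner can help. The sketch below is not part of BayesTyper; `summarise_log` is a hypothetical helper whose patterns are taken only from the log lines shown in this thread. It reports the total number of variants parsed, the last "Genotyped N variants" progress value, and whether a glibc-style assertion message appears. Since the progress lines here appear once per 100000 variants, the last value is only a lower bound on how far a run got.

```python
import re

# Patterns based on the log lines posted in this thread (assumed stable
# only for this BayesTyper version's output format).
TOTAL_RE = re.compile(r"Parsed \d+ variant clusters \((\d+) variants\)")
PROGRESS_RE = re.compile(r"Genotyped (\d+) variants")


def summarise_log(text):
    """Return (total_variants, last_progress, assertion_failed) for one log.

    total_variants is None if no "Parsed ... variant clusters" line is found.
    """
    total = None
    last_progress = 0
    for line in text.splitlines():
        match = TOTAL_RE.search(line)
        if match:
            total = int(match.group(1))
        match = PROGRESS_RE.search(line)
        if match:
            last_progress = int(match.group(1))
    # A plain substring test, so the check also works when the pasted
    # message has been wrapped across several lines.
    assertion_failed = "Assertion `" in text
    return total, last_progress, assertion_failed
```

Applied to the first log above, this would report 5509182 total variants, a last progress value of 1100000, and an assertion failure; for the second log it would report a last progress value of 5500000 and no assertion failure.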
jonassibbesen commented 4 years ago

Thank you for posting this. The only reason I can think of for this error is that the temporary file storing the genotypes, which is written to disk during genotyping, was somehow corrupted or changed. Please let me know if you run into the same problem again.
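
To illustrate the failure mode described above, here is a loose Python analogy. It is not BayesTyper's code: the function name `read_expected_lines` and the one-record-per-line layout are assumptions made only for this sketch. The idea is that a reader expecting a known number of records fails as soon as the file ends early, which is roughly what a failed `getline(...)` assertion signals.

```python
def read_expected_lines(path, expected):
    """Read exactly `expected` lines from `path`, failing if the file ends early."""
    lines = []
    with open(path, "r") as handle:
        for index in range(expected):
            line = handle.readline()
            # readline() returns "" only at end of file (an empty line is
            # "\n"), so this is the analogue of getline() returning false.
            if line == "":
                raise RuntimeError(
                    f"{path}: expected {expected} lines, found only {index}"
                )
            lines.append(line.rstrip("\n"))
    return lines
```

A file that was truncated or modified while the run was in progress would trigger the error branch, whereas an intact file would be read through without complaint.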

jjfarrell commented 4 years ago

I will reopen the issue and get back to you if it happens again.
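
Since a plain rerun fixed the failure here, one way to handle a transient error like this is to rerun the affected unit automatically. The sketch below is only an assumption about how that could be scripted; `run_with_retry` is a hypothetical helper, not a BayesTyper feature. A process that dies on a failed assertion exits with a non-zero status, which is all the helper checks.

```python
import subprocess


def run_with_retry(command, attempts=2):
    """Run `command` (a list of arguments) until it exits with status 0.

    Returns the 1-based number of the attempt that succeeded, or raises
    RuntimeError once `attempts` runs have all failed.
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(command)
        if result.returncode == 0:
            return attempt
        print(f"attempt {attempt} failed with exit status {result.returncode}")
    raise RuntimeError(f"command failed {attempts} times: {command}")
```

The `command` list would hold the same arguments as the `bayesTyper genotype` call in the first comment, with the unit number and paths filled in. Note that a full run of unit 7 took well over an hour and a half in the logs above, so retries are not cheap, and a failure that repeats is worth reporting rather than retrying further.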