dnanexus-rnd / GLnexus

Scalable gVCF merging and joint variant calling for population sequencing projects
Apache License 2.0

Exception deserializing BCF bucket #288

Open lvclark opened 1 year ago

lvclark commented 1 year ago

I am running GLnexus using Singularity from the official Docker image like so:

singularity exec ~/singularity/glnexus_v1.4.1.sif \
glnexus_cli \
    -t 16 \
    --config gatk \
    --dir ./GLnexus.DB.chr13test \
    combined_name_short_c*s_ALL_chr13test_gvcf.vcf.gz > cohort_Lindsay_chr13test_glnexus.bcf

But I am getting this error related to capnp:

[12693] [2023-04-03 08:05:01.834] [GLnexus] [info] glnexus_cli release v1.4.1-0-g68e25e5 Aug 13 2021
[12693] [2023-04-03 08:05:01.834] [GLnexus] [info] detected jemalloc 5.2.1-0-gea6b3e973b477b8061e0076bb257dbd7f3faa756
[12693] [2023-04-03 08:05:01.834] [GLnexus] [info] Loading config preset gatk
[12693] [2023-04-03 08:05:01.840] [GLnexus] [info] config:
unifier_config:
  drop_filtered: false
  min_allele_copy_number: 1
  min_AQ1: 70
  min_AQ2: 40
  min_GQ: 40
  max_alleles_per_site: 32
  monoallelic_sites_for_lost_alleles: true
  preference: common
genotyper_config:
  revise_genotypes: true
  min_assumed_allele_frequency: 9.99999975e-05
  snv_prior_calibration: 1
  indel_prior_calibration: 1
  required_dp: 1
  allow_partial_data: false
  allele_dp_format: AD
  ref_dp_format: MIN_DP
  output_residuals: false
  more_PL: false
  squeeze: false
  trim_uncalled_alleles: false
  top_two_half_calls: false
  output_format: BCF
  liftover_fields:
    - {orig_names: [MIN_DP, DP], name: DP, description: "##FORMAT=<ID=DP,Number=1,Type=Integer,Description=\"Approximate read depth (reads with MQ=255 or with bad mates are filtered)\">", type: int, number: basic, default_type: missing, count: 1, combi_method: min, ignore_non_variants: true}
    - {orig_names: [AD], name: AD, description: "##FORMAT=<ID=AD,Number=R,Type=Integer,Description=\"Allelic depths for the ref and alt alleles in the order listed\">", type: int, number: alleles, default_type: zero, count: 0, combi_method: min, ignore_non_variants: false}
    - {orig_names: [SB], name: SB, description: "##FORMAT=<ID=SB,Number=4,Type=Integer,Description=\"Per-sample component statistics which comprise the Fishers Exact Test to detect strand bias.\">", type: int, number: basic, default_type: missing, count: 4, combi_method: missing, ignore_non_variants: false}
    - {orig_names: [GQ], name: GQ, description: "##FORMAT=<ID=GQ,Number=1,Type=Integer,Description=\"Genotype Quality\">", type: int, number: basic, default_type: missing, count: 1, combi_method: min, ignore_non_variants: true}
    - {orig_names: [PL], name: PL, description: "##FORMAT=<ID=PL,Number=G,Type=Integer,Description=\"Phred-scaled genotype Likelihoods\">", type: int, number: genotype, default_type: missing, count: 0, combi_method: missing, ignore_non_variants: true}
[12693] [2023-04-03 08:05:01.840] [GLnexus] [info] config CRC32C = 1926883223
[12693] [2023-04-03 08:05:01.841] [GLnexus] [info] init database, exemplar_vcf=combined_name_short_cases_ALL_chr13test_gvcf.vcf.gz
[12693] [2023-04-03 08:05:02.137] [GLnexus] [info] Initialized GLnexus database in ./GLnexus.DB.chr13test
[12693] [2023-04-03 08:05:02.137] [GLnexus] [info] bucket size: 30000
[12693] [2023-04-03 08:05:02.137] [GLnexus] [info] contigs: chrM chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY chr1_gl000191_random chr1_gl000192_random chr4_ctg9_hap1 chr4_gl000193_random chr4_gl000194_random chr6_apd_hap1 chr6_cox_hap2 chr6_dbb_hap3 chr6_mann_hap4 chr6_mcf_hap5 chr6_qbl_hap6 chr6_ssto_hap7 chr7_gl000195_random chr8_gl000196_random chr8_gl000197_random chr9_gl000198_random chr9_gl000199_random chr9_gl000200_random chr9_gl000201_random chr11_gl000202_random chr17_ctg5_hap1 chr17_gl000203_random chr17_gl000204_random chr17_gl000205_random chr17_gl000206_random chr18_gl000207_random chr19_gl000208_random chr19_gl000209_random chr21_gl000210_random chrUn_gl000211 chrUn_gl000212 chrUn_gl000213 chrUn_gl000214 chrUn_gl000215 chrUn_gl000216 chrUn_gl000217 chrUn_gl000218 chrUn_gl000219 chrUn_gl000220 chrUn_gl000221 chrUn_gl000222 chrUn_gl000223 chrUn_gl000224 chrUn_gl000225 chrUn_gl000226 chrUn_gl000227 chrUn_gl000228 chrUn_gl000229 chrUn_gl000230 chrUn_gl000231 chrUn_gl000232 chrUn_gl000233 chrUn_gl000234 chrUn_gl000235 chrUn_gl000236 chrUn_gl000237 chrUn_gl000238 chrUn_gl000239 chrUn_gl000240 chrUn_gl000241 chrUn_gl000242 chrUn_gl000243 chrUn_gl000244 chrUn_gl000245 chrUn_gl000246 chrUn_gl000247 chrUn_gl000248 chrUn_gl000249
[12693] [2023-04-03 08:05:02.160] [GLnexus] [info] db_get_contigs ./GLnexus.DB.chr13test
[12693] [2023-04-03 08:05:02.215] [GLnexus] [info] Beginning bulk load with no range filter.
[12693] [2023-04-03 08:27:55.854] [GLnexus] [info] Loaded 2 datasets with 721 samples; 115906987368 bytes in 21423078 BCF records (335 duplicate) in 1402 buckets. Bucket max 297427480 bytes, 25887 records. 0 BCF records skipped due to caller-specific exceptions
[12693] [2023-04-03 08:27:55.856] [GLnexus] [info] Created sample set *@2
[12693] [2023-04-03 08:27:55.856] [GLnexus] [info] Flushing database...
[12693] [2023-04-03 08:29:54.332] [GLnexus] [info] Bulk load complete!
[12693] [2023-04-03 08:29:54.358] [GLnexus] [warning] Processing full length of 93 contigs, as no --bed was provided. Providing a BED file with regions of interest, if applicable, can speed this up.
[12693] [2023-04-03 08:29:54.399] [GLnexus] [info] found sample set *@2
[12693] [2023-04-03 08:29:54.399] [GLnexus] [info] discovering alleles in 93 range(s) on 14 threads
[12693] [2023-04-03 08:29:55.542] [GLnexus] [error] Failed to discover alleles: IOError: exception deserializing BCF bucket (capnp/arena.c++:127: failed: Exceeded message traversal limit.  See capnp::ReaderOptions.
stack: 5648996158d8 5648990ba8f7 5648990c61e5 5648990c6349 56489908ba38 564899083508 564899009248 2abc52d6947e 56489907fbaa 56489907fd02 56489901316c 56489968b47e 2abc52d60608 2abc52e9c292)
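
For a sense of scale, here is a rough sketch that pulls the bucket sizes out of the bulk-load summary line above and compares the largest bucket with Cap'n Proto's library-default traversal limit (8 * 1024 * 1024 words of 8 bytes each, i.e. 64 MiB). Whether GLnexus uses that default or sets its own limit is an assumption the log does not confirm, and the byte counts reported at load time may not match the serialized bucket size exactly, so this is only a back-of-the-envelope comparison.

```shell
#!/bin/sh
# Bulk-load summary line copied from the log above.
log_line='Loaded 2 datasets with 721 samples; 115906987368 bytes in 21423078 BCF records (335 duplicate) in 1402 buckets. Bucket max 297427480 bytes, 25887 records.'

# Extract total bytes, bucket count, and the largest bucket size.
set -- $(printf '%s\n' "$log_line" |
  sed -n 's/.*; \([0-9]*\) bytes in .* in \([0-9]*\) buckets\. Bucket max \([0-9]*\) bytes.*/\1 \2 \3/p')
total_bytes=$1
bucket_count=$2
max_bucket_bytes=$3

# Integer average bucket size in bytes.
avg_bucket_bytes=$((total_bytes / bucket_count))

# Cap'n Proto library default: traversalLimitInWords = 8 * 1024 * 1024, 8 bytes per word.
# ASSUMPTION: GLnexus may override this value; the log does not say.
default_limit_bytes=$((8 * 1024 * 1024 * 8))

if [ "$max_bucket_bytes" -gt "$default_limit_bytes" ]; then
  over_default=yes
else
  over_default=no
fi

echo "average bucket: $avg_bucket_bytes bytes"
echo "largest bucket: $max_bucket_bytes bytes"
echo "capnp default traversal limit: $default_limit_bytes bytes"
echo "largest bucket exceeds capnp default: $over_default"
```

If the largest bucket is well above the limit in effect, that would be consistent with the "Exceeded message traversal limit" message, though it does not prove that is the cause.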

I also tried providing a BED file as shown in the tutorial, but it didn't seem to be recognized, so I subset my gVCFs with bcftools view first instead.
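
For reference, the subsetting step looked roughly like the sketch below. The function name, file names, and region string are placeholders rather than the exact values used, and `bcftools view -r` needs the input gVCF to be bgzip-compressed and indexed.

```shell
#!/bin/sh
# Hypothetical helper: extract one region from a gVCF and index the result.
# Arguments: input gVCF, region (e.g. a contig name or contig:start-end), output path.
subset_gvcf() {
  in_gvcf=$1
  region=$2
  out_gvcf=$3
  # -r restricts output to the region; -O z writes bgzip-compressed VCF.
  bcftools view -r "$region" -O z -o "$out_gvcf" "$in_gvcf" &&
    # -t builds a tabix (.tbi) index for the subset file.
    bcftools index -t "$out_gvcf"
}

# Example call (placeholder names):
# subset_gvcf input.g.vcf.gz chr13 input.chr13.g.vcf.gz
```

The resulting files were then passed to glnexus_cli in place of the full gVCFs.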