chrchang / plink-ng

A comprehensive update to the PLINK association analysis toolset. Beta testing of the first new version (1.90), focused on speed and memory efficiency improvements, is finishing up. Development is now focused on building out support for multiallelic, phased, and dosage data in PLINK 2.0.
https://www.cog-genomics.org/plink/2.0/

Error: .pgen file read failure: File appears to be corrupted. #220

Closed Hoeze closed 2 years ago

Hoeze commented 2 years ago

Hi, I'm trying to convert the UK Biobank 200k Whole Exome Sequencing dataset to a single plink2 dataset:

#> plink2 --pmerge-list mergelist.txt --out /s/project/uk_biobank/processed/WES_200K/ukbb_wes_200k --threads 6 --memory 8000
PLINK v2.00a3.3LM AVX2 Intel (3 Jun 2022)      www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /s/project/uk_biobank/processed/WES_200K/ukbb_wes_200k.log.
Options in effect:
  --memory 8000
  --out /s/project/uk_biobank/processed/WES_200K/ukbb_wes_200k
  --pmerge-list mergelist.txt
  --threads 6

Start time: Sun Aug  7 23:52:45 2022
515572 MiB RAM detected; reserving 8000 MiB for main workspace.
Using up to 6 compute threads.
--pmerge-list: 2 filesets specified.
--pmerge-list: 200643 samples present.
--pmerge-list: Merged .psam written to
/s/project/uk_biobank/processed/WES_200K/ukbb_wes_200k.psam .
--pmerge-list: 2 .pvar files scanned, headers merged.
Concatenation job detected.
Concatenating... 0/34977 variants complete.
Error: .pgen file read failure: File appears to be corrupted.

The error appears in a single pgen part that was generated like this:

bcftools view /s/raw/ukbiobank/WES_200K/ukb23156_c1_b45_v1.vcf.gz | \
  bcftools annotate --rename-chrs /s/project/uk_biobank/cache/chromAlias_bcftools.wsv | \
  bcftools norm --force -cs -m - -f /s/genomes/Gencode/Gencode_human/release_34/GRCh38.primary_assembly.genome.fa | \
  bcftools annotate --set-id '%CHROM:%POS:%REF>%FIRST_ALT' --threads 2 -O z -o \
    /s/project/uk_biobank/processed/WES_200K/normalized_vcf/ukb23156_c1_b45_v1.vcf.gz

plink2 --memory 8000 --out /s/project/uk_biobank/cache/WES_200K/pgen/ukb23156_c1_b45_v1 --threads 4 --vcf /s/project/uk_biobank/processed/WES_200K/normalized_vcf/ukb23156_c1_b45_v1.vcf.gz --vcf-half-call haploid --vcf-require-gt

This reproducibly fails with the error above. What could be causing it?

chrchang commented 2 years ago

Hi, this looks like a plink2 bug, perhaps in --pmerge-list. If necessary, I will send you a sequence of debug builds to help us get to the bottom of this, though if you can post a group of --pmerge-list input filesets that exhibits the same error (could be a lot smaller than 200643 x 34977), I'll try to reproduce and fix the bug directly from that.

What's the output of "plink2 --pfile [filename prefix] --pgen-info" on each of the --pmerge-list input filesets?

Hoeze commented 2 years ago

@chrchang Thanks for the quick answer. Unfortunately, I do not know the cause: the first 44 parts merged without any issues, and only the 45th part seems to cause problems.

Also, I cannot share the files with you (privacy-related data...). If you do have access to UK Biobank data yourself, you can easily reproduce the issue. Otherwise, I'm happy to run any debug builds for you :)

The requested pgen-info:

```bash
#> plink2 --pfile /s/project/uk_biobank/cache/WES_200K/pgen/ukb23156_c1_b45_v1 --pgen-info
PLINK v2.00a3.3LM AVX2 Intel (3 Jun 2022)      www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink2.log.
Options in effect:
  --pfile /s/project/uk_biobank/cache/WES_200K/pgen/ukb23156_c1_b45_v1
  --pgen-info

Start time: Mon Aug 8 11:10:45 2022
515572 MiB RAM detected; reserving 257786 MiB for main workspace.
Using up to 128 threads (change this with --threads).
200643 samples (0 females, 0 males, 200643 ambiguous; 200643 founders) loaded from /s/project/uk_biobank/cache/WES_200K/pgen/ukb23156_c1_b45_v1.psam.
18266 variants loaded from /s/project/uk_biobank/cache/WES_200K/pgen/ukb23156_c1_b45_v1.pvar.
--pgen-info on /s/project/uk_biobank/cache/WES_200K/pgen/ukb23156_c1_b45_v1.pgen:
Variants: 18266
Samples: 200643
REF alleles are all known
Maximum allele count for a single variant: 2
No hardcalls are explicitly phased
No dosages present

End time: Mon Aug 8 11:10:45 2022

#> plink2 --pfile /s/project/uk_biobank/cache/WES_200K/pgen/ukb23156_c1_b46_v1 --pgen-info
PLINK v2.00a3.3LM AVX2 Intel (3 Jun 2022)      www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink2.log.
Options in effect:
  --pfile /s/project/uk_biobank/cache/WES_200K/pgen/ukb23156_c1_b46_v1
  --pgen-info

Start time: Mon Aug 8 11:11:14 2022
515572 MiB RAM detected; reserving 257786 MiB for main workspace.
Using up to 128 threads (change this with --threads).
200643 samples (0 females, 0 males, 200643 ambiguous; 200643 founders) loaded from /s/project/uk_biobank/cache/WES_200K/pgen/ukb23156_c1_b46_v1.psam.
16714 variants loaded from /s/project/uk_biobank/cache/WES_200K/pgen/ukb23156_c1_b46_v1.pvar.
--pgen-info on /s/project/uk_biobank/cache/WES_200K/pgen/ukb23156_c1_b46_v1.pgen:
Variants: 16714
Samples: 200643
REF alleles are all known
Maximum allele count for a single variant: 2
No hardcalls are explicitly phased
No dosages present
```

chrchang commented 2 years ago

Okay. I brain-farted and meant --validate rather than --pgen-info (though the --pgen-info output isn't totally useless to me); sorry about that.

If both files validate properly, I'll post a debug build for you.
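
A minimal sketch of that validation pass, assuming `plink2` is on `PATH` and that the fileset prefixes are listed by hand. The helper names `validate_argv` and `validate_all` are made up for illustration; only `--pfile` and `--validate` come from this thread.

```python
import subprocess
from typing import Callable, Iterable, List


def validate_argv(prefix: str, plink2: str = "plink2") -> List[str]:
    """Build the argv for `plink2 --pfile <prefix> --validate`."""
    return [plink2, "--pfile", prefix, "--validate"]


def validate_all(prefixes: Iterable[str],
                 run: Callable = subprocess.run) -> List[str]:
    """Run --validate on each fileset prefix; return the prefixes that failed.

    `run` is injectable so the loop can be exercised without plink2 installed.
    A nonzero exit status is treated as a validation failure (assumption).
    """
    failed = []
    for prefix in prefixes:
        result = run(validate_argv(prefix), capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(prefix)
    return failed


# Example call (not executed here); the prefixes are the two from this issue:
# validate_all([
#     "/s/project/uk_biobank/cache/WES_200K/pgen/ukb23156_c1_b45_v1",
#     "/s/project/uk_biobank/cache/WES_200K/pgen/ukb23156_c1_b46_v1",
# ])
```

An empty return list would mean every input fileset validates on its own, which points the search at the merge step rather than at the inputs.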

Hoeze commented 2 years ago

@chrchang Here is the validate output:

```bash
#> plink2 --pfile /s/project/uk_biobank/cache/WES_200K/pgen/ukb23156_c1_b45_v1 --validate
PLINK v2.00a3.3LM AVX2 Intel (3 Jun 2022)      www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink2.log.
Options in effect:
  --pfile /s/project/uk_biobank/cache/WES_200K/pgen/ukb23156_c1_b45_v1
  --validate

Start time: Mon Aug 8 13:26:54 2022
47780 MiB RAM detected; reserving 23890 MiB for main workspace.
Using up to 6 compute threads.
200643 samples (0 females, 0 males, 200643 ambiguous; 200643 founders) loaded from /s/project/uk_biobank/cache/WES_200K/pgen/ukb23156_c1_b45_v1.psam.
18266 variants loaded from /s/project/uk_biobank/cache/WES_200K/pgen/ukb23156_c1_b45_v1.pvar.
Validating /s/project/uk_biobank/cache/WES_200K/pgen/ukb23156_c1_b45_v1.pgen... done.

End time: Mon Aug 8 13:26:54 2022

#> plink2 --pfile /s/project/uk_biobank/cache/WES_200K/pgen/ukb23156_c1_b46_v1 --validate
PLINK v2.00a3.3LM AVX2 Intel (3 Jun 2022)      www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink2.log.
Options in effect:
  --pfile /s/project/uk_biobank/cache/WES_200K/pgen/ukb23156_c1_b46_v1
  --validate

Start time: Mon Aug 8 13:26:57 2022
47780 MiB RAM detected; reserving 23890 MiB for main workspace.
Using up to 6 compute threads.
200643 samples (0 females, 0 males, 200643 ambiguous; 200643 founders) loaded from /s/project/uk_biobank/cache/WES_200K/pgen/ukb23156_c1_b46_v1.psam.
16714 variants loaded from /s/project/uk_biobank/cache/WES_200K/pgen/ukb23156_c1_b46_v1.pvar.
Validating /s/project/uk_biobank/cache/WES_200K/pgen/ukb23156_c1_b46_v1.pgen... done.

End time: Mon Aug 8 13:26:57 2022
```

chrchang commented 2 years ago

First debug build is posted at https://s3.amazonaws.com/plink2-assets/plink2_linux_avx2_20220808a.zip ; or you can build 3f30579 from source.

Try running the failing --pmerge-list command with this build, after adding the --debug flag.

Hoeze commented 2 years ago

@chrchang Here is the output:

PLINK v2.00a3.5LM AVX2 Intel (8 Aug 2022)      www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /s/project/uk_biobank/processed/WES_200K/ukbb_wes_200k.log.
Options in effect:
  --debug
  --memory 8000
  --out /s/project/uk_biobank/processed/WES_200K/ukbb_wes_200k
  --pmerge-list mergelist.txt
  --threads 6

Start time: Tue Aug  9 10:43:39 2022
515572 MiB RAM detected; reserving 8000 MiB for main workspace.
Using up to 6 compute threads.
--pmerge-list: 2 filesets specified.
--pmerge-list: 200643 samples present.
--pmerge-list: Merged .psam written to
/s/project/uk_biobank/processed/WES_200K/ukbb_wes_200k.psam .
--pmerge-list: 2 .pvar files scanned, headers merged.
Concatenation job detected.
Concatenating... 0/34977 variants complete.DEBUG (MergePgenVariantNoTmpLocked: simple_first_allele_remap branch: failed to read variant 6966
merge_rec_ct=1  write_allele_ct=2  allele_remap_stride=2

Error: .pgen file read failure: File appears to be corrupted.
DEBUG (ConcatPvariantPos): cur_bp=97920982  variant_ct=1  rec_idx_start=0
DEBUG (PmergeConcat): cur_bp > prev_bp branch

End time: Tue Aug  9 10:43:40 2022
chrchang commented 2 years ago

Thanks. Next debug build is at https://s3.amazonaws.com/plink2-assets/plink2_linux_avx2_20220809a.zip (or c399fda ). This should be run with the same command.

Hoeze commented 2 years ago

@chrchang debug nr. 2:

PLINK v2.00a3.5LM AVX2 Intel (9 Aug 2022)      www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /s/project/uk_biobank/processed/WES_200K/ukbb_wes_200k.log.
Options in effect:
  --debug
  --memory 8000
  --out /s/project/uk_biobank/processed/WES_200K/ukbb_wes_200k
  --pmerge-list mergelist.txt
  --threads 6

Start time: Tue Aug  9 17:26:00 2022
515572 MiB RAM detected; reserving 8000 MiB for main workspace.
Using up to 6 compute threads.
--pmerge-list: 2 filesets specified.
--pmerge-list: 200643 samples present.
--pmerge-list: Merged .psam written to
/s/project/uk_biobank/processed/WES_200K/ukbb_wes_200k.psam .
--pmerge-list: 2 .pvar files scanned, headers merged.
Concatenation job detected.
Concatenating... 0/34977 variants complete.DEBUG (PmergeConcat): fileset_idx=0  mr.sample_ct=200643  sample_idx_increasing=0
DEBUG (PmergeConcat): pgfi_alloc addr=7fab2ef7c380  next=7fab2efa4600
DEBUG (PgfiInitPhase2): vrtypes_iter addr=7fab2ef80ada
DEBUG (PmergeConcat): pgr_alloc addr=7fab2efa4600  next=7fab2effeb80
DEBUG (PgrInit): pgr_alloc_iter addr=7fab2effeb80
DEBUG (ReadGenovecSubsetUnsafe): vrtype=4
DEBUG (ReadGenovecSubsetUnsafe): Non-LD InitReadPtrs fail; fread_ptr=7fff70f4f2d7  fread_end=0
DEBUG (MergePgenVariantNoTmpLocked: simple_first_allele_remap branch: failed to read variant 6966
merge_rec_ct=1  write_allele_ct=2  allele_remap_stride=2

Error: .pgen file read failure: File appears to be corrupted.
DEBUG (ConcatPvariantPos): cur_bp=97920982  variant_ct=1  rec_idx_start=0
DEBUG (PmergeConcat): cur_bp > prev_bp branch

End time: Tue Aug  9 17:26:00 2022
chrchang commented 2 years ago

Ok. 3rd debug build: https://s3.amazonaws.com/plink2-assets/plink2_linux_avx2_20220809b.zip (or 0ea4a0f ).

Hoeze commented 2 years ago

@chrchang Nr. 3:

PLINK v2.00a3.5LM AVX2 Intel (9 Aug 2022)      www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /s/project/uk_biobank/processed/WES_200K/ukbb_wes_200k.log.
Options in effect:
  --debug
  --memory 8000
  --out /s/project/uk_biobank/processed/WES_200K/ukbb_wes_200k
  --pmerge-list mergelist.txt
  --threads 6

Start time: Tue Aug  9 18:40:12 2022
515572 MiB RAM detected; reserving 8000 MiB for main workspace.
Using up to 6 compute threads.
--pmerge-list: 2 filesets specified.
--pmerge-list: 200643 samples present.
--pmerge-list: Merged .psam written to
/s/project/uk_biobank/processed/WES_200K/ukbb_wes_200k.psam .
--pmerge-list: 2 .pvar files scanned, headers merged.
Concatenation job detected.
Concatenating... 0/34977 variants complete.DEBUG (PmergeConcat): fileset_idx=0  mr.sample_ct=200643  sample_idx_increasing=0
DEBUG (PmergeConcat): pgfi_alloc addr=7fbabd879380  next=7fbabd8a1600
DEBUG (PgfiInitPhase2): vrtypes_iter addr=7fbabd87dada
var_fpos[6966]=5449872  var_fpos[6967]=5449959
DEBUG (PmergeConcat): pgr_alloc addr=7fbabd8a1600  next=7fbabd8fbb80
DEBUG (PgrInit): pgr_alloc_iter addr=7fbabd8fbb80
DEBUG (InitReadPtrs): fread failed, cur_vrec_width=4294966615, errno=2
var_fpos[6966]=5450640  var_fpos[6967]=5449959  address=7fbabd88b4b8
DEBUG (MergePgenVariantNoTmpLocked: simple_first_allele_remap branch: failed to read variant 6966
merge_rec_ct=1  write_allele_ct=2  allele_remap_stride=2

Error: .pgen file read failure: File appears to be corrupted.
DEBUG (ConcatPvariantPos): cur_bp=97920982  variant_ct=1  rec_idx_start=0
DEBUG (PmergeConcat): cur_bp > prev_bp branch

End time: Tue Aug  9 18:40:13 2022
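
The `cur_vrec_width=4294966615` value in this log is what comes out if the record width is computed as `var_fpos[6967] - var_fpos[6966]` in unsigned 32-bit arithmetic after `var_fpos[6966]` changed from 5449872 to 5450640. A small check of that arithmetic, assuming a 32-bit unsigned width (the function name is illustrative, not a plink2 identifier):

```python
U32_MASK = 0xFFFFFFFF  # assumption: the width is stored as an unsigned 32-bit value


def record_width_u32(fpos_start: int, fpos_next: int) -> int:
    """Byte width of one variant record, wrapped to unsigned 32 bits."""
    return (fpos_next - fpos_start) & U32_MASK


# Offsets printed right after PgfiInitPhase2 in the log above.
intact = record_width_u32(5449872, 5449959)
# Offsets printed at the moment of the failed read.
clobbered = record_width_u32(5450640, 5449959)

print(intact)     # 87
print(clobbered)  # 4294966615, the cur_vrec_width reported in the log
```

Since both files passed `--validate`, these numbers suggest the offset table was overwritten in memory during the merge, not that the .pgen on disk is damaged.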
chrchang commented 2 years ago

Debug build 4: https://s3.amazonaws.com/plink2-assets/plink2_linux_avx2_20220809c.zip (or bff2145 )

Hoeze commented 2 years ago

Nr. 4:

PLINK v2.00a3.5LM AVX2 Intel (9 Aug 2022)      www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /s/project/uk_biobank/processed/WES_200K/ukbb_wes_200k.log.
Options in effect:
  --debug
  --memory 8000
  --out /s/project/uk_biobank/processed/WES_200K/ukbb_wes_200k
  --pmerge-list mergelist.txt
  --threads 6

Start time: Tue Aug  9 19:16:15 2022
515572 MiB RAM detected; reserving 8000 MiB for main workspace.
Using up to 6 compute threads.
--pmerge-list: 2 filesets specified.
--pmerge-list: 200643 samples present.
--pmerge-list: Merged .psam written to
/s/project/uk_biobank/processed/WES_200K/ukbb_wes_200k.psam .
--pmerge-list: 2 .pvar files scanned, headers merged.
Concatenation job detected.
Concatenating... 0/34977 variants complete.DEBUG (PmergeConcat): fileset_idx=0  sample_ct=200643  sample_idx_increasing=0
DEBUG (PmergeConcat): pgfi_alloc addr=7f454eea7380  next=7f454eecf600
DEBUG (PgfiInitPhase2): vrtypes_iter addr=7f454eeabada
var_fpos[6966]=5449872  var_fpos[6967]=5449959
DEBUG (PmergeConcat): pgr_alloc addr=7f454eecf600  next=7f454ef29b80
DEBUG (PgrInit): pgr_alloc_iter addr=7f454ef29b80
chrchang commented 2 years ago

Debug build 5: https://s3.amazonaws.com/plink2-assets/plink2_linux_avx2_20220809d.zip (or 8e828ea )

Hoeze commented 2 years ago

Nr. 5:

PLINK v2.00a3.5LM AVX2 Intel (9 Aug 2022)      www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /s/project/uk_biobank/processed/WES_200K/ukbb_wes_200k.log.
Options in effect:
  --debug
  --memory 8000
  --out /s/project/uk_biobank/processed/WES_200K/ukbb_wes_200k
  --pmerge-list mergelist.txt
  --threads 6

Start time: Tue Aug  9 19:27:19 2022
515572 MiB RAM detected; reserving 8000 MiB for main workspace.
Using up to 6 compute threads.
--pmerge-list: 2 filesets specified.
--pmerge-list: 200643 samples present.
--pmerge-list: Merged .psam written to
/s/project/uk_biobank/processed/WES_200K/ukbb_wes_200k.psam .
--pmerge-list: 2 .pvar files scanned, headers merged.
Concatenation job detected.
Concatenating... 0/34977 variants complete.DEBUG (PmergeConcat): fileset_idx=0  sample_ct=200643  sample_idx_increasing=0
DEBUG (PmergeConcat): pgfi_alloc addr=7f430da78380  next=7f430daa0600
DEBUG (PgfiInitPhase2): vrtypes_iter addr=7f430da7cada
var_fpos[6966]=5449872  var_fpos[6967]=5449959
DEBUG (PmergeConcat): pgr_alloc addr=7f430daa0600  next=7f430dafab80
DEBUG (PgrInit): pgr_alloc_iter addr=7f430dafab80
ConcatPvariantPos cur_bp=97720149 rec_idx_start=5 after MergePgenVariantNoTmpLocked
ConcatPvariantPos cur_bp=97720149 rec_idx_start=7 before MergePvariant
ConcatPvariantPos cur_bp=97720149 rec_idx_start=7 after MergePvariant
ConcatPvariantPos cur_bp=97720149 rec_idx_start=7 before MergePgenVariantNoTmpLocked
ConcatPvariantPos cur_bp=97720149 rec_idx_start=7 after MergePgenVariantNoTmpLocked
var_fpos corrupt at read_variant_idx=6423 step 5
chrchang commented 2 years ago

Debug build 6: https://s3.amazonaws.com/plink2-assets/plink2_linux_avx2_20220809e.zip (or bd8ebf4 )

Hoeze commented 2 years ago

Nr. 6:

PLINK v2.00a3.5LM AVX2 Intel (9 Aug 2022)      www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /s/project/uk_biobank/processed/WES_200K/ukbb_wes_200k.log.
Options in effect:
  --debug
  --memory 8000
  --out /s/project/uk_biobank/processed/WES_200K/ukbb_wes_200k
  --pmerge-list mergelist.txt
  --threads 6

Start time: Tue Aug  9 19:59:56 2022
515572 MiB RAM detected; reserving 8000 MiB for main workspace.
Using up to 6 compute threads.
--pmerge-list: 2 filesets specified.
--pmerge-list: 200643 samples present.
--pmerge-list: Merged .psam written to
/s/project/uk_biobank/processed/WES_200K/ukbb_wes_200k.psam .
--pmerge-list: 2 .pvar files scanned, headers merged.
Concatenation job detected.
Concatenating... 0/34977 variants complete.DEBUG (PmergeConcat): fileset_idx=0  sample_ct=200643  sample_idx_increasing=0
DEBUG (PmergeConcat): pgfi_alloc addr=7f7fa95c5380  next=7f7fa95ed600
DEBUG (PgfiInitPhase2): vrtypes_iter addr=7f7fa95c9ada
var_fpos[6966]=5449872  var_fpos[6967]=5449959
DEBUG (PmergeConcat): pgr_alloc addr=7f7fa95ed600  next=7f7fa9647b80
DEBUG (PgrInit): pgr_alloc_iter addr=7f7fa9647b80
DEBUG (InitReadPtrs): fp_vidx was 6415, seeking to offset 5000130
DEBUG (InitReadPtrs): fp_vidx was 6418, seeking to offset 5013961
DEBUG (InitReadPtrs): fp_vidx was 6420, seeking to offset 4974035
DEBUG (InitReadPtrs): fp_vidx was 6417, seeking to offset 5014630
DEBUG (InitReadPtrs): fp_vidx was 6422, seeking to offset 5000130
DEBUG (InitReadPtrs): fp_vidx was 6418, seeking to offset 5014314
DEBUG (ConcatPvariantPos): merge_rec_ct=2  allele_ct=2  read_max_allele_ct=2
[1]    1077709 segmentation fault (core dumped)  $TMP/plink2 --debug --pmerge-list mergelist.txt --out  --threads 6 --memory
chrchang commented 2 years ago

Debug build 7: https://s3.amazonaws.com/plink2-assets/plink2_linux_avx2_20220809f.zip (or 347f5f1 )

Hoeze commented 2 years ago

Nr. 7:

PLINK v2.00a3.5LM AVX2 Intel (9 Aug 2022)      www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /s/project/uk_biobank/processed/WES_200K/ukbb_wes_200k.log.
Options in effect:
  --debug
  --memory 8000
  --out /s/project/uk_biobank/processed/WES_200K/ukbb_wes_200k
  --pmerge-list mergelist.txt
  --threads 6

Start time: Tue Aug  9 21:09:01 2022
515572 MiB RAM detected; reserving 8000 MiB for main workspace.
Using up to 6 compute threads.
--pmerge-list: 2 filesets specified.
--pmerge-list: 200643 samples present.
--pmerge-list: Merged .psam written to
/s/project/uk_biobank/processed/WES_200K/ukbb_wes_200k.psam .
--pmerge-list: 2 .pvar files scanned, headers merged.
Concatenation job detected.
Concatenating... 0/34977 variants complete.DEBUG (PmergeConcat): fileset_idx=0  sample_ct=200643  sample_idx_increasing=0
DEBUG (PmergeConcat): pgfi_alloc addr=7faece451380  next=7faece479600
DEBUG (PgfiInitPhase2): vrtypes_iter addr=7faece455ada
var_fpos[6966]=5449872  var_fpos[6967]=5449959
DEBUG (PmergeConcat): pgr_alloc addr=7faece479600  next=7faece4d3b80
DEBUG (PgrInit): pgr_alloc_iter addr=7faece4d3b80
DEBUG (InitReadPtrs): fp_vidx was 6415, seeking to offset 5000130
DEBUG (InitReadPtrs): fp_vidx was 6418, seeking to offset 5013961
DEBUG (InitReadPtrs): fp_vidx was 6420, seeking to offset 4974035
DEBUG (InitReadPtrs): fp_vidx was 6417, seeking to offset 5014630
DEBUG (InitReadPtrs): fp_vidx was 6422, seeking to offset 5000130
DEBUG (InitReadPtrs): fp_vidx was 6418, seeking to offset 5014314
DEBUG (ConcatPvariantPos): merge_rec_ct=2  allele_ct=2  read_max_allele_ct=2
step 26 rec_idx=1
chrchang commented 2 years ago

Debug build 8: https://s3.amazonaws.com/plink2-assets/plink2_linux_avx2_20220809g.zip (or 963670f )

Hoeze commented 2 years ago

Nr. 8:

PLINK v2.00a3.5LM AVX2 Intel (9 Aug 2022)      www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /s/project/uk_biobank/processed/WES_200K/ukbb_wes_200k.log.
Options in effect:
  --debug
  --memory 8000
  --out /s/project/uk_biobank/processed/WES_200K/ukbb_wes_200k
  --pmerge-list mergelist.txt
  --threads 6

Start time: Tue Aug  9 21:27:01 2022
515572 MiB RAM detected; reserving 8000 MiB for main workspace.
Using up to 6 compute threads.
--pmerge-list: 2 filesets specified.
--pmerge-list: 200643 samples present.
--pmerge-list: Merged .psam written to
/s/project/uk_biobank/processed/WES_200K/ukbb_wes_200k.psam .
--pmerge-list: 2 .pvar files scanned, headers merged.
Concatenation job detected.
Concatenating... 0/34977 variants complete.DEBUG (PmergeConcat): fileset_idx=0  sample_ct=200643  sample_idx_increasing=0
DEBUG (PmergeConcat): pgfi_alloc addr=7f2df04c8380  next=7f2df04f0600
DEBUG (PgfiInitPhase2): vrtypes_iter addr=7f2df04ccada
var_fpos[6966]=5449872  var_fpos[6967]=5449959
DEBUG (PmergeConcat): pgr_alloc addr=7f2df04f0600  next=7f2df054ab80
DEBUG (PgrInit): pgr_alloc_iter addr=7f2df054ab80
DEBUG (InitReadPtrs): fp_vidx was 6415, seeking to offset 5000130
DEBUG (InitReadPtrs): fp_vidx was 6418, seeking to offset 5013961
DEBUG (InitReadPtrs): fp_vidx was 6420, seeking to offset 4974035
DEBUG (InitReadPtrs): fp_vidx was 6417, seeking to offset 5014630
DEBUG (InitReadPtrs): fp_vidx was 6422, seeking to offset 5000130
DEBUG (InitReadPtrs): fp_vidx was 6418, seeking to offset 5014314
DEBUG (ConcatPvariantPos): merge_rec_ct=2  allele_ct=2  read_max_allele_ct=2
step 25e rec_idx=1 widx=6270 new_sample_idx=1048772 new_word_idx=32774
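
One way to read this last line, assuming genotypes are packed 2 bits each into 64-bit words (32 samples per word; the constant and helper names below are illustrative): `new_word_idx` equals `new_sample_idx // 32`, and `new_sample_idx=1048772` lies far beyond the 200643 samples in the fileset, so a write at that word would land outside the per-variant buffer.

```python
SAMPLES_PER_WORD = 32  # assumption: 2-bit genotypes packed into 64-bit words


def word_count(sample_ct: int) -> int:
    """Number of words needed to hold sample_ct packed genotypes."""
    return (sample_ct + SAMPLES_PER_WORD - 1) // SAMPLES_PER_WORD


def checked_word_index(sample_idx: int, sample_ct: int) -> int:
    """Word holding sample_idx, refusing indices outside [0, sample_ct)."""
    if not 0 <= sample_idx < sample_ct:
        raise IndexError(
            f"sample index {sample_idx} out of range for {sample_ct} samples")
    return sample_idx // SAMPLES_PER_WORD


sample_ct = 200643
print(word_count(sample_ct))        # 6271, so widx=6270 is the last word
print(1048772 // SAMPLES_PER_WORD)  # 32774, matching new_word_idx in the log
```

Under that reading, the failure shows up while processing the final, partially filled word (only 3 of its 32 slots hold real samples), which would fit an out-of-bounds write clobbering nearby memory.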
chrchang commented 2 years ago

Ok, check if this fixes the bug: https://s3.amazonaws.com/plink2-assets/plink2_linux_avx2_20220809h.zip (or 927db87 )

Hoeze commented 2 years ago

Nice :tada: Thanks a lot @chrchang, this fixes the bug!