PGScatalog / pgsc_calc

The Polygenic Score Catalog Calculator is a nextflow pipeline for polygenic score calculation
https://pgsc-calc.readthedocs.io/en/latest/
Apache License 2.0
unexpected .bed file size #87

Closed olatzu closed 1 year ago

olatzu commented 1 year ago

Dear creators,

Thank you very much for your amazing tool!

I am having a bit of trouble at the last step. This is the error:

PLINK v2.00a3.3 64-bit (3 Jun 2022)
Options in effect:
  --bfile vzs plink_genome_test1_ALL
  --memory 8192
  --out plink_genome_test1_ALL_additive_0
  --score plink-genome-test1_ALL_additive_0.scorefile.gz zs header-read cols=+scoresums,+denom,-fid no-mean-imputation
  --seed 31
  --threads 2

Hostname: ce5ff77d6f31
Working directory: /Users/olatzmompeo/Desktop/NETFLOW/work/44/b47dd47c1c6fee83c29377d8aecff8
Start time: Tue Feb 21 15:36:20 2023

15986 MiB RAM detected; reserving 8192 MiB for main workspace.
Using up to 2 compute threads.
1 sample (1 female, 0 males; 1 founder) loaded from plink_genome_test1_ALL.fam.
963041 variants loaded from plink_genome_test1_ALL.bim.zst.
Error: Unexpected PLINK 1 .bed file size (expected 963044 bytes).

I have checked, and the --input .bed file has 963044 bytes, while plink_genome_test1_ALL.bed has only 65 bytes.
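
For context, the expected size in the error message can be reproduced from the PLINK 1 .bed layout: a 3-byte header followed by one block per variant, with each block packing four samples per byte. The sketch below only illustrates that arithmetic; the helper name `expected_bed_size` is hypothetical and not part of PLINK or pgsc_calc.

```python
def expected_bed_size(n_samples: int, n_variants: int) -> int:
    """Expected byte size of a variant-major PLINK 1 .bed file."""
    header_bytes = 3                         # magic number (2 bytes) + mode byte
    bytes_per_variant = (n_samples + 3) // 4  # 2 bits per genotype, rounded up
    return header_bytes + bytes_per_variant * n_variants


# Values taken from the log above: 1 sample, 963041 variants.
print(expected_bed_size(1, 963041))  # 963044
```

With 1 sample and 963041 variants this gives 963044 bytes, which matches the original --input file. A 65-byte file therefore cannot hold the genotypes for all variants listed in the .bim.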

Is there any command that can help with this? My original one is:

./nextflow run pgscatalog/pgsc_calc \
    -profile docker \
    --min_overlap 0.2 --platform arm64 --input PGS_input_test.csv --target_build GRCh37 \
    --pgs_id PGS001927

Thanks a lot in advance for your time

nebfield commented 1 year ago

This is a strange error, because the workflow doesn't modify the bed file, except for making a copy during the PLINK2_RELABELBIM process.

You could try deleting the work directory and re-running the workflow, in case there was a temporary problem:

$ rm -r /Users/olatzmompeo/Desktop/NETFLOW/work/
$ nextflow run pgscatalog/pgsc_calc -profile docker --min_overlap 0.2 --platform arm64 --input PGS_input_test.csv --target_build GRCh37 --pgs_id PGS001927

Can I double check:

olatzu commented 1 year ago

Dear Nebfield,

It worked perfectly! I just had to remove the duplicates in PLINK when converting the 23andMe files!
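
The thread does not say which kind of duplicates were removed, so the following is only a sketch under the assumption that they were repeated variant IDs (column 2 of the .bim file). The function name `duplicate_variant_ids` and the sample records are hypothetical.

```python
from collections import Counter


def duplicate_variant_ids(bim_lines):
    """Return variant IDs that occur more than once in .bim-style lines."""
    # A .bim line has six whitespace-separated fields; the ID is the second.
    ids = [line.split()[1] for line in bim_lines if line.strip()]
    return sorted(v for v, n in Counter(ids).items() if n > 1)


# Made-up records for illustration only.
example = [
    "1\trs1\t0\t1000\tA\tG",
    "1\trs2\t0\t2000\tC\tT",
    "1\trs2\t0\t2000\tC\tT",
]
print(duplicate_variant_ids(example))  # ['rs2']
```

Running something like this on the uncompressed .bim before starting the workflow shows whether any IDs are repeated.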

Thank you very very much for your time

nebfield commented 1 year ago

Great 🥳

I'd recommend looking at your --min_overlap parameter again, which is set quite low.

I can't give any specific analysis advice but if a scoring file (--pgs_id) requires adjusting --min_overlap to work, then I wouldn't be confident in the accuracy of the final calculated scores. I explained this a little bit more previously. You might find it's a good idea to impute your input target genomes.
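
To make the concern concrete, here is a rough sketch of what an overlap threshold checks: the fraction of scoring-file variants that are also present in the target genome. This is not the pipeline's actual matching code; the variant IDs are made up, `overlap_fraction` is a hypothetical helper, and the 0.75 comparison value is an assumption about the documented default.

```python
def overlap_fraction(score_variants, target_variants):
    """Fraction of scoring-file variants that are also found in the target."""
    score = set(score_variants)
    if not score:
        raise ValueError("scoring file has no variants")
    return len(score & set(target_variants)) / len(score)


# Made-up variant IDs for illustration only.
score_ids = ["1:100:A:G", "1:200:C:T", "2:300:G:A", "2:400:T:C"]
target_ids = ["1:100:A:G", "2:400:T:C", "3:500:A:C"]

frac = overlap_fraction(score_ids, target_ids)
print(frac)          # 0.5
print(frac >= 0.2)   # True: passes the lowered threshold from this issue
print(frac >= 0.75)  # False: fails a stricter threshold (assumed default)
```

A score that only passes at 0.2 may be computed from as little as a fifth of its variants, which is why imputing the target genome is the safer fix.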

I'm closing this issue because the reported problem is resolved, but please feel free to ask more questions if anything is unclear.

olatzu commented 1 year ago

Dear Nebfield,

Absolutely, the min_overlap is too low! Working on the imputation now.

Thanks a lot :)