PGScatalog / pgscatalog_utils

(superseded by pygscatalog) Utilities for working with PGS Catalog API and scoring files
Apache License 2.0

checksum computation path fix #53

Closed: openpaul closed this 11 months ago

openpaul commented 1 year ago

hello,

Not sure if you want pull requests. I found a bug: download_scorefiles currently can't download into a folder, because the checksums are computed for the bare filename and not for the full path.

As a result, this command fails:

mkdir anyfolder
download_scorefiles -i PGS000922 -o anyfolder/ -b GRCh37

It fails with the following output:

pgscatalog_utils.download.ScoringFileChecksum: 2023-08-25 15:50:13 WARNING  File PGS000922_hmPOS_GRCh37.txt.gz not found!
pgscatalog_utils.download.ScoringFileDownloader: 2023-08-25 15:50:13 WARNING  Scoring file PGS000922_hmPOS_GRCh37.txt.gz fails validation
pgscatalog_utils.download.ScoringFileDownloader: 2023-08-25 15:50:13 WARNING  Remote checksum: 1c59a24ea5ef65a10db2531ba106d5be
pgscatalog_utils.download.ScoringFileDownloader: 2023-08-25 15:50:13 WARNING  Local checksum: None
pgscatalog_utils.download.download_file: 2023-08-25 15:50:13 WARNING  /home/paul/software/pgscatalog_utils/anyfolder/PGS000922_hmPOS_GRCh37.txt.gz exists and overwrite is false, skipping download
pgscatalog_utils.download.ScoringFileChecksum: 2023-08-25 15:50:13 WARNING  File PGS000922_hmPOS_GRCh37.txt.gz not found!
pgscatalog_utils.download.download_file: 2023-08-25 15:50:13 WARNING  /home/paul/software/pgscatalog_utils/anyfolder/PGS000922_hmPOS_GRCh37.txt.gz.md5 exists and overwrite is false, skipping download
pgscatalog_utils.download.download_file: 2023-08-25 15:50:13 WARNING  /home/paul/software/pgscatalog_utils/anyfolder/PGS000922_hmPOS_GRCh37.txt.gz exists and overwrite is false, skipping download
pgscatalog_utils.download.ScoringFileChecksum: 2023-08-25 15:50:13 WARNING  File PGS000922_hmPOS_GRCh37.txt.gz not found!
pgscatalog_utils.download.download_file: 2023-08-25 15:50:13 WARNING  /home/paul/software/pgscatalog_utils/anyfolder/PGS000922_hmPOS_GRCh37.txt.gz.md5 exists and overwrite is false, skipping download
pgscatalog_utils.download.download_file: 2023-08-25 15:50:13 WARNING  /home/paul/software/pgscatalog_utils/anyfolder/PGS000922_hmPOS_GRCh37.txt.gz exists and overwrite is false, skipping download
pgscatalog_utils.download.ScoringFileChecksum: 2023-08-25 15:50:13 WARNING  File PGS000922_hmPOS_GRCh37.txt.gz not found!
pgscatalog_utils.download.download_file: 2023-08-25 15:50:13 WARNING  /home/paul/software/pgscatalog_utils/anyfolder/PGS000922_hmPOS_GRCh37.txt.gz.md5 exists and overwrite is false, skipping download

This PR should hopefully fix it.
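The idea behind the fix can be sketched in Python. This is a minimal, hypothetical illustration, not the code from this PR: md5_of_file and local_checksum are made-up names, and it assumes the local checksum is an MD5 hex digest (the remote checksum in the log looks like one). Hashing the bare filename resolves it against the current working directory, so a file written to anyfolder/ is reported as "not found"; joining the output directory onto the filename hashes the file that was actually written.

```python
import hashlib
import os
from typing import Optional


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> Optional[str]:
    """Return the hex MD5 digest of the file at `path`, or None if it is missing."""
    if not os.path.exists(path):
        return None
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large scoring files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def local_checksum(outdir: str, filename: str) -> Optional[str]:
    # Hypothetical helper: hash <outdir>/<filename> instead of the bare filename,
    # which would otherwise be resolved against the current working directory.
    return md5_of_file(os.path.join(outdir, filename))
```

With a file saved under an output folder, md5_of_file("PGS000922_hmPOS_GRCh37.txt.gz") returns None when run from another directory (the "Local checksum: None" case), while local_checksum("anyfolder", "PGS000922_hmPOS_GRCh37.txt.gz") returns the digest to compare with the remote checksum.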

After the fix, the output is as expected:

$ download_scorefiles -i PGS000922 -o anyfolder/ -b GRCh37
$ tree anyfolder
anyfolder
├── PGS000922_hmPOS_GRCh37.txt.gz
└── PGS000922_hmPOS_GRCh37.txt.gz.md5

1 directory, 2 files

smlmbrt commented 1 year ago

Hi @openpaul, we definitely do appreciate PRs (we usually PR into a dev branch before releasing to main)! @nebfield will take a look soon.

nebfield commented 11 months ago

Thanks for the fix 🎉