Open mglev1n opened 3 months ago
@mglev1n, thanks for your comments and suggestions! I think there are some easy things we can do to better expose the version of the scores (putting the release date in the file header and adding it to the report), and we will discuss the feasibility of a changelog on our side.
We do provide MD5 checksums for the scores so that versions can be compared. You can download all the scorefiles you want using our Python package (https://pypi.org/project/pgscatalog-utils/, https://pygscatalog.readthedocs.io/en/latest/) and use those downloads as a stable input to the pipeline (or extract them from the first run of the pipeline for re-use).
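As a minimal sketch of how downloaded scorefiles could be pinned and later checked for changes, the snippet below records an MD5 digest for every file in a local directory and compares two such records. It uses only the Python standard library. The function names, the directory layout, and the `*.txt.gz` file pattern are assumptions made for illustration; they are not part of pgscatalog-utils or pgsc_calc.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(scorefile_dir, pattern="*.txt.gz"):
    """Map each scorefile name in a directory to its MD5 digest.

    The default pattern is an assumption about how the files are named.
    """
    files = sorted(Path(scorefile_dir).glob(pattern))
    return {p.name: md5_of_file(p) for p in files}


def changed_files(old_manifest, new_manifest):
    """List file names that were added, removed, or whose digest differs."""
    names = set(old_manifest) | set(new_manifest)
    return sorted(n for n in names if old_manifest.get(n) != new_manifest.get(n))
```

Saving the manifest next to an analysis (for example as JSON) and rebuilding it before a later run gives a quick signal that an input scorefile is no longer identical, without having to inspect each file by hand.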
Description of feature
It would be great if there were some version control, such that the state of the PGS Catalog could be reconstructed as of a given date. At minimum, publishing a running changelog (e.g. when a change occurred, what was changed, and why the change was made) would be extremely useful. Apologies if this feature already exists; if it does, making it more prominent would be great. A longer-term goal might be to allow users of pgsc_calc to request scores based on a given version/release of the PGS Catalog.
Motivation
My lab and collaborators have noticed that executing the exact same pgsc_calc command that pulls scores from the PGS Catalog has produced different output when run on different days. In troubleshooting, we noticed a few issues:
- In one case, the #trait_efo assigned to a scorefile changed from one day to the next.
- In two other cases, the sign of the effect_weight column was changed within the scorefiles.
It's great that archived versions of each scorefile are maintained on the PGS Catalog FTP site, which eventually allowed us to troubleshoot these issues. However, tracking down these individual scorefile changes is very time-consuming, particularly as the number of scores and the number of archived versions increase. This problem also raises the potential for broader transparency/reproducibility issues. Thanks for all the hard work making this resource possible!
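To illustrate the kind of comparison we ended up doing by hand, here is a rough sketch that compares two archived versions of one scorefile and reports changed header fields (such as trait_efo) and variants whose effect_weight changed sign. It assumes that header lines start with `#` and use `key=value`, and that the data section is tab-separated with `rsID` and `effect_weight` columns. The function names and the choice of `rsID` as the matching key are our own illustrative assumptions, not an official tool.

```python
import csv
import gzip
import io


def read_scorefile(path):
    """Parse a scorefile into (header metadata dict, list of row dicts)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    meta, data_lines = {}, []
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("#"):
                # Assumed header form: "#key=value"; other comment lines are skipped.
                key, sep, value = line.lstrip("#").strip().partition("=")
                if sep:
                    meta[key] = value
            elif line.strip():
                data_lines.append(line)
    reader = csv.DictReader(io.StringIO("".join(data_lines)), delimiter="\t")
    return meta, list(reader)


def compare_versions(old_path, new_path, key_cols=("rsID",)):
    """Return (changed header fields, variant keys whose weight changed sign)."""
    old_meta, old_rows = read_scorefile(old_path)
    new_meta, new_rows = read_scorefile(new_path)

    meta_changes = {
        k: (old_meta.get(k), new_meta.get(k))
        for k in sorted(set(old_meta) | set(new_meta))
        if old_meta.get(k) != new_meta.get(k)
    }

    def weights(rows):
        # Key each variant by the chosen columns (an assumption; adjust as needed).
        return {
            tuple(r.get(c) for c in key_cols): float(r["effect_weight"])
            for r in rows
        }

    old_w, new_w = weights(old_rows), weights(new_rows)
    shared = old_w.keys() & new_w.keys()
    sign_flips = sorted(k for k in shared if old_w[k] * new_w[k] < 0)
    return meta_changes, sign_flips
```

For scorefiles that identify variants by position rather than rsID, `key_cols` would need to name the relevant columns instead. Note that a sign flip can be legitimate if the effect allele was swapped at the same time, so flagged variants still need a manual look.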