Adapt combine_scorefiles to output information from scoring file headers

smlmbrt commented 1 year ago

combine_scorefiles already reads headers as a dictionary. The information extracted from the PGS Catalog-style header should be combined into a log/JSON file, indexed by scoring file accession, that includes:

[ ] Original genome build per scoring file
[ ] Original number of variants
[ ] Traits
[ ] PGS Name
[ ] Citation
[ ] Which columns were used, variant sources, harmonisation status
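
As a rough illustration of the idea, here is a minimal sketch of turning header lines into a JSON log keyed by accession. It is not the combine_scorefiles implementation: the function names (`read_header`, `summarise`) and the output key names are hypothetical, and the header keys (`pgs_id`, `pgs_name`, `trait_reported`, `genome_build`, `variants_number`, `citation`, `HmPOS_build`) are assumed to follow the PGS Catalog `#key=value` header convention and should be checked against real scoring files. The example values are made up.

```python
import io
import json


def read_header(handle):
    """Collect '#key=value' lines from the top of a scoring file into a dict."""
    header = {}
    for line in handle:
        if not line.startswith("#"):
            break  # the header ends at the first non-comment line
        key, sep, value = line.lstrip("#").strip().partition("=")
        # Skip banner lines that are comments but not key=value pairs.
        if sep and key.isidentifier():
            header[key] = value.strip()
    return header


def summarise(header, columns_used):
    """Pick out the fields listed in this issue (output key names are hypothetical)."""
    n_variants = header.get("variants_number", "")
    return {
        "genome_build": header.get("genome_build"),
        "n_variants": int(n_variants) if n_variants.isdigit() else None,
        "trait": header.get("trait_reported"),
        "pgs_name": header.get("pgs_name"),
        "citation": header.get("citation"),
        "columns_used": columns_used,
        # Assumption: harmonised files carry an HmPOS_build header key.
        "harmonised": "HmPOS_build" in header,
    }


# Made-up scoring file used only to exercise the functions above.
example = io.StringIO(
    "###PGS CATALOG SCORING FILE\n"
    "#pgs_id=PGS_EXAMPLE\n"
    "#pgs_name=example_score\n"
    "#trait_reported=example trait\n"
    "#genome_build=GRCh37\n"
    "#variants_number=77\n"
    "#citation=Example et al.\n"
    "rsID\teffect_allele\teffect_weight\n"
    "rs1\tA\t0.1\n"
)

header = read_header(example)
columns = ["rsID", "effect_allele", "effect_weight"]
log = {header["pgs_id"]: summarise(header, columns)}
print(json.dumps(log, indent=2))
```

Variant sources are left out of the sketch because they depend on how the variants are read, not on the header alone.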

pgsc_calc will need to change to read metadata from the JSON file rather than making an API call. This has the added benefit that no step of the pipeline after download_scorefiles will require internet access. Changes needed:

[ ] modules/local/score_report.nf
[ ] bin/report.Rmd
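
For the consuming side, a minimal sketch of an offline lookup might look like the following. It assumes a JSON log keyed by accession with hypothetical `pgs_name` and `trait` keys. The helper names are also hypothetical, and the report itself is R Markdown, so this only illustrates the lookup logic and is not code for bin/report.Rmd.

```python
import json
import tempfile
from pathlib import Path


def load_score_metadata(log_path):
    """Read the accession-indexed metadata log from disk (no network needed)."""
    with open(log_path, encoding="utf-8") as handle:
        return json.load(handle)


def describe(metadata, accession):
    """Build a one-line description of a score, tolerating missing fields."""
    entry = metadata.get(accession, {})
    name = entry.get("pgs_name", "unknown name")
    trait = entry.get("trait", "unknown trait")
    return f"{accession}: {name} ({trait})"


# Write a made-up log to a temporary file, then read it back as the report would.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "log.json"
    log_path.write_text(
        json.dumps({"PGS_EXAMPLE": {"pgs_name": "example_score", "trait": "example trait"}}),
        encoding="utf-8",
    )
    metadata = load_score_metadata(log_path)

summary = describe(metadata, "PGS_EXAMPLE")
missing = describe(metadata, "PGS_MISSING")
print(summary)
print(missing)
```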

smlmbrt commented 1 year ago

@ens-lgil is this something you could take on?

ens-lgil commented 1 year ago

Yes, I can have a look at it.

PGScatalog / pgscatalog_utils

Adapt combine_scorefiles to output information from scoring file headers #24