EBISPOT / gwas-sumstats-harmoniser

GWAS Summary Statistics Data Harmonisation
Write the file stager/filter for the harmonisation pipeline #36

Closed jdhayhurst closed 2 years ago

jdhayhurst commented 2 years ago

The harmonisation pipeline will process anything in a "ready to harmonise" directory. We need a script to deposit files into this directory, provided they are eligible.
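As a rough illustration of the stager described here, a minimal sketch could look like the following. The function name `stage_files`, its parameters, and the idea of passing eligibility in as a callable are all assumptions made for this example, not part of the existing release script.

```python
import shutil
from pathlib import Path
from typing import Callable, Iterable, List


def stage_files(
    candidates: Iterable[Path],
    ready_dir: Path,
    is_eligible: Callable[[Path], bool],
) -> List[Path]:
    """Copy each eligible file into ready_dir and return the staged paths.

    `is_eligible` is a hypothetical hook; the real eligibility rule
    would be supplied by whatever calls the stager.
    """
    ready_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for src in candidates:
        if not is_eligible(src):
            continue  # ineligible files are skipped, never staged
        dest = ready_dir / src.name
        shutil.copy2(src, dest)  # copy2 also preserves the modification time
        staged.append(dest)
    return staged
```

Keeping the eligibility check as a parameter means the copy logic does not need to change if the rule for what counts as eligible changes.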

Reduce “glue” by adding this to the existing Sumstats (ftp) release script, which already needs to identify and release newly submitted files.

By adding this to the existing “nightly sumstats ftp sync” script, we increase the complexity of what that script does, but we remove another gluey script. We can remove a further glue script by using the publishing directives of Nextflow.

more info: https://docs.google.com/document/d/1b1g9PIUH6B688_aqBIaZulOtgCEnEXNcirJ1vUUmJq8/edit#

jdhayhurst commented 2 years ago

Rather than a black/white list, we can use the files' metadata to store the validation status: if a file is valid, harmonise it; otherwise don't.
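To make the idea concrete, here is a small sketch of a metadata-driven check. It assumes the metadata has already been parsed into a dictionary, and the key `validation_status` and value `valid` are hypothetical placeholders; the real metadata schema may use different names.

```python
from typing import Any, Mapping

# Hypothetical key and value; the actual metadata schema may differ.
VALIDATION_KEY = "validation_status"
VALID_VALUE = "valid"


def should_harmonise(metadata: Mapping[str, Any]) -> bool:
    """Return True only when the metadata explicitly records a valid file.

    A missing key is treated the same as an invalid file, so files with
    no recorded validation status are not harmonised.
    """
    return metadata.get(VALIDATION_KEY) == VALID_VALUE
```

Treating a missing status as "don't harmonise" is a design choice in this sketch: it keeps unvalidated files out of the pipeline by default.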

jdhayhurst commented 2 years ago

Note: if files are "flipped" into a published state, we need to a) update this in the YAML and b) touch the sumstats file, so that they will be picked up by the above.
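The "touch" step might be sketched as below. This assumes the sync process detects new work by comparing modification times, which is only implied here; the helper names `mark_for_pickup` and `modified_since` are made up for illustration. The YAML update is left out because the metadata schema is not shown in this thread.

```python
from pathlib import Path


def mark_for_pickup(sumstats_file: Path) -> None:
    """Bump the file's modification time to now, like the `touch` command."""
    sumstats_file.touch(exist_ok=True)


def modified_since(path: Path, since_epoch: float) -> bool:
    """Return True if the file was modified at or after `since_epoch`.

    Assumption: the sync script selects files this way (by mtime).
    """
    return path.stat().st_mtime >= since_epoch
```

With these helpers, a file whose state was flipped long after submission would show up again in an mtime-based scan once `mark_for_pickup` has been called on it.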

jdhayhurst commented 2 years ago

To reduce the amount of "glue" we can add this into the ftp sync process - https://github.com/EBISPOT/gwas-utils/ftpSummaryStatsScript/ftp_sync.py