Closed by karatugo 1 month ago
Tested with the following.
https://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90269001-GCST90270000/GCST90269497/
https://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90269001-GCST90270000/GCST90269497/harmonised/
http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST005001-GCST006000/GCST005529/
http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST005001-GCST006000/GCST005529/harmonised/
http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90308001-GCST90309000/GCST90308682/
Description
Metadata YAML generation is designed to retrieve MD5 checksums for files based on the provided accession ID. The function currently assumes filenames follow specific patterns: `accession_id.tsv` or `accession_id.tsv.gz`, and `accession_id.h.tsv` or `accession_id.h.tsv.gz` if harmonised. However, there are cases where filenames include additional patterns, such as `GCST90308682_buildGRCh37.tsv`, which are not currently handled by the function. This leads to potential mismatches and an inability to retrieve the correct MD5 checksum.
Suggested Enhancement
Modify the function to handle additional filename patterns. One possible approach is to include regular expression matching to account for various patterns while maintaining the current functionality.
See http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90308001-GCST90309000/GCST90308682/
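A minimal sketch of the regular-expression approach, not the project's actual implementation. The function names `build_filename_pattern` and `match_data_files` are hypothetical, and the sketch assumes that any extra filename component is an underscore-separated alphanumeric token (as in `_buildGRCh37`) placed between the accession ID and the extension. Other naming variants would need the pattern adjusted.

```python
import re


def build_filename_pattern(accession_id: str, harmonised: bool = False) -> re.Pattern:
    """Build a regex for summary-statistics filenames of one accession.

    Hypothetical helper. Accepts the existing patterns (accession_id.tsv,
    accession_id.tsv.gz, and the .h.tsv variants when harmonised) plus
    optional underscore-separated tokens such as "_buildGRCh37".
    """
    # Harmonised files carry an extra ".h" before ".tsv".
    suffix = r"\.h" if harmonised else ""
    return re.compile(
        rf"^{re.escape(accession_id)}(?:_[A-Za-z0-9]+)*{suffix}\.tsv(?:\.gz)?$"
    )


def match_data_files(filenames, accession_id, harmonised=False):
    """Return the names in `filenames` that match the accession's pattern."""
    pattern = build_filename_pattern(accession_id, harmonised)
    return [name for name in filenames if pattern.match(name)]


# Illustrative directory listing; only the first name is taken from the issue,
# the other entries are assumptions made for the example.
listing = [
    "GCST90308682_buildGRCh37.tsv",
    "GCST90308682_buildGRCh37.tsv-meta.yaml",
    "md5sum.txt",
]
print(match_data_files(listing, "GCST90308682"))
```

Anchoring the pattern at both ends keeps sidecar files (for example a `-meta.yaml` file) from being mistaken for the data file, and the plain `accession_id.tsv` and `accession_id.tsv.gz` names still match, so current behaviour is preserved. The matched filename can then be used as the key when looking up the MD5 checksum.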