DOI-USGS / lake-temperature-lstm-static

Predict lake temperatures at depth using static lake attributes

Stop unzip_archive from matching files #22

Closed by AndyMcAliley 2 years ago

AndyMcAliley commented 2 years ago

Use regular expressions to ensure that the output of unzip_archive does not match files in subdirectories of 1_fetch/out/ or CSV files in 1_fetch/out/. Closes #13.

unzip_archive is a Snakemake checkpoint that extracts files from a zipped archive. It has to be a checkpoint because otherwise Snakemake would not track the files unzipped from the archive and would delete or ignore them. Its output is a directory because we don't know in advance how many files the archive contains, but we do know which directory they will be extracted to.
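To illustrate why the output has to be a directory, here is a minimal plain-Python sketch of the extraction step (not the repository's actual rule). The helper name unzip_to_directory is hypothetical; it uses the standard-library zipfile module and only learns the list of extracted files after the archive has been opened, while the target directory is fixed beforehand.

```python
import tempfile
import zipfile
from pathlib import Path


def unzip_to_directory(archive_path, out_dir):
    """Hypothetical helper: extract every member of a zip archive into out_dir.

    The caller knows out_dir up front, but the number and names of the
    extracted files are only known after reading the archive.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(out_dir)
        # namelist() can include directory entries (names ending in "/"); keep files only
        names = [n for n in zf.namelist() if not n.endswith("/")]
    return sorted(out_dir / n for n in names)


if __name__ == "__main__":
    # Usage: build a small throwaway archive, then extract it
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "example.zip"
        with zipfile.ZipFile(archive, "w") as zf:
            for i in range(3):
                zf.writestr(f"file_{i}.txt", "data")
        files = unzip_to_directory(archive, Path(tmp) / "example")
        print(len(files), "files extracted")
```

Declaring only the directory mirrors what directory() does for the unzip_archive output: the rule promises a location, not a fixed list of files.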

The output is the name of the directory that the files are extracted to. The regular expressions ensure that files in subdirectories of 1_fetch/out/{file_category} are not matched, and neither are CSV files under 1_fetch/out/.

Syntax explanation

The major change happens in one line of code:

folder = directory("1_fetch/out/{file_category,[^/]+}/{archive_name,[^/]+$(?<!\.csv)}")

First, ignore the regular expressions and focus on the output string with its wildcards alone:

"1_fetch/out/{file_category}/{archive_name}"
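By default, a Snakemake wildcard matches the regular expression .+, which can also match "/". The sketch below is a hand-written Python approximation of the regex built from this unconstrained string (not Snakemake's internal code), applied to hypothetical paths, to show which paths it would claim.

```python
import re

# Hand-written approximation of the unconstrained output pattern:
# each wildcard uses Snakemake's default regex ".+", which also matches "/".
UNCONSTRAINED = re.compile(
    r"1_fetch/out/(?P<file_category>.+)/(?P<archive_name>.+)$"
)


def wildcards(pattern, path):
    """Return the wildcard values the pattern assigns to path, or None if no match."""
    m = pattern.match(path)
    return m.groupdict() if m else None


# Hypothetical paths under 1_fetch/out/
for path in [
    "1_fetch/out/category_a/archive_1",           # the intended archive directory
    "1_fetch/out/category_a/archive_1/file.txt",  # a file unzipped into that directory
    "1_fetch/out/category_a/table.csv",           # a CSV file
]:
    print(path, "->", wildcards(UNCONSTRAINED, path))
```

All three paths match. For the unzipped file, the greedy .+ gives file_category = "category_a/archive_1" and archive_name = "file.txt", so the rule would claim a file it merely produced as a directory member.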

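Snakemake lets a wildcard carry its own regular expression, written {wildcard,regex}; without one, a wildcard matches the default regex .+, which can also span "/" characters. In {file_category,[^/]+}, the regex [^/]+ matches one or more characters that are not "/", so file_category can only be a single directory name. In {archive_name,[^/]+$(?<!\.csv)}, the same [^/]+ limits archive_name to a single path component, $ anchors it to the end of the path, and the negative lookbehind (?<!\.csv) rejects any name that ends in ".csv".

Put all this together and you get: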

folder = directory("1_fetch/out/{file_category,[^/]+}/{archive_name,[^/]+$(?<!\.csv)}")
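As a quick check, here is a hand-written Python approximation of the regex that the constrained output string corresponds to (Snakemake builds its own internally; this is only a sketch). The example paths are hypothetical.

```python
import re

# Hand-written approximation of the constrained output pattern:
#   [^/]+          one or more characters other than "/" (a single path component)
#   $(?<!\.csv)    end of the path, and the path must not end in ".csv"
CONSTRAINED = re.compile(
    r"1_fetch/out/(?P<file_category>[^/]+)/(?P<archive_name>[^/]+$(?<!\.csv))$"
)

# Hypothetical paths under 1_fetch/out/
for path in [
    "1_fetch/out/category_a/archive_1",           # archive directory: matches
    "1_fetch/out/category_a/archive_1/file.txt",  # file in a subdirectory: no match
    "1_fetch/out/category_a/table.csv",           # CSV file: no match
    "1_fetch/out/table.csv",                      # CSV directly in 1_fetch/out/: no match
]:
    m = CONSTRAINED.match(path)
    print(path, "->", m.groupdict() if m else None)
```

Only the archive directory matches. Because neither wildcard may contain "/", a nested file cannot be split across the two wildcards, and the lookbehind rejects names ending in ".csv".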