trevorb1 closed this 2 months ago.
To fix this, we can break data retrieval out into separate `retrieve` rules and scripts.
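Here is a minimal sketch of such a retrieval script. The script name, the `retrieve_file` helper and the command-line interface are hypothetical, not the project's actual code. The script downloads unconditionally and needs no `try`/`except` around a local-file check. When it is called from a Snakemake rule whose `output:` is the downloaded file, Snakemake only runs that rule when the file is missing, so the "does it already exist?" decision moves out of the script.

```python
"""Hypothetical retrieve_data.py: download one external file to a given path."""
import sys
import urllib.request
from pathlib import Path


def retrieve_file(url: str, dest: str) -> Path:
    """Download `url` to `dest`, creating parent directories as needed.

    There is deliberately no existence check: when run from a Snakemake
    rule, the rule only executes if `dest` (its declared output) is missing.
    """
    dest_path = Path(dest)
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    # Download to a temporary name first so an interrupted download does
    # not leave a partial file sitting at the final output path.
    tmp_path = dest_path.with_name(dest_path.name + ".part")
    urllib.request.urlretrieve(url, tmp_path)
    tmp_path.replace(dest_path)
    return dest_path


if __name__ == "__main__":
    # Usage: python retrieve_data.py <url> <destination>
    if len(sys.argv) != 3:
        sys.exit("usage: retrieve_data.py URL DEST")
    retrieve_file(sys.argv[1], sys.argv[2])
```

In the workflow, this would be wired up as a rule whose `output:` is, for example, `resources/data/PLEXOS_World_MESSAGEix_GLOBIOM_Softlink.xlsx` and whose `shell:` (or `script:`) directive calls the script. Any downstream rule that lists the file as an `input:` then depends on it automatically.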
@trevorb1 Let me have a go at this; it would be good for me to get a better understanding of how Snakemake handles these things. To be clear, you suggest creating a separate script that retrieves all external files, and then a rule that runs this script before the `file_check` rule? If so, should this new rule be part of `preprocess.smk`, or should it go in a separate `.smk` file? That is, if the new rule is placed above the `file_check` rule in `preprocess.smk`, is it guaranteed to run first?
Also, if I understand Snakemake correctly, once we create a script for retrieving the external files, there's no longer any need for the `try`/`except` statements? That is, if the files already exist, Snakemake will just skip the rule?
@maartenbrinkerink
> To be clear, what you suggest is to create a separate script that deals with retrieving all external files?
Yes, exactly!
> And then create a rule that runs this script before the `file_check` rule occurs
I think it would run before any of the processing scripts that require PLEXOS (or other external) data.
> if this new rule would be placed above the `file_check` rule in `preprocess.smk` it is guaranteed to be run first
I think in Snakemake we can just specify the file to be downloaded as a rule's output, and Snakemake will link the rules together correctly! (I don't think one rule needs to be placed above another, but I don't know for sure off the top of my head.)
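As a rough mental model only, here is a toy resolver. It is not Snakemake's implementation or API. It starts from the requested target file, finds the rule whose output matches that file, and recurses on that rule's inputs, skipping any file that already exists. The order in which rules are written plays no part in this lookup. The rule names and file paths below are hypothetical.

```python
from typing import Dict, List, Set

# Toy rule table: rule name -> {"input": [...], "output": [...]}.
# The consumer is listed before its producer on purpose, to show that
# definition order does not affect the resolved execution order.
RULES: Dict[str, Dict[str, List[str]]] = {
    "file_check": {"input": ["data/softlink.xlsx"], "output": ["data/checked.txt"]},
    "retrieve_softlink": {"input": [], "output": ["data/softlink.xlsx"]},
}


def plan(target: str, existing: Set[str], rules: Dict = RULES) -> List[str]:
    """Return the rule names to run (producers first) to build `target`."""
    if target in existing:
        return []  # file already present, so nothing needs to run
    producers = [name for name, r in rules.items() if target in r["output"]]
    if not producers:
        # Missing file and no rule produces it: the situation in this issue.
        raise FileNotFoundError(f"missing input with no rule to produce it: {target}")
    name = producers[0]
    order: List[str] = []
    for dep in rules[name]["input"]:
        for step in plan(dep, existing, rules):
            if step not in order:
                order.append(step)
    order.append(name)
    return order
```

With nothing on disk, `plan("data/checked.txt", set())` returns `["retrieve_softlink", "file_check"]`. If the spreadsheet already exists, only `file_check` runs. Without the retrieve rule, `plan` raises, which is analogous to the failure reported here. Real Snakemake also accounts for timestamps, wildcards and other triggers that this toy ignores.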
> Also, if I understand snakemake correctly, if we would create a script for retrieving the external files there's no more need for `try`/`except` statements?
Yes, I totally agree!
Conda environment check
Using the `osemosys-global` conda environment.
Current Behavior
If you delete an input file, such as `PLEXOS_World_MESSAGEix_GLOBIOM_Softlink.xlsx`, and run the Snakemake workflow, an error is thrown. The scripts have `try`/`except` statements that check for the file locally first and only download it if necessary. This means the scripts run fine on their own. However, when a file needs to be downloaded, Snakemake can't see that step, because no rule produces the file, so the workflow breaks.
Expected Behavior
If you delete the data file in `resources/data` and run the Snakemake workflow, the appropriate files should be downloaded and the workflow should run without issue.
Steps To Reproduce
1. Delete `resources/data/PLEXOS_World_MESSAGEix_GLOBIOM_Softlink.xlsx`
2. Run `snakemake -c`
Operating System
Linux
Log output
Anything else?
Maybe we should create a script that does all the file downloading right at the start of the workflow?