OSeMOSYS / osemosys_global

A global power system model generator for OSeMOSYS
https://osemosys-global.readthedocs.io/
GNU Affero General Public License v3.0

[BUG] Deleting PLEXOS input datafiles breaks snakemake #119

Closed trevorb1 closed 2 months ago

trevorb1 commented 2 years ago

Conda environment check

Current Behavior

If you delete an input file, such as PLEXOS_World_MESSAGEix_GLOBIOM_Softlink.xlsx, and then run the snakemake workflow, it fails with a MissingInputException. The scripts use try/except statements to look for the file locally first and download it only if necessary, so they run fine on their own. Snakemake, however, does not know that a script can download the file: the file is declared as a rule input and no rule produces it, so the workflow breaks while building the DAG.
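For illustration, the local-first pattern described above can be sketched roughly as follows. This is not the repository's actual code: `load_or_download`, the `download` callable, and `missing_inputs` are hypothetical names, and `missing_inputs` is only a toy stand-in for the input check snakemake performs before any script runs.

```python
from pathlib import Path


def load_or_download(path: Path, download) -> bytes:
    """Return the file contents, downloading the file first if it is missing.

    `download` is a hypothetical callable that writes the file to `path`.
    """
    try:
        return path.read_bytes()
    except FileNotFoundError:
        download(path)
        return path.read_bytes()


def missing_inputs(declared_inputs, producible_outputs):
    """Toy version of the check made while the DAG is built.

    An input is reported as missing when it is not on disk and no rule
    declares it as an output. The fallback inside `load_or_download` is
    invisible here, because this check happens before any script runs.
    """
    return [
        f
        for f in declared_inputs
        if not Path(f).exists() and f not in producible_outputs
    ]
```

Run standalone, a script built this way recovers by downloading. Inside the workflow it never gets the chance, because the missing input is reported first.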

Expected Behavior

If you delete the datafile in resources/data and run the snakemake workflow, the appropriate files will be downloaded and the workflow will run without issue.

Steps To Reproduce

  1. Delete the file resources/data/PLEXOS_World_MESSAGEix_GLOBIOM_Softlink.xlsx
  2. Run the workflow with snakemake -c

Operating System

Linux

Log output

(osemosys-global) trevorb1@DESKTOP-HCG4NHT:~/repositories/osemosys_global$ snakemake -c
Building DAG of jobs...
MissingInputException in line 175 of /home/trevorb1/repositories/osemosys_global/workflow/rules/preprocess.smk:
Missing input files for rule max_capacity:
resources/data/PLEXOS_World_MESSAGEix_GLOBIOM_Softlink.xlsx

Anything else?

Maybe we should create a script that does all the file downloading right at the start of the workflow?

trevorb1 commented 2 months ago

To fix this, we can just split the retrieval of data into separate retrieve rules and scripts.
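A minimal sketch of what such a retrieve script could look like, assuming it is run through a rule's `script:` directive (which makes a `snakemake` object available to the script). The rule name, the `url` param, and the script path are hypothetical, and the download URL is deliberately left as a placeholder.

```python
import urllib.request
from pathlib import Path

# A matching rule (hypothetical names) could look like:
#
#   rule retrieve_plexos:
#       params: url="<download URL>"
#       output: "resources/data/PLEXOS_World_MESSAGEix_GLOBIOM_Softlink.xlsx"
#       script: "../scripts/retrieve.py"


def retrieve(url: str, destination: str) -> Path:
    """Download `url` to `destination`, creating parent folders as needed."""
    dest = Path(destination)
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(dest))
    return dest


# Only runs when snakemake executes this file as a rule script.
if "snakemake" in globals():
    retrieve(snakemake.params.url, snakemake.output[0])
```

Because the data file is now declared as the output of a rule, snakemake can schedule the download itself whenever the file is missing.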

maartenbrinkerink commented 2 months ago

@trevorb1 Let me have a go at this; it would be good for me to get a better understanding of how snakemake deals with these things. To be clear, what you suggest is to create a separate script that deals with retrieving all external files? And then create a rule that runs this script before the file_check rule occurs? If so, should this new rule be part of preprocess.smk or should it have a separate .smk file? I.e. if this new rule would be placed above the file_check rule in preprocess.smk it is guaranteed to be run first?

maartenbrinkerink commented 2 months ago

Also, if I understand snakemake correctly, if we would create a script for retrieving the external files there's no more need for try except statements? As in, if the files already exist, snakemake will just skip the download?

trevorb1 commented 2 months ago

@maartenbrinkerink

To be clear, what you suggest is to create a separate script that deals with retrieving all external files?

Yes, exactly!

And then create a rule that runs this script before the file_check rule occurs?

I think it would run before any of the processing scripts that require PLEXOS (or any other external) data.

if this new rule would be placed above the file_check rule in preprocess.smk it is guaranteed to be run first?

I think in snakemake we can just specify the file to be downloaded as a rule output, and it will link the rules together correctly! (I don't think there is a requirement for one rule to be above another, but I don't know for sure off the top of my head.)

Also, if I understand snakemake correctly, if we would create a script for retrieving the external files there's no more need for try except statements?

Yes, I totally agree!
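The two points above (rules are linked through file names rather than their position in the file, and a download is skipped when the file already exists) can be illustrated with a toy planner. This is a simplified sketch, not snakemake's real scheduler, which also considers things like timestamps. The `plan` function, the rule dictionary layout, the `retrieve_plexos` rule name, and the `results/max_capacity.csv` output are assumptions made for the example.

```python
from pathlib import Path

XLSX = "resources/data/PLEXOS_World_MESSAGEix_GLOBIOM_Softlink.xlsx"


def plan(rules, targets, exists=lambda f: Path(f).exists()):
    """Return rule names in execution order for the requested target files.

    `rules` maps a rule name to {"input": [...], "output": [...]}.
    Assumes the dependencies contain no cycles.
    """
    # Link rules by matching each output file to the rule that produces it.
    producer = {out: name for name, r in rules.items() for out in r["output"]}
    order = []

    def visit(f):
        if exists(f):
            return  # file already present: nothing to run for it
        if f not in producer:
            raise FileNotFoundError(f"Missing input file: {f}")
        name = producer[f]
        if name in order:
            return
        for dep in rules[name]["input"]:
            visit(dep)
        order.append(name)

    for target in targets:
        visit(target)
    return order


# The retrieve rule is listed *after* the rule that needs its output.
rules = {
    "max_capacity": {"input": [XLSX], "output": ["results/max_capacity.csv"]},
    "retrieve_plexos": {"input": [], "output": [XLSX]},
}

# Nothing on disk: the download is scheduled first regardless of rule order.
fresh = plan(rules, ["results/max_capacity.csv"], exists=lambda f: False)
# fresh == ["retrieve_plexos", "max_capacity"]

# Data file already present: the retrieve rule is skipped.
cached = plan(rules, ["results/max_capacity.csv"], exists=lambda f: f == XLSX)
# cached == ["max_capacity"]
```

With the retrieve rule removed and the file absent, `plan` raises an error, which mirrors the MissingInputException in the original report.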