fneum / core-tso-data

MIT License

Snakemake jobs not reproducible - JAO download data changed? #16

Open fleimgruber opened 1 year ago

fleimgruber commented 1 year ago

While trying to contribute to #14, I ran into the error below at the step snakemake -j 1 process_data. I ran this on GNU/Linux and also tried it on a Windows machine; it looks like the JAO download has changed?

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job              count    min threads    max threads
-------------  -------  -------------  -------------
process_data         1              1              1
retrieve_data        1              1              1
total                2              1              1

Select jobs to execute...

[Sun Jan  1 19:55:05 2023]
rule retrieve_data:
    input: www.jao.eu/download/archive/paragraph/545/field_attachments
    output: data/jao/Core _Static Grid Model_template_AT.xlsx, data/jao/Core _Static Grid Model_template_BE_Q4-2021_all_EIC_0.xlsx, data/jao/Core _Static Grid Model_template_CZ.xlsx, data/jao/Core _Static Grid Model_template_D2.xlsx, data/jao/Core _Static Grid Model_template_D4_0.xlsx, data/jao/Core _Static Grid Model_template_D7.xlsx, data/jao/Core _Static Grid Model_template_D8.xlsx, data/jao/Core _Static Grid Model_template_FR.xlsx, data/jao/Core _Static Grid Model_template_HR.xlsx, data/jao/Core _Static Grid Model_template_HU.xlsx, data/jao/Core _Static Grid Model_template_LU.xlsx, data/jao/Core _Static Grid Model_template_NL.xlsx, data/jao/Core _Static Grid Model_template_PL.xlsx, data/jao/Core _Static Grid Model_template_RO.xlsx, data/jao/Core _Static Grid Model_template_SI.xlsx, data/jao/Core _Static Grid Model_template_SK.xlsx
    jobid: 1
    reason: Missing output files: data/jao/Core _Static Grid Model_template_AT.xlsx, data/jao/Core _Static Grid Model_template_HU.xlsx, data/jao/Core _Static Grid Model_template_FR.xlsx, data/jao/Core _Static Grid Model_template_D7.xlsx, data/jao/Core _Static Grid Model_template_RO.xlsx, data/jao/Core _Static Grid Model_template_HR.xlsx, data/jao/Core _Static Grid Model_template_SI.xlsx, data/jao/Core _Static Grid Model_template_SK.xlsx, data/jao/Core _Static Grid Model_template_CZ.xlsx, data/jao/Core _Static Grid Model_template_LU.xlsx, data/jao/Core _Static Grid Model_template_D4_0.xlsx, data/jao/Core _Static Grid Model_template_BE_Q4-2021_all_EIC_0.xlsx, data/jao/Core _Static Grid Model_template_D2.xlsx, data/jao/Core _Static Grid Model_template_NL.xlsx, data/jao/Core _Static Grid Model_template_D8.xlsx, data/jao/Core _Static Grid Model_template_PL.xlsx
    resources: tmpdir=/run/user/1000

Archive:  www.jao.eu/download/archive/paragraph/545/field_attachments
replace data/jao/20220905_Core _Static Grid Model_1.xlsx? [y]es, [n]o, [A]ll, [N]one, [r]ename: A
  inflating: data/jao/20220905_Core _Static Grid Model_1.xlsx  
  inflating: data/jao/20220905_Core Static Grid Model Handbook_final_1.docx  
  inflating: data/jao/outdated_Core Static Grid Model ? 1st Release.zip  
Waiting at most 5 seconds for missing files.
MissingOutputException in rule retrieve_data in file /home/fps/dev/core-tso-data/Snakefile, line 6:
Job 1  completed successfully, but some output files are missing. Missing files after 5 seconds. This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait:
data/jao/Core _Static Grid Model_template_AT.xlsx
data/jao/Core _Static Grid Model_template_BE_Q4-2021_all_EIC_0.xlsx
data/jao/Core _Static Grid Model_template_CZ.xlsx
data/jao/Core _Static Grid Model_template_D2.xlsx
data/jao/Core _Static Grid Model_template_D4_0.xlsx
data/jao/Core _Static Grid Model_template_D7.xlsx
data/jao/Core _Static Grid Model_template_D8.xlsx
data/jao/Core _Static Grid Model_template_FR.xlsx
data/jao/Core _Static Grid Model_template_HR.xlsx
data/jao/Core _Static Grid Model_template_HU.xlsx
data/jao/Core _Static Grid Model_template_LU.xlsx
data/jao/Core _Static Grid Model_template_NL.xlsx
data/jao/Core _Static Grid Model_template_PL.xlsx
data/jao/Core _Static Grid Model_template_RO.xlsx
data/jao/Core _Static Grid Model_template_SI.xlsx
data/jao/Core _Static Grid Model_template_SK.xlsx
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-01-01T195504.022001.snakemake.log
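
The MissingOutputException arises because retrieve_data declares sixteen template files as outputs, while the archive now unpacks to a dated model file, a handbook and an older zipped release. One quick way to confirm the mismatch is to compare the archive's member names with the declared outputs. Below is a minimal sketch using Python's standard zipfile module. The function missing_outputs is hypothetical and not part of the repository, and it assumes the downloaded field_attachments file is a plain zip archive, as the unzip output in the log suggests.

```python
import zipfile
from pathlib import Path

# Output names hard-coded in the retrieve_data rule, copied from the error log.
SUFFIXES = [
    "AT", "BE_Q4-2021_all_EIC_0", "CZ", "D2", "D4_0", "D7", "D8", "FR",
    "HR", "HU", "LU", "NL", "PL", "RO", "SI", "SK",
]
EXPECTED = [f"Core _Static Grid Model_template_{s}.xlsx" for s in SUFFIXES]


def missing_outputs(archive_path, expected=EXPECTED):
    """Hypothetical helper: return the expected file names that the zip
    archive does not contain (compared by base name, ignoring folders)."""
    with zipfile.ZipFile(archive_path) as zf:
        members = {Path(name).name for name in zf.namelist()}
    return sorted(name for name in expected if name not in members)
```

Called with the archive path from the log (www.jao.eu/download/archive/paragraph/545/field_attachments), a non-empty result lists the outputs the Snakefile still expects but the current download no longer provides. Given the three files shown being inflated, one would expect all sixteen names to come back.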

fneum commented 1 year ago

I can confirm that the data provided under the link has changed since the code was written.

I quickly uploaded the download I had, which you can use to get the code running:

https://tubcloud.tu-berlin.de/s/w245Q5sBt8CxyYK/download/jao.tar.xz

Extract the archive and place its contents into data. Obviously, the code should be updated to extract the data currently provided under the link.
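
The manual step (download, extract, place into data) can also be scripted. The sketch below uses only Python's standard library (urllib.request and tarfile). The helper names fetch_and_extract and extract_archive are hypothetical, and whether the archive's paths end up under data/jao, where the Snakefile looks, is an assumption worth checking against the returned member names.

```python
import tarfile
import urllib.request
from pathlib import Path

# Temporary download link shared in this thread.
URL = "https://tubcloud.tu-berlin.de/s/w245Q5sBt8CxyYK/download/jao.tar.xz"


def extract_archive(archive_path, dest="data"):
    """Hypothetical helper: unpack a .tar.xz archive into dest and return
    the member names so you can check where the files ended up."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:xz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Pythons with extraction filters: reject absolute paths,
            # members escaping dest, and similar unsafe entries.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names


def fetch_and_extract(url=URL, dest="data"):
    """Download the archive to a temporary file, then extract it into dest."""
    archive, _headers = urllib.request.urlretrieve(url)
    return extract_archive(archive, dest)
```

The hasattr check keeps the sketch working on older interpreters while using the safer "data" filter where it is available.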

fleimgruber commented 1 year ago

Thanks for providing the data. Additional changes were necessary to get the code working with that data; please see https://github.com/fleimgruber/core-tso-data/tree/upstream_data_workaround. Also, I did not find a way to create outputs/locator-results.csv, so I assume this step is not automated yet? If you could give a short description of what would be needed, I would be happy to contribute it.