PyPSA / pypsa-eur

PyPSA-Eur: A Sector-Coupled Open Optimisation Model of the European Energy System
https://pypsa-eur.readthedocs.io/

GEM steel plant file from globalenergymonitor is not downloaded properly #1260

Closed ulfmueller closed 2 months ago

ulfmueller commented 2 months ago

Describe the Bug

Global-Steel-Plant-Tracker-April-2024-Standard-Copy-V1.xlsx is not downloaded properly from globalenergymonitor: the downloaded file is essentially empty apart from a notice that access is restricted. The problem might be related to #1125 and #1233.

Error Message

ERROR:root:Uncaught exception
Traceback (most recent call last):
  File "/home/ulf/github/pypsa-eur/.snakemake/scripts/tmpmv6n0gxa.build_industrial_distribution_key.py", line 406, in <module>
    gem = prepare_gem_database(regions)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ulf/github/pypsa-eur/.snakemake/scripts/tmpmv6n0gxa.build_industrial_distribution_key.py", line 144, in prepare_gem_database
    df = pd.read_excel(
         ^^^^^^^^^^^^^^
  File "/home/ulf/Downloads/yes/envs/pypsa-eur/lib/python3.12/site-packages/pandas/io/excel/_base.py", line 495, in read_excel
    io = ExcelFile(
         ^^^^^^^^^^
  File "/home/ulf/Downloads/yes/envs/pypsa-eur/lib/python3.12/site-packages/pandas/io/excel/_base.py", line 1554, in __init__
    raise ValueError(
ValueError: Excel file format cannot be determined, you must specify an engine manually.

koen-vg commented 2 months ago

I also just ran into this. Frustratingly, the download does produce an Excel file (so the retrieve rule succeeds), but the file contains the following:

image

It looks like globalenergymonitor.org isn't happy about lots of people downloading this file. I would either contact them about it or, more likely, just mirror the file somewhere the PyPSA project has control over it.
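
Since the retrieve rule currently succeeds even when the payload is not a real spreadsheet, one option would be to validate the download before anything calls pd.read_excel. The following is only a minimal sketch, not the actual PyPSA-Eur retrieve code: looks_like_xlsx is a hypothetical helper name, and it relies on the fact that .xlsx files are ZIP archives containing a [Content_Types].xml part. Judging from the ValueError above, the restricted-access file would presumably not pass this check.

```python
import zipfile
from pathlib import Path


def looks_like_xlsx(path):
    """Return True if `path` is a ZIP archive with the OOXML content-types part.

    Hypothetical helper for illustration; not part of PyPSA-Eur.
    """
    path = Path(path)
    # A missing file or a non-ZIP payload (e.g. an HTML notice) is rejected here.
    if not path.is_file() or not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # Every Office Open XML package carries this part at the archive root.
        return "[Content_Types].xml" in archive.namelist()
```

A retrieve step could raise an error when this returns False, so the workflow stops at the download instead of failing later in build_industrial_distribution_key.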

fneum commented 2 months ago

xref from #1265

The dataset links have cookie-based anti-bot protection against automated downloads, which went unnoticed when adding them to the workflow.

I have reached out to Global Energy Monitor to see if they would be willing to offer an official Zenodo repository (or similar).

TUBcloud is used as an interim solution.

If we do not get an official Zenodo repository, the CC-BY 4.0 license permits redistribution on a Zenodo mirror, provided attribution is given. This would be a long-term solution, but it requires us to update the mirror to the latest GEM datasets once in a while.
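
The mirror strategy above (an interim host now, possibly a Zenodo record later) could be expressed as an ordered list of sources tried in turn. This is a rough sketch under stated assumptions: retrieve_first_valid and is_zip_payload are hypothetical names, the fetch callable is injected so that no particular HTTP library is assumed, and the URLs in the usage note are placeholders rather than real locations.

```python
def is_zip_payload(payload):
    """Check for the ZIP local-file-header signature that .xlsx files start with."""
    return payload[:4] == b"PK\x03\x04"


def retrieve_first_valid(sources, fetch, is_valid=is_zip_payload):
    """Try each source in order and return (url, payload) for the first valid one.

    `fetch` is any callable mapping a URL to bytes (hypothetical; supplied by
    the caller). Failures are collected so the final error lists every attempt.
    """
    errors = []
    for url in sources:
        try:
            payload = fetch(url)
        except Exception as exc:  # network errors, HTTP errors, etc.
            errors.append((url, repr(exc)))
            continue
        if is_valid(payload):
            return url, payload
        errors.append((url, "payload failed validation"))
    raise RuntimeError(f"no valid source found: {errors}")
```

For example, sources could be a list such as ["https://example.org/mirror/gem.xlsx", "https://example.org/upstream/gem.xlsx"], with the mirror first so the upstream site is only contacted if the mirror is unavailable.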