PyPSA / pypsa-eur

PyPSA-Eur: A Sector-Coupled Open Optimisation Model of the European Energy System
https://pypsa-eur.readthedocs.io/

Cannot download Excel file from globalenergymonitor for build_gas_input_locations.py #1125

Open fhg-isi opened 3 days ago

fhg-isi commented 3 days ago

Describe the Bug

When trying to run build_gas_input_locations.py using

snakemake = mock_snakemake(
    "build_gas_input_locations",
    simpl="",
    clusters="37",
)

I get the error message below.

I can download the Excel file manually by entering the URL in a browser:

http://globalenergymonitor.org/wp-content/uploads/2023/07/Europe-Gas-Tracker-2023-03-v3.xlsx

Maybe the webpage has some restrictions for automated downloads?
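
One way to check this hypothesis would be to request the same URL twice, once with urllib's default headers and once with a browser-like User-Agent, and compare the status codes. This is a minimal sketch using only the standard library; `probe_status` and `BROWSER_UA` are hypothetical names for illustration and are not part of pypsa-eur, and the User-Agent string is an arbitrary example value.

```python
import urllib.error
import urllib.request

# Illustrative browser-like User-Agent (assumption: any common browser string would do).
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64)"


def probe_status(url, user_agent=None, timeout=30):
    """Return the HTTP status code of a GET request to `url`.

    If `user_agent` is None, urllib sends its default "Python-urllib/x.y" header.
    """
    headers = {"User-Agent": user_agent} if user_agent else {}
    req = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; report their code instead.
        return err.code


# Example usage (requires network access):
#   probe_status(url)              # default urllib headers
#   probe_status(url, BROWSER_UA)  # browser-like headers
```

If the first call returns 403 and the second returns 200, that would suggest the server filters requests by User-Agent.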

Error Message

(pypsa-eur) projekt-resilient03@ubuntu-22-04-lts-temp:~/pypsa-eur$ python scripts/build_gas_input_locations.py
ERROR:root:Uncaught exception
Traceback (most recent call last):
  File "/home/projekt-resilient03/pypsa-eur/scripts/build_gas_input_locations.py", line 156, in <module>
    gas_input_locations = build_gas_input_locations(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/projekt-resilient03/pypsa-eur/scripts/build_gas_input_locations.py", line 91, in build_gas_input_locations
    lng = build_gem_lng_data(gem_fn)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/projekt-resilient03/pypsa-eur/scripts/build_gas_input_locations.py", line 28, in build_gem_lng_data
    df = pd.read_excel(fn, sheet_name="LNG terminals - data", engine='openpyxl')
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/projekt-resilient03/conda/envs/pypsa-eur/lib/python3.11/site-packages/pandas/io/excel/_base.py", line 495, in read_excel
    io = ExcelFile(
         ^^^^^^^^^^
  File "/home/projekt-resilient03/conda/envs/pypsa-eur/lib/python3.11/site-packages/pandas/io/excel/_base.py", line 1567, in __init__
    self._reader = self._engines[engine](
                   ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/projekt-resilient03/conda/envs/pypsa-eur/lib/python3.11/site-packages/pandas/io/excel/_openpyxl.py", line 553, in __init__
    super().__init__(
  File "/home/projekt-resilient03/conda/envs/pypsa-eur/lib/python3.11/site-packages/pandas/io/excel/_base.py", line 563, in __init__
    self.handles = get_handle(
                   ^^^^^^^^^^^
  File "/home/projekt-resilient03/conda/envs/pypsa-eur/lib/python3.11/site-packages/pandas/io/common.py", line 728, in get_handle
    ioargs = _get_filepath_or_buffer(
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/projekt-resilient03/conda/envs/pypsa-eur/lib/python3.11/site-packages/pandas/io/common.py", line 384, in _get_filepath_or_buffer
    with urlopen(req_info) as req:
         ^^^^^^^^^^^^^^^^^
  File "/home/projekt-resilient03/conda/envs/pypsa-eur/lib/python3.11/site-packages/pandas/io/common.py", line 289, in urlopen
    return urllib.request.urlopen(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/projekt-resilient03/conda/envs/pypsa-eur/lib/python3.11/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/projekt-resilient03/conda/envs/pypsa-eur/lib/python3.11/urllib/request.py", line 525, in open
    response = meth(req, response)
               ^^^^^^^^^^^^^^^^^^^
  File "/home/projekt-resilient03/conda/envs/pypsa-eur/lib/python3.11/urllib/request.py", line 634, in http_response
    response = self.parent.error(
               ^^^^^^^^^^^^^^^^^^
  File "/home/projekt-resilient03/conda/envs/pypsa-eur/lib/python3.11/urllib/request.py", line 563, in error
    return self._call_chain(*args)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/projekt-resilient03/conda/envs/pypsa-eur/lib/python3.11/urllib/request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/home/projekt-resilient03/conda/envs/pypsa-eur/lib/python3.11/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

Related:

https://github.com/PyPSA/pypsa-eur/issues/1118

fhg-isi commented 3 days ago

Possible solution: send browser-like request headers:

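import os
import urllib.request

import pandas as pd


def open_remote_excel_file(url, sheet_name):
    # Download to a temporary local file first, then read it with pandas.
    temp_xlsx_path = "temp_dummy_file.xlsx"

    # Send a browser-like User-Agent; the request with urllib's default headers returned 403.
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"}
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as response:
        with open(temp_xlsx_path, "wb") as output_file:
            output_file.write(response.read())

    try:
        df = pd.read_excel(temp_xlsx_path, sheet_name=sheet_name)
    finally:
        # Remove the temporary file even if parsing fails.
        os.remove(temp_xlsx_path)
    return df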
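
A variant that avoids the temporary file might keep the download in memory and hand the buffer to pandas, since `pd.read_excel` accepts file-like objects. This is only a sketch under assumptions: `fetch_remote_file` and `read_remote_excel` are hypothetical helper names, the shortened User-Agent string is an arbitrary example, and pandas is imported lazily so the download helper works without it.

```python
import io
import urllib.request

# Illustrative browser-like User-Agent (assumption, not a required value).
DEFAULT_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def fetch_remote_file(url, user_agent=DEFAULT_UA, timeout=60):
    """Download `url` with a browser-like User-Agent and return an in-memory buffer."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=timeout) as response:
        return io.BytesIO(response.read())


def read_remote_excel(url, sheet_name):
    """Read one sheet of a remote Excel file without writing it to disk."""
    import pandas as pd  # imported here so fetch_remote_file has no pandas dependency

    return pd.read_excel(fetch_remote_file(url), sheet_name=sheet_name, engine="openpyxl")
```

With this approach nothing is left on disk if parsing fails, and two parallel runs cannot overwrite each other's temporary file.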