Deltares / hydromt

HydroMT: Automated and reproducible model building and analysis
https://deltares.github.io/hydromt/
MIT License

HydroMT should work without internet #1069

Open hboisgon opened 2 months ago

hboisgon commented 2 months ago

Reproducible Example

This may be two errors in one, but even with a local data catalog and local data, HydroMT still tries to access the predefined catalogs, which are stored online.

This is not a fully reproducible example, but the error should appear when building a wflow model from a shapefile without internet access:

hydromt build wflow wflow_Hikurangi_byXiao -r "{'basin': 'basin.shp'}" -i ./wflow_build.yml -d data_catalog.yml --fo -vvv

The config file

setup_basemaps:
  hydrography_fn: merit_hydro   # source hydrography data {merit_hydro, merit_hydro_1k}
  basin_index_fn: merit_hydro_index # source of basin index corresponding to hydrography_fn
  upscale_method: ihu           # upscaling method for flow direction data, by default 'ihu'
  res: 0.00833           # build the model at a 30 arc sec (~1km) resolution

A local copy of artifact_data or deltares_data should be able to stand in for the rest.

Current behaviour

2024-09-26 09:28:55,286 - build - log - DEBUG - Writing log messages to new file D:\wflow\Training_11thSep2024_HydroMT\hydromt\wflow_Hikurangi_byXiao\hydromt.log.
2024-09-26 09:28:55,286 - build - log - INFO - HydroMT version: 0.10.0
2024-09-26 09:28:55,287 - build - main - INFO - Building instance of wflow model at D:\wflow\Training_11thSep2024_HydroMT\hydromt\wflow_Hikurangi_byXiao.
2024-09-26 09:28:55,287 - build - main - INFO - User settings:
2024-09-26 09:28:55,333 - build - data_catalog - INFO - Parsing data catalog from ../data/northland_data_extract/data_catalog.yml
2024-09-26 09:28:55,347 - build - model_api - WARNING - Model dir already exists and files might be overwritten: D:\wflow\Training_11thSep2024_HydroMT\hydromt\wflow_Hikurangi_byXiao\staticgeoms.
2024-09-26 09:28:55,356 - build - model_api - WARNING - Model dir already exists and files might be overwritten: D:\wflow\Training_11thSep2024_HydroMT\hydromt\wflow_Hikurangi_byXiao\run_default.
2024-09-26 09:28:55,358 - build - model_api - INFO - Initializing wflow model from hydromt_wflow (v0.6.0).
2024-09-26 09:28:55,358 - build - data_catalog - INFO - Parsing data catalog from C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\hydromt_wflow\data\parameters_data.yml
2024-09-26 09:28:55,369 - build - model_api - DEBUG - Setting model config options.
2024-09-26 09:28:55,372 - build - model_api - DEBUG - Default config read from C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\hydromt_wflow\data\wflow\wflow_sbm.toml
2024-09-26 09:28:55,372 - build - model_api - INFO - setup_basemaps.region: {'basin': 'HikurangiScope_reproject.shp'}
2024-09-26 09:28:55,372 - build - model_api - INFO - setup_basemaps.hydrography_fn: merit_hydro
2024-09-26 09:28:55,372 - build - model_api - INFO - setup_basemaps.basin_index_fn: merit_hydro_index
2024-09-26 09:28:55,372 - build - model_api - INFO - setup_basemaps.res: 0.0041666
2024-09-26 09:28:55,372 - build - model_api - INFO - setup_basemaps.upscale_method: ihu
2024-09-26 09:28:55,372 - build - wflow - INFO - Preparing base hydrography basemaps.
2024-09-26 09:28:55,375 - build - rasterdataset - INFO - Reading merit_hydro raster data from D:\wflow\Training_11thSep2024_HydroMT\data\northland_data_extract\merit_hydro\{variable}.tif
2024-09-26 09:28:55,531 - build - main - ERROR - HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /Deltares/hydromt/main/data/catalogs/artifact_data/registry.txt (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x000002CDCB2A4650>: Failed to resolve 'raw.githubusercontent.com' ([Errno 11004] getaddrinfo failed)"))
Traceback (most recent call last):
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\urllib3\connection.py", line 203, in _new_conn
    sock = connection.create_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\urllib3\util\connection.py", line 60, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\socket.py", line 962, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socket.gaierror: [Errno 11004] getaddrinfo failed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\urllib3\connectionpool.py", line 790, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\urllib3\connectionpool.py", line 491, in _make_request
    raise new_e
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\urllib3\connectionpool.py", line 467, in _make_request
    self._validate_conn(conn)
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\urllib3\connectionpool.py", line 1096, in _validate_conn
    conn.connect()
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\urllib3\connection.py", line 611, in connect
    self.sock = sock = self._new_conn()
                       ^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\urllib3\connection.py", line 210, in _new_conn
    raise NameResolutionError(self.host, self, e) from e
urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPSConnection object at 0x000002CDCB2A4650>: Failed to resolve 'raw.githubusercontent.com' ([Errno 11004] getaddrinfo failed)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\requests\adapters.py", line 486, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\urllib3\connectionpool.py", line 844, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\urllib3\util\retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /Deltares/hydromt/main/data/catalogs/artifact_data/registry.txt (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x000002CDCB2A4650>: Failed to resolve 'raw.githubusercontent.com' ([Errno 11004] getaddrinfo failed)"))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\hydromt\cli\main.py", line 224, in build
    mod.build(region, opt=opt)
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\hydromt\models\model_api.py", line 246, in build
    self._run_log_method(method, **kwargs)
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\hydromt\models\model_api.py", line 188, in _run_log_method
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\hydromt_wflow\wflow.py", line 232, in setup_basemaps
    kind, region = hydromt.workflows.parse_region(region, logger=self.logger)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\hydromt\workflows\basin_mask.py", line 168, in parse_region
    kwarg = _parse_region_value(value0, data_catalog=data_catalog)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\hydromt\workflows\basin_mask.py", line 206, in _parse_region_value
    geom = data_catalog.get_geodataframe(value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\hydromt\data_catalog.py", line 1375, in get_geodataframe
    if str(data_like) in self.sources:
                         ^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\hydromt\data_catalog.py", line 149, in sources
    self.from_predefined_catalogs(self._fallback_lib)
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\hydromt\data_catalog.py", line 639, in from_predefined_catalogs
    catalog_path = self.predefined_catalogs[name].get_catalog_file(version)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\hydromt\predefined_catalog.py", line 170, in get_catalog_file
    version = self.versions[-1]
              ^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\hydromt\predefined_catalog.py", line 108, in versions
    self._versions = self._get_versions()
                     ^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\hydromt\predefined_catalog.py", line 122, in _get_versions
    keys = self.registry.keys()
           ^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\hydromt\predefined_catalog.py", line 94, in registry
    return self.pooch.registry
           ^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\hydromt\predefined_catalog.py", line 101, in pooch
    self._load_registry_file()
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\hydromt\predefined_catalog.py", line 143, in _load_registry_file
    _copyfile(f"{self.base_url}/registry.txt", registry_path)
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\hydromt\data_adapter\caching.py", line 37, in _copyfile
    with requests.get(src, stream=True) as r:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\requests\api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\requests\api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\requests\sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\requests\sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\hydromt-wflow\Lib\site-packages\requests\adapters.py", line 519, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /Deltares/hydromt/main/data/catalogs/artifact_data/registry.txt (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x000002CDCB2A4650>: Failed to resolve 'raw.githubusercontent.com' ([Errno 11004] getaddrinfo failed)"))

Desired behaviour

If local files are used there should be no need to access the internet.
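
One way to read the desired behaviour, as a minimal sketch and not HydroMT's actual implementation: the traceback shows `registry.txt` being fetched from `raw.githubusercontent.com` with no fallback, so the lookup could instead reuse a previously cached copy when the network is unavailable. The names `load_registry`, `_download`, `cache_path` and `fetch` below are hypothetical and only illustrate the pattern.

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlopen


def _download(url: str) -> bytes:
    """Default fetcher (hypothetical helper): plain GET via the standard library."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def load_registry(
    url: str,
    cache_path: Path,
    fetch: Callable[[str], bytes] = _download,
) -> str:
    """Return the registry text, falling back to a cached copy when offline."""
    try:
        data = fetch(url)
    except OSError:
        # socket.gaierror, urllib's URLError and requests' ConnectionError
        # are all OSError subclasses, so DNS failures end up here.
        if cache_path.exists():
            return cache_path.read_text(encoding="utf-8")
        raise  # no cached copy available: surface the original error
    # Online: refresh the cache so a later offline run can reuse it.
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_bytes(data)
    return data.decode("utf-8")
```

With this pattern, a first run with internet access populates the cache and later offline runs read from it. A run that has never been online still fails, which is a separate case from the one reported here, where only local files are requested.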

Additional context

The user was running from China, where access to GitHub is not guaranteed. It could be that having the predefined catalogs already downloaded and cached avoids the issue, but it is worth checking that HydroMT works when fully offline.
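
To check offline behaviour without unplugging the machine, a small helper can simulate the DNS failure from the traceback. This is a hedged sketch using only the standard library; `no_network` and `_fail_lookup` are hypothetical names, and the approach only blocks code paths that resolve hosts through `socket.getaddrinfo` (as urllib3 does in the traceback above), not code that connects to a socket directly.

```python
import socket
from contextlib import contextmanager
from unittest import mock


def _fail_lookup(*args, **kwargs):
    # Mimic the error seen in the log: [Errno 11004] getaddrinfo failed
    raise socket.gaierror(11004, "getaddrinfo failed (simulated offline)")


@contextmanager
def no_network():
    """Make every hostname lookup fail while the context is active."""
    with mock.patch.object(socket, "getaddrinfo", side_effect=_fail_lookup):
        yield
```

Wrapping the build under test in `with no_network():` should then either succeed using local files only or reproduce the `ConnectionError` shown above, with or without a pre-populated cache.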