Looks like `pysheds` is GNU???

barneydobson commented 2 months ago

Since it is a requirement... I guess this would make SWMManywhere have to be GNU no?

ughhhhh - richdem, #137 is too.. pyflwdir isn't, but I was thrilled with the conditioning results we were seeing from it.

Or the really simple option would just be to do a thessian polygon type approach like in: https://doi.org/10.3390/w15010046

cheginit commented 2 months ago

Oh, dang! Yeah, GPLv3 code cannot be shipped with MIT. You either have to change the license or go with pyflwdir. You can try the thessian method that you mentioned.

cheginit commented 2 months ago

I was thinking that we might be able to take out the conditioning part of pysheds and refactor it to a single function, but even a derivative of GPLv3 cannot be released under MIT.

However, it seems that you can get permission from the developer of pysheds to use his package and still keep the MIT license. So, I recommend reaching out to him and discussing the issue. Who knows, he might even change it to MIT 😄

barneydobson commented 2 months ago

I mean considering #213 I think there is still a desire to move away from pysheds in general. I'll have a look and see if I can't get it to work with pyflwdir

cheginit commented 2 months ago

OK, I thought you're happy with pysheds 😄 In that case, moving to use whitebox is a good choice after doing some side-by-side comparison. I think since it has been used more extensively than pysheds, it should give better results. But it'd be nice to see the difference in conditioning among the three options on a map, i.e., cell-level differences.

barneydobson commented 2 months ago

The results from whitebox.breach_depressions seem qualitatively very similar to pysheds.fill_pits, fill_depressions and resolve_flats. It's pyflwdir that seems way different (with a bias in the diagonal direction - I think we discussed it ages ago) - I don't know if there's a user error on my end - as far as I can tell I am doing all of them exactly by the tutorial.

cheginit commented 2 months ago

Can you please try it with the fill_depression function that I wrote based on pyflwdir? I made some improvements, mainly performance-wise, and refactored it as a stand-alone function that accepts xr.DataArray. In my experience, using min or edges for the outlet, can make a difference.

barneydobson commented 2 months ago

Yep will have a look - probably on Tuesday though

barneydobson commented 2 months ago

@cheginit

Seems to have a similar issue (at least in this use case) as pyflwdir - that is directionality bias. See plots below:

`whitebox` (good)

Not shown because I've removed it now, but qualitatively looks similar to pysheds

`pyflwdir` (not good)

`py3dep` (also not good)

Reproduce

There's every chance I've used both py3dep and pyflwdir wrong - so please feel free to debug at your leisure.

The code is as follows, you will have to install swmmanywhere from branch #266 . Here is the demo_graph.

import geopandas as gpd
import networkx as nx
import numpy as np
import pandas as pd
import rasterio as rst
from swmmanywhere.geospatial_utilities import (
    load_and_process_dem, 
    delineate_catchment_pyflwdir, 
    reproject_raster
)
from swmmanywhere.graph_utilities import load_graph
from swmmanywhere.prepare_data import download_elevation
from pathlib import Path
from py3dep.utils import fill_depressions

output_dir = Path(r'C:\Users\bdobson\Downloads\test')
demo_graph = load_graph(output_dir / 'graph.json')
x = nx.get_node_attributes(demo_graph, 'x')
y = nx.get_node_attributes(demo_graph, 'y')
df = pd.DataFrame({'x': x, 'y': y})
geom = gpd.points_from_xy(df.x, df.y)
gdf = gpd.GeoDataFrame(df, geometry = geom, crs= demo_graph.graph['crs'])
gdf = gdf.to_crs(4326)
gdf['x'] = gdf.geometry.x
gdf['y'] = gdf.geometry.y

x_min, y_min = gdf[['x','y']].min(axis=0)
x_max, y_max = gdf[['x','y']].max(axis=0)

output_dir = Path(output_dir)
bounds = (x_min, y_min, x_max, y_max)
download_elevation(output_dir / "dem_.tif", bounds)
dem_fid = output_dir / "dem.tif"
reproject_raster(demo_graph.graph['crs'], output_dir / "dem_.tif", dem_fid)
grid, fdir_wb, _ = load_and_process_dem(dem_fid, method='whitebox')
_, fdir_pf, _ = load_and_process_dem(dem_fid, method='pyflwdir')
with rst.open(dem_fid) as src:
    dem = src.read(1)
    fdir_p3 = fill_depressions(dem)

manhole_subs_wb = delineate_catchment_pyflwdir(grid, fdir_wb, demo_graph)
manhole_subs_pf = delineate_catchment_pyflwdir(grid, fdir_pf, demo_graph)
manhole_subs_p3 = delineate_catchment_pyflwdir(grid, fdir_p3, demo_graph)

manhole_subs_wb.to_file(output_dir / 'manhole_subs_wb.geojson')
manhole_subs_pf.to_file(output_dir / 'manhole_subs_pf.geojson')
manhole_subs_p3.to_file(output_dir / 'manhole_subs_p3.geojson')

cheginit commented 2 months ago

I see. So, there are a couple of issues.

The py3dep.fill_depression only fills the depressions and does not return flow direction, so you cannot directly use its output to generate basins. I just modified the function to also return it, so we can compare.
When getting DEM since you're reprojecting in post-processing from a non-projected CRS to a projected CRS, you will always end up with a skewed bounding box, thus generating artificial null cells. So you should always get the DEM for a larger bounding box (usually a multiple of the source resolution) so when you reproject and then clip to the original bounds, you don't generate artificial null cells. For example, this is what I did for this case:

bbox = gdf.buffer(20 * 30).to_crs(4326).total_bounds
dem = download_elevation(bbox)
dem = dem.rio.reproject(gdf.crs)
dem = dem.where(dem > dem.rio.nodata)
dem = dem.rio.write_nodata(np.nan)
dem = dem.rio.clip_box(*gdf.total_bounds)
dfill, fdir_pf = py3dep.fill_depressions(dem, outlets="edge", return_d8=True)

In your code when using whitebox you're using breaching instead of filling, so the two algorithms that you're using for conditioning are not the same. Generally, breaching is less destructive, meaning that changes made to the input DEM is less than that of depression filling. But depression filling is faster, which is why it's used at larger scales.
The results from py3dep is better than what you got with pyflwdir, probably due to the no data issue stuff:

Overall, I agree that whitebox is generating better results since breaching is being used, which, unfortunately, pyflwdir doesn't have. So, I recommend that you take care of null stuff regardless of the conditioning methodology.

barneydobson commented 2 months ago

sounds good - thanks for the detailed look.

The CRS was just for speed of reproducability in this example. In swmmanywhere everything is converted to UTM after the initial preprocessing step - so all calculations are all in UTM. I realise that was dumb of me to have it the other way round in the example above ;)

cheginit commented 2 months ago

As far as I can see in the codebase, you're not taking care of this issue neither in the reproject_raster nor in the download_elevation, though.

What I meant was that the output of download_elevation is a dem at 4326 (non-project CRS) while all your other computations are in UTM (a projected CRS). So, at the end of the day, you are reprojecting the downloaded dem to UTM which ends up producing artificial nulls. My workaround is to get the dem data for a buffered bbox so once you reproject and clip it to the original bounds, the artificial nulls get clipped and the final reprojected dem ends up not having nulls. I recommend looking at your intermedia results in the workflow to see if that's not the case.

Something like this should take care of the issue:

bbox = gdf.buffer(20*30).to_crs(4326).total_bounds
dem = download_elevation(bbox)
dem = dem.rio.reproject(gdf.crs).fillna(dem.rio.nodata)
dem = dem.rio.clip_box(*gdf.total_bounds).astype("int16")
dem.rio.to_raster("dem.tif")

EDIT: BTW, with this DEM as input, I used your function to get flow dir using whitebox and I got different results than yours. I am not sure if clipping is the only factor.

barneydobson commented 2 months ago

Oh I misunderstood - yes I had noticed that problem when fiddling with the subbasin delineation.

I will make a separate issue for it. #269

OK it could be effects of the buffer there for why whitebox is producing different results I suppose.

cheginit commented 2 months ago

You can try with the DEM method that I mentioned and see if you get the same results.

cheginit commented 2 months ago

I got curious and when I looked into the whitebox python package, I found a couple of issues. But fixing those issues meant rewriting the whole thing 😄 So, I decided to write a WhiteboxTools wrapper myself the way I like it 🤓

This wrapper takes care of getting the Rust binaries and provides a runner.

import platform
import requests
import zipfile
import shutil
import tempfile
import stat
from pathlib import Path
import subprocess

def _download_wbt(wbt_root: str | Path = "WBT") -> None:
    """Download the WhiteboxTools executable for the current platform."""
    base_url = "https://www.whiteboxgeo.com/WBT_{}/WhiteboxTools_{}.zip"

    system = platform.system()
    if system not in ("Windows", "Darwin", "Linux"):
        raise ValueError(f"Unsupported operating system: {system}")

    if system == "Windows":
        platform_suffix = "win_amd64"
    elif system == "Darwin":
        if platform.machine() == "arm64":
            platform_suffix = "darwin_m_series"
        else:
            platform_suffix = "darwin_amd64"
    elif system == "Linux":
        if "musl" in platform.libc_ver()[0].lower():
            platform_suffix = "linux_musl"
        else:
            platform_suffix = "linux_amd64"

    url = base_url.format(system, platform_suffix)
    wbt_root = Path(wbt_root)

    exe_name = "whitebox_tools.exe" if platform.system() == "Windows" else "whitebox_tools"
    if (wbt_root / exe_name).exists():
        shutil.rmtree(wbt_root)

    with tempfile.TemporaryDirectory() as temp_dir:
        temp_path = Path(temp_dir)
        zip_path = temp_path / "whitebox_tools.zip"

        try:
            response = requests.get(url)
            response.raise_for_status()
            zip_path.write_bytes(response.content)
        except requests.exceptions.RequestException as e:
            raise RuntimeError(f"Failed to download WhiteboxTools: {e!s}") from e

        try:
            with zipfile.ZipFile(zip_path, "r") as zip_ref:
                for file_info in zip_ref.infolist():
                    if "/WBT/" in file_info.filename:
                        extracted_path = Path(
                            *Path(file_info.filename).parts[
                                Path(file_info.filename).parts.index("WBT") + 1 :
                            ]
                        )
                        if extracted_path.parts:
                            source = zip_ref.extract(file_info, path=temp_dir)
                            dest = wbt_root / extracted_path
                            dest.parent.mkdir(parents=True, exist_ok=True)
                            shutil.move(source, dest)
        except zipfile.BadZipFile:
            raise RuntimeError("Downloaded file is not a valid zip file.")

    # Set executable permissions for non-Windows platforms
    if system != "Windows":
        executable_names = ["whitebox_tools", "whitebox_runner"]
        for exec_name in executable_names:
            exec_path = wbt_root / exec_name
            if exec_path.exists():
                exec_path.chmod(
                    exec_path.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
                )

def whitebox_tools(
    tool_name: str,
    args: list[str] | tuple[str, ...],
    wbt_root: str | Path = "WBT",
    work_dir: str | Path = "",
    verbose: bool = False,
    compress_rasters: bool = False,
    refresh_download: bool = False,
) -> None:
    """Run a WhiteboxTools (not Whitebox Runner) tool with specified arguments.

    Parameters
    ----------
    tool_name : str
        Name of the WhiteboxTools to run.
    args : list or tuple
        List of arguments for the tool.
    wbt_root : str or Path, optional
        Path to the root directory containing the Whitebox executables (default is "WBT").
    work_dir : str or Path, optional
        Working directory for the tool (default is current directory).
    verbose : bool, optional
        Whether to print verbose output (default is False).
    compress_rasters : bool, optional
        Whether to compress output rasters (default is False).
    refresh_download : bool, optional
        Whether to refresh the download if WhiteboxTools is found (default is False).

    Raises
    ------
    subprocess.CalledProcessError
        If the tool execution fails.
    Exception
        For any other unexpected errors.

    Notes
    -----
    This function will run the specified WhiteboxTools tool and handle its output.
    If verbose is True, all output will be printed.
    """
    wbt_root = Path(wbt_root)
    work_dir = Path(work_dir) if work_dir else Path.cwd()

    exe_name = "whitebox_tools.exe" if platform.system() == "Windows" else "whitebox_tools"
    exe_path = wbt_root / exe_name

    if not exe_path.exists() or refresh_download:
        _download_wbt(wbt_root)

    command = [
        str(exe_path),
        f"--run={tool_name}",
        f"--wd={work_dir}",
        "-v" if verbose else "-v=false",
        f"--compress_rasters={'true' if compress_rasters else 'false'}",
    ]
    command.extend(args)

    if verbose:
        print(" ".join(map(str, command)))

    try:
        process = subprocess.run(command, check=True, text=True, capture_output=True)
        if verbose:
            print(process.stdout)
    except subprocess.CalledProcessError as e:
        print(f"Error output: {e.stdout}")
        raise Exception(f"Error running tool: {e}")
    except Exception as e:
        raise Exception(f"Unexpected error: {e}")

Then, I was able to reproduce the whole delineation with only WhitboxTools.

import shapely

bbox = shapely.box(*dem.rio.bounds())
u, x, y = zip(
    *[
        (u, float(p["x"]), float(p["y"]))
        for u, p in g.nodes(data=True)
        if shapely.Point(p["x"], p["y"]).within(bbox)
    ]
)
gpd.GeoSeries(gpd.points_from_xy(x, y), crs=g.graph["crs"]).to_file("pour_pts.shp")

whitebox_tools("BreachDepressionsLeastCost", ["-i=dem.tif", "-o=dem_corr.tif"])
whitebox_tools("D8Pointer", ["-i=dem_corr.tif", "-o=fdir.tif"])
whitebox_tools("D8FlowAccumulation", ["-i=fdir.tif", "--pntr", "-o=streams.tif"])
whitebox_tools("JensonSnapPourPoints", ["--pour_pts=pour_pts.shp", "--streams=streams.tif", "--snap_dist=15.0", "-o=pour_pts_snapped.shp"])
whitebox_tools("Watershed", ["--d8_pntr=fdir.tif", "--pour_pts=pour_pts_snapped.shp", "-o=watersheds.tif"])

with rasterio.open("watersheds.tif") as src:
    gdf_bas = vectorize(
        src.read(1).astype(np.int32), 0, src.transform, src.crs, name="basin"
    )
    gdf_bas = gdf_bas[gdf_bas.basin != src.nodata]
    gdf_bas["id"] = [u[x - 1] for x in gdf_bas["basin"]]

I highly recommend reading the hydrology part of the WhiteboxTools documentation here, it has lots of good stuff.

barneydobson commented 2 months ago

OMG amazing. tests pass, I will next test it on the cluster to see if there's any other issues - I will first update the PR with just the d8_pointer bit (#266 ) - and then put the rest in an issue about removing pyflwdir (#267 ).

ImperialCollegeLondon / SWMManywhere