Closed barneydobson closed 2 months ago
Oh, dang! Yeah, GPLv3 code cannot be shipped with MIT. You either have to change the license or go with pyflwdir
. You can try the thessian method that you mentioned.
I was thinking that we might be able to take out the conditioning part of pysheds
and refactor it to a single function, but even a derivative of GPLv3 cannot be released under MIT.
However, it seems that you can get permission from the developer of pysheds
to use his package and still keep the MIT license. So, I recommend reaching out to him and discussing the issue. Who knows, he might even change it to MIT 😄
I mean considering #213 I think there is still a desire to move away from pysheds
in general. I'll have a look and see if I can't get it to work with pyflwdir
OK, I thought you're happy with pysheds
😄 In that case, moving to use whitebox
is a good choice after doing some side-by-side comparison. I think since it has been used more extensively than pysheds
, it should give better results. But it'd be nice to see the difference in conditioning among the three options on a map, i.e., cell-level differences.
The results from whitebox.breach_depressions
seem qualitatively very similar to pysheds.fill_pits
, fill_depressions
and resolve_flats
. It's pyflwdir
that seems way different (with a bias in the diagonal direction - I think we discussed it ages ago) - I don't know if there's a user error on my end - as far as I can tell I am doing all of them exactly by the tutorial.
Can you please try it with the fill_depression
function that I wrote based on pyflwdir
? I made some improvements, mainly performance-wise, and refactored it as a stand-alone function that accepts xr.DataArray
. In my experience, using min
or edges
for the outlet
, can make a difference.
Yep will have a look - probably on Tuesday though
@cheginit
Seems to have a similar issue (at least in this use case) as pyflwdir
- that is directionality bias. See plots below:
whitebox
(good)Not shown because I've removed it now, but qualitatively looks similar to pysheds
pyflwdir
(not good)py3dep
(also not good)There's every chance I've used both py3dep
and pyflwdir
wrong - so please feel free to debug at your leisure.
The code is as follows, you will have to install swmmanywhere
from branch #266 . Here is the demo_graph
.
import geopandas as gpd
import networkx as nx
import numpy as np
import pandas as pd
import rasterio as rst
from swmmanywhere.geospatial_utilities import (
load_and_process_dem,
delineate_catchment_pyflwdir,
reproject_raster
)
from swmmanywhere.graph_utilities import load_graph
from swmmanywhere.prepare_data import download_elevation
from pathlib import Path
from py3dep.utils import fill_depressions
output_dir = Path(r'C:\Users\bdobson\Downloads\test')
demo_graph = load_graph(output_dir / 'graph.json')
x = nx.get_node_attributes(demo_graph, 'x')
y = nx.get_node_attributes(demo_graph, 'y')
df = pd.DataFrame({'x': x, 'y': y})
geom = gpd.points_from_xy(df.x, df.y)
gdf = gpd.GeoDataFrame(df, geometry = geom, crs= demo_graph.graph['crs'])
gdf = gdf.to_crs(4326)
gdf['x'] = gdf.geometry.x
gdf['y'] = gdf.geometry.y
x_min, y_min = gdf[['x','y']].min(axis=0)
x_max, y_max = gdf[['x','y']].max(axis=0)
output_dir = Path(output_dir)
bounds = (x_min, y_min, x_max, y_max)
download_elevation(output_dir / "dem_.tif", bounds)
dem_fid = output_dir / "dem.tif"
reproject_raster(demo_graph.graph['crs'], output_dir / "dem_.tif", dem_fid)
grid, fdir_wb, _ = load_and_process_dem(dem_fid, method='whitebox')
_, fdir_pf, _ = load_and_process_dem(dem_fid, method='pyflwdir')
with rst.open(dem_fid) as src:
dem = src.read(1)
fdir_p3 = fill_depressions(dem)
manhole_subs_wb = delineate_catchment_pyflwdir(grid, fdir_wb, demo_graph)
manhole_subs_pf = delineate_catchment_pyflwdir(grid, fdir_pf, demo_graph)
manhole_subs_p3 = delineate_catchment_pyflwdir(grid, fdir_p3, demo_graph)
manhole_subs_wb.to_file(output_dir / 'manhole_subs_wb.geojson')
manhole_subs_pf.to_file(output_dir / 'manhole_subs_pf.geojson')
manhole_subs_p3.to_file(output_dir / 'manhole_subs_p3.geojson')
I see. So, there are a couple of issues.
py3dep.fill_depression
only fills the depressions and does not return flow direction, so you cannot directly use its output to generate basins. I just modified the function to also return it, so we can compare.bbox = gdf.buffer(20 * 30).to_crs(4326).total_bounds
dem = download_elevation(bbox)
dem = dem.rio.reproject(gdf.crs)
dem = dem.where(dem > dem.rio.nodata)
dem = dem.rio.write_nodata(np.nan)
dem = dem.rio.clip_box(*gdf.total_bounds)
dfill, fdir_pf = py3dep.fill_depressions(dem, outlets="edge", return_d8=True)
whitebox
you're using breaching instead of filling, so the two algorithms that you're using for conditioning are not the same. Generally, breaching is less destructive, meaning that changes made to the input DEM is less than that of depression filling. But depression filling is faster, which is why it's used at larger scales.py3dep
is better than what you got with pyflwdir
, probably due to the no data issue stuff:
Overall, I agree that whitebox
is generating better results since breaching is being used, which, unfortunately, pyflwdir
doesn't have. So, I recommend that you take care of null stuff regardless of the conditioning methodology.
sounds good - thanks for the detailed look.
The CRS was just for speed of reproducability in this example. In swmmanywhere
everything is converted to UTM after the initial preprocessing step - so all calculations are all in UTM. I realise that was dumb of me to have it the other way round in the example above ;)
As far as I can see in the codebase, you're not taking care of this issue neither in the reproject_raster
nor in the download_elevation
, though.
What I meant was that the output of download_elevation
is a dem at 4326 (non-project CRS) while all your other computations are in UTM (a projected CRS). So, at the end of the day, you are reprojecting the downloaded dem to UTM which ends up producing artificial nulls. My workaround is to get the dem data for a buffered bbox so once you reproject and clip it to the original bounds, the artificial nulls get clipped and the final reprojected dem ends up not having nulls. I recommend looking at your intermedia results in the workflow to see if that's not the case.
Something like this should take care of the issue:
bbox = gdf.buffer(20*30).to_crs(4326).total_bounds
dem = download_elevation(bbox)
dem = dem.rio.reproject(gdf.crs).fillna(dem.rio.nodata)
dem = dem.rio.clip_box(*gdf.total_bounds).astype("int16")
dem.rio.to_raster("dem.tif")
EDIT: BTW, with this DEM as input, I used your function to get flow dir using whitebox
and I got different results than yours. I am not sure if clipping is the only factor.
Oh I misunderstood - yes I had noticed that problem when fiddling with the subbasin delineation.
I will make a separate issue for it. #269
OK it could be effects of the buffer there for why whitebox
is producing different results I suppose.
You can try with the DEM method that I mentioned and see if you get the same results.
I got curious and when I looked into the whitebox
python package, I found a couple of issues. But fixing those issues meant rewriting the whole thing 😄 So, I decided to write a WhiteboxTools wrapper myself the way I like it 🤓
This wrapper takes care of getting the Rust binaries and provides a runner.
import platform
import requests
import zipfile
import shutil
import tempfile
import stat
from pathlib import Path
import subprocess
def _download_wbt(wbt_root: str | Path = "WBT") -> None:
"""Download the WhiteboxTools executable for the current platform."""
base_url = "https://www.whiteboxgeo.com/WBT_{}/WhiteboxTools_{}.zip"
system = platform.system()
if system not in ("Windows", "Darwin", "Linux"):
raise ValueError(f"Unsupported operating system: {system}")
if system == "Windows":
platform_suffix = "win_amd64"
elif system == "Darwin":
if platform.machine() == "arm64":
platform_suffix = "darwin_m_series"
else:
platform_suffix = "darwin_amd64"
elif system == "Linux":
if "musl" in platform.libc_ver()[0].lower():
platform_suffix = "linux_musl"
else:
platform_suffix = "linux_amd64"
url = base_url.format(system, platform_suffix)
wbt_root = Path(wbt_root)
exe_name = "whitebox_tools.exe" if platform.system() == "Windows" else "whitebox_tools"
if (wbt_root / exe_name).exists():
shutil.rmtree(wbt_root)
with tempfile.TemporaryDirectory() as temp_dir:
temp_path = Path(temp_dir)
zip_path = temp_path / "whitebox_tools.zip"
try:
response = requests.get(url)
response.raise_for_status()
zip_path.write_bytes(response.content)
except requests.exceptions.RequestException as e:
raise RuntimeError(f"Failed to download WhiteboxTools: {e!s}") from e
try:
with zipfile.ZipFile(zip_path, "r") as zip_ref:
for file_info in zip_ref.infolist():
if "/WBT/" in file_info.filename:
extracted_path = Path(
*Path(file_info.filename).parts[
Path(file_info.filename).parts.index("WBT") + 1 :
]
)
if extracted_path.parts:
source = zip_ref.extract(file_info, path=temp_dir)
dest = wbt_root / extracted_path
dest.parent.mkdir(parents=True, exist_ok=True)
shutil.move(source, dest)
except zipfile.BadZipFile:
raise RuntimeError("Downloaded file is not a valid zip file.")
# Set executable permissions for non-Windows platforms
if system != "Windows":
executable_names = ["whitebox_tools", "whitebox_runner"]
for exec_name in executable_names:
exec_path = wbt_root / exec_name
if exec_path.exists():
exec_path.chmod(
exec_path.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
)
def whitebox_tools(
tool_name: str,
args: list[str] | tuple[str, ...],
wbt_root: str | Path = "WBT",
work_dir: str | Path = "",
verbose: bool = False,
compress_rasters: bool = False,
refresh_download: bool = False,
) -> None:
"""Run a WhiteboxTools (not Whitebox Runner) tool with specified arguments.
Parameters
----------
tool_name : str
Name of the WhiteboxTools to run.
args : list or tuple
List of arguments for the tool.
wbt_root : str or Path, optional
Path to the root directory containing the Whitebox executables (default is "WBT").
work_dir : str or Path, optional
Working directory for the tool (default is current directory).
verbose : bool, optional
Whether to print verbose output (default is False).
compress_rasters : bool, optional
Whether to compress output rasters (default is False).
refresh_download : bool, optional
Whether to refresh the download if WhiteboxTools is found (default is False).
Raises
------
subprocess.CalledProcessError
If the tool execution fails.
Exception
For any other unexpected errors.
Notes
-----
This function will run the specified WhiteboxTools tool and handle its output.
If verbose is True, all output will be printed.
"""
wbt_root = Path(wbt_root)
work_dir = Path(work_dir) if work_dir else Path.cwd()
exe_name = "whitebox_tools.exe" if platform.system() == "Windows" else "whitebox_tools"
exe_path = wbt_root / exe_name
if not exe_path.exists() or refresh_download:
_download_wbt(wbt_root)
command = [
str(exe_path),
f"--run={tool_name}",
f"--wd={work_dir}",
"-v" if verbose else "-v=false",
f"--compress_rasters={'true' if compress_rasters else 'false'}",
]
command.extend(args)
if verbose:
print(" ".join(map(str, command)))
try:
process = subprocess.run(command, check=True, text=True, capture_output=True)
if verbose:
print(process.stdout)
except subprocess.CalledProcessError as e:
print(f"Error output: {e.stdout}")
raise Exception(f"Error running tool: {e}")
except Exception as e:
raise Exception(f"Unexpected error: {e}")
Then, I was able to reproduce the whole delineation with only WhitboxTools.
import shapely
bbox = shapely.box(*dem.rio.bounds())
u, x, y = zip(
*[
(u, float(p["x"]), float(p["y"]))
for u, p in g.nodes(data=True)
if shapely.Point(p["x"], p["y"]).within(bbox)
]
)
gpd.GeoSeries(gpd.points_from_xy(x, y), crs=g.graph["crs"]).to_file("pour_pts.shp")
whitebox_tools("BreachDepressionsLeastCost", ["-i=dem.tif", "-o=dem_corr.tif"])
whitebox_tools("D8Pointer", ["-i=dem_corr.tif", "-o=fdir.tif"])
whitebox_tools("D8FlowAccumulation", ["-i=fdir.tif", "--pntr", "-o=streams.tif"])
whitebox_tools("JensonSnapPourPoints", ["--pour_pts=pour_pts.shp", "--streams=streams.tif", "--snap_dist=15.0", "-o=pour_pts_snapped.shp"])
whitebox_tools("Watershed", ["--d8_pntr=fdir.tif", "--pour_pts=pour_pts_snapped.shp", "-o=watersheds.tif"])
with rasterio.open("watersheds.tif") as src:
gdf_bas = vectorize(
src.read(1).astype(np.int32), 0, src.transform, src.crs, name="basin"
)
gdf_bas = gdf_bas[gdf_bas.basin != src.nodata]
gdf_bas["id"] = [u[x - 1] for x in gdf_bas["basin"]]
I highly recommend reading the hydrology part of the WhiteboxTools documentation here, it has lots of good stuff.
OMG amazing. tests pass, I will next test it on the cluster to see if there's any other issues - I will first update the PR with just the d8_pointer
bit (#266 ) - and then put the rest in an issue about removing pyflwdir
(#267 ).
Since it is a requirement... I guess this would make
SWMManywhere
have to be GNU no?ughhhhh - richdem, #137 is too..
pyflwdir
isn't, but I was thrilled with the conditioning results we were seeing from it.Or the really simple option would just be to do a thessian polygon type approach like in: https://doi.org/10.3390/w15010046