Open dankovacek opened 5 months ago
Hi Dan,
There are a few different topics here - I'll try to answer them separately.
Regarding test and validation data:
I prepared a small sample dataset for the measurement application and added it to the repository. It's purely illustrative (5x5 cells with two marked outlets), but I think it can be helpful in preparing data for the application and verifying that everything works as expected.
Of course I can provide you with the datasets I used for the measurements presented in the paper. But I'd rather not put them in this repository, as the files are quite large.
Regarding georeferencing info:
You're right, the application ignores georeferencing information from the input file. From the beginning, this project was focused solely on algorithmic efficiency - the source code provided in this repository is essentially designed to measure and compare the performance of different algorithms. While maintaining georeferencing would naturally be useful in actual hydrological analysis, it simply didn't seem relevant to this research.
So while I agree this change could be useful for some applications, I'd prefer to keep the code here exactly as it was when I used it to prepare the results for the publication.
Regarding issues with Whitebox data:
Unfortunately, without analyzing the dataset you used, I can only guess what the problem might be. However, I have a few comments that I hope will be helpful.
My implementations use a single-flow direction format similar to that described in the ArcGIS documentation. As you noted, this convention is rotated by one position (relative to the Whitebox format). I don't really know how Whitebox calculates flow direction (I don't have much experience with it), but I've had some surprises with exporting data from other tools in the past. I'd start searching for the problem by manually looking at the flow direction data around the locations where the discrepancies appear - especially since some of the cases you described seem to involve only a small number of cells.
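To make the "rotated by one position" point concrete, here is a minimal sketch of remapping direction codes before running the application. It assumes the Whitebox pointer uses NE=1, E=2, SE=4, S=8, SW=16, W=32, NW=64, N=128 and the ArcGIS-style format uses E=1, SE=2, S=4, SW=8, W=16, NW=32, N=64, NE=128; please check both against the respective documentation. The function name `whiteboxToEsri` is hypothetical and is not part of this repository.

```cpp
// Assumed encodings (verify against the documentation of both tools):
//   Whitebox pointer: NE=1, E=2, SE=4, S=8, SW=16, W=32, NW=64, N=128
//   ArcGIS-style:     E=1, SE=2, S=4, SW=8, W=16, NW=32, N=64, NE=128
// Hypothetical helper: maps one Whitebox code to the ArcGIS-style code.
int whiteboxToEsri(int wb) {
    // A valid direction is a single set bit in the range 1..128.
    const bool singleBit = wb > 0 && wb <= 128 && (wb & (wb - 1)) == 0;
    if (!singleBit) {
        return wb;  // 0 (no flow) and nodata values pass through unchanged
    }
    // Each direction shifts down by one bit; NE wraps around from 1 to 128.
    return wb == 1 ? 128 : wb >> 1;
}
```

Under these assumptions, east (2 in Whitebox) becomes 1 and north-east (1 in Whitebox) becomes 128. Values that are not a single direction are left alone, so nodata cells still need whatever handling the input format expects.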
In general, all five algorithms from the measurement application should produce exactly the same output as each other (for the same input data). If there are any discrepancies between the output rasters produced by different algorithms, it means that some fundamental assumption has been broken. For example, given incorrect input data (e.g. with looping flow paths), particular algorithms may "fail" in different ways.
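One way to rule out looping flow paths before comparing algorithms is to walk every cell downstream and check whether a path ever returns to itself. The sketch below is not code from this repository; it assumes a row-major raster of ArcGIS-style codes (1=E, 2=SE, 4=S, 8=SW, 16=W, 32=NW, 64=N, 128=NE), treats any other value as the end of a path, and uses the hypothetical names `stepDownstream` and `hasFlowLoop`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Moves (row, col) one cell downstream for an ArcGIS-style code (assumed).
// Returns false when the code is not a single valid direction.
bool stepDownstream(int code, int& row, int& col) {
    switch (code) {
        case 1:   ++col;        return true;  // E
        case 2:   ++row; ++col; return true;  // SE
        case 4:   ++row;        return true;  // S
        case 8:   ++row; --col; return true;  // SW
        case 16:  --col;        return true;  // W
        case 32:  --row; --col; return true;  // NW
        case 64:  --row;        return true;  // N
        case 128: --row; ++col; return true;  // NE
        default:  return false;               // no flow or nodata
    }
}

// Returns true if any flow path in the row-major raster revisits a cell.
bool hasFlowLoop(const std::vector<int>& dirs, int rows, int cols) {
    assert(rows >= 0 && cols >= 0 &&
           dirs.size() == static_cast<std::size_t>(rows) *
                          static_cast<std::size_t>(cols));
    enum : unsigned char { Unvisited, OnPath, Done };
    std::vector<unsigned char> state(dirs.size(), Unvisited);
    std::vector<std::size_t> path;
    for (int r0 = 0; r0 < rows; ++r0) {
        for (int c0 = 0; c0 < cols; ++c0) {
            int r = r0, c = c0;
            path.clear();
            while (r >= 0 && r < rows && c >= 0 && c < cols) {
                const std::size_t i = static_cast<std::size_t>(r) * cols + c;
                if (state[i] == OnPath) return true;  // path re-entered itself
                if (state[i] == Done) break;          // joins a checked path
                state[i] = OnPath;
                path.push_back(i);
                if (!stepDownstream(dirs[i], r, c)) break;
            }
            // Everything on this path is now known to be loop-free.
            for (std::size_t i : path) state[i] = Done;
        }
    }
    return false;
}
```

Each cell is marked at most once, so the check is linear in the number of cells. If it reports a loop, the input violates the assumption the algorithms rely on, and differing outputs between them would not be surprising.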
I'm curious whether it's possible to test a sub-sample of the original data to try to replicate the results.
I have processed a small set of outlets on a (hydraulically conditioned) flow direction raster processed using Whitebox. I ran the single buffer (GPU) method on the dataset and am not able to validate results.
First, an offering -- I added a function to apply the affine transform from the original raster:
I call this after the saveGDAL function at the end of main.cpp:
I would be happy to create a pull request if this is something you feel is worth incorporating.
I tested the code on a flow direction raster processed (and hydraulically conditioned) using WhiteboxTools, noting and adjusting for the different flow direction convention (the integer direction representation is rotated one position clockwise).
In creating the outlet.txt file, I also adjusted the outlet positions to reflect 1-index as noted in your (helpful!) comments.
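For reference, this is roughly how the outlet positions can be derived from map coordinates. It is a sketch under stated assumptions, not the exact code I ran: the raster is north-up (no rotation terms), `gt` follows the GDAL geotransform ordering, and `worldToCell1Based` and `CellIndex` are hypothetical names. Whether outlet.txt expects the row or the column first is not covered here.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct CellIndex {
    long row;  // 1-based row, counted from the top of the raster
    long col;  // 1-based column, counted from the left
};

// gt uses GDAL geotransform ordering:
//   {originX, pixelWidth, rotation, originY, rotation, pixelHeight}
// Assumes a north-up raster: both rotation terms are 0 and pixelHeight < 0.
CellIndex worldToCell1Based(const std::array<double, 6>& gt,
                            double x, double y) {
    assert(gt[1] != 0.0 && gt[5] != 0.0);
    // Zero-based indices of the cell containing (x, y).
    const long col0 = static_cast<long>(std::floor((x - gt[0]) / gt[1]));
    const long row0 = static_cast<long>(std::floor((y - gt[3]) / gt[5]));
    // Shift to 1-based indexing.
    return {row0 + 1, col0 + 1};
}
```

With an origin of (100, 500) and 10 m cells, a point at (105, 495) falls in row 1, column 1. An off-by-one in either step would move the outlet to a neighbouring cell, which may sit off the main channel.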
I tested a small set of points (green dots are outlets):
Looking closer at the largest sub-basin, the blue polygon was derived independently using WhiteboxTools. The black region underneath is the result of the single buffer method:
This is close enough to make me think I've made an error in indexing the outlet cell correctly (missing the small region at the top). Looking at another point shows the basin is missed:
This again suggests the outlet point is slightly off. If I don't correct for the D8 pointer direction convention or the 1- vs. 0-based indexing, the result is (slightly) different but similarly fails to capture the basin:
If I change the methodology to 1 -- recursive / sequential (all else constant):
Methods 1, 3, and 5 yielded nearly identical results.
Method 2 ran for several minutes and I killed the process.
Any thoughts on where I should look to find my issue?
Thanks for posting this code -- I enjoyed your paper and had fun getting this up and running.