Integration of Accelerated Flow Direction Algorithms

fernando-aristizabal commented 3 years ago

Hello Dr. Tarboton, Would there be any interest in integrating the accelerated flow direction algorithms developed by Survila et al. (2016) into this repository? If so, I believe the accelerated O(n) algorithms are already implemented in the CyberGIS repository, with a slight modification in a fork to allow for compilation. I'd be happy to help integrate this if desired, since it is heavily used at the NWC within the Cahaba repository.

Thanks, Fernando Aristizabal

References: Survila, K., Yildirim, A.A., Li, T., Liu, Y., Tarboton, D., and Wang, S. 2016. “A Scalable High-performance Topographic Flow Direction Algorithm for Hydrological Information Analysis”. Proceedings of the 2016 Annual Conference on Extreme Science and Engineering Discovery Environment (XSEDE'16). July 17-21. Miami, Florida. Accepted.

dtarb commented 3 years ago

Fernando,

I've been time limited but would love to give this a try. Perhaps the first thing to do is get some of the Survila code back up and running. I have not looked at this in a while, but as I recall, in work at the time the paper was being written, I was not getting the same answers as with the existing code, and I did not have time to track down the differences. We could start some comparisons using test data from https://github.com/dtarb/TauDEM-Test-Data.

fernando-aristizabal commented 3 years ago

Dr. Tarboton, Thanks for the reply and for the test suite. I'll be out of the office next week but will take a look in January.

Enjoy your holidays, Fernando Aristizabal

fernando-aristizabal commented 3 years ago

Hello Dr. Tarboton, A brief update on progress so far. I've tested both the TauDEM repository (dev branch) and the accelerated algorithms in the CyberGIS repo using the procedure found on a fork of the testing repo. All 227 tests on the TauDEM repo passed as expected. However, 15 tests failed on the CyberGIS repo, in the peukerdouglas, d8hdisttostrm, dinfupdependence, slopeavedown, areadinf, and gagewatershed utilities. I'll keep digging to see whether any of these failures may have been caused in any way by d8flowdir or dinfflowdir (which both passed). I may check out a few more datasets too.

Best, Fernando Aristizabal

dtarb commented 3 years ago

Thanks @fernando-aristizabal. Does your testing compare the output? Comparing the files named ang.tif and p.tif across cases is important to make sure the flow directions from each approach are the same. Sometimes there are small differences due to rounding and numerics, and these are OK, so a bit of judgment may need to be exercised in evaluating differences.

fernando-aristizabal commented 3 years ago

Dr. Tarboton, Another status update. Thanks for hinting that the asset_success functionality only tests for successful completion of the utilities and not for the quality of the output.

To address this, I've implemented an experiment within the feature branch I previously mentioned of the TauDEM test repository. This experiment compares the outputs of d8flowdir and dinfflowdir to the reference within both repositories (TauDEM and CyberGIS) across 3 datasets (ReferenceResult/Base/loganfel.tif, ReferenceResult/Base/logan.tif, ReferenceResult/Geographic/enogeo.tif) with 1 to 4 MPI processes tested. This brings the total number of experiments to 2*2*3*4=48. The outputs are compared to the respective p, ang, sd8, or slp files corresponding to each of the three input datasets within the ReferenceResult dir. I only compared pixel array values to the reference, ignoring any metadata or projection differences. To counter precision errors, I set the absolute tolerance for all pixel values to 1e-4, which is especially useful in the dinf case.
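As a rough illustration of the comparison described above (this is a sketch, not the code in the feature branch), the experiment matrix and the tolerance check could look like the following. The helper name `pixels_match`, the `nodata` handling, and the use of flattened Python sequences instead of rasters read from GeoTIFF files are all assumptions made for the example.

```python
import itertools
import math

# Labels taken from the comment above; how each case is actually run is not shown.
REPOS = ["TauDEM", "CyberGIS"]
UTILITIES = ["d8flowdir", "dinfflowdir"]
DATASETS = ["loganfel.tif", "logan.tif", "enogeo.tif"]
MPI_PROCESSES = [1, 2, 3, 4]

ABS_TOL = 1e-4  # absolute tolerance quoted in the comment


def experiment_matrix():
    """Return every (repo, utility, dataset, n_procs) combination."""
    return list(itertools.product(REPOS, UTILITIES, DATASETS, MPI_PROCESSES))


def pixels_match(output, reference, nodata=None, abs_tol=ABS_TOL):
    """Compare two flattened pixel sequences value by value.

    Hypothetical helper: nodata cells must coincide, and data cells must
    agree within abs_tol. Metadata and projection are not considered.
    """
    if len(output) != len(reference):
        return False
    for out, ref in zip(output, reference):
        out_is_nodata = out == nodata
        ref_is_nodata = ref == nodata
        if out_is_nodata or ref_is_nodata:
            # A nodata cell on one side only counts as a mismatch.
            if out_is_nodata != ref_is_nodata:
                return False
            continue
        if not math.isclose(out, ref, rel_tol=0.0, abs_tol=abs_tol):
            return False
    return True
```

Setting `rel_tol=0.0` keeps the check purely absolute, which matches the single 1e-4 tolerance mentioned above; a relative tolerance would behave differently for small slope values than for flow angles.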

The results show discrepancies only in the p and ang outputs, with none in the sd8 and slp files. For the accelerated CyberGIS repository, the dinfflowdir and d8flowdir utilities failed for the logan.tif input across all MPI process numbers used. For the TauDEM dev branch, the dinfflowdir utility failed for all 4 MPI process numbers for the enogeofel.tif input.

Analyzing the results visually for the logan.tif input with the accelerated utilities, the differences between the p and ang outputs and the reference are spatially correlated with areas that have been pit filled.

For the case with the enogeofel.tif input on the TauDEM dev repo with dinfflowdir, I'm seeing exact (zero-tolerance) discrepancies in about 10.4% of the roughly 603,640 data pixels. Of the pixels with exact discrepancies, the minimum error is -3.97e-4 radians, the maximum error is 6.28 radians, the mean error is 1.00e-4 radians, the median error is 3.73e-9 radians, and the standard deviation of the error is 2.51e-2 radians. Taking the absolute value of these errors yields only 1 error greater than or equal to 1e-3 and only 21 errors greater than or equal to 1e-6. Overall there seems to be good agreement with only a few small outliers.
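For anyone wanting to reproduce statistics of this kind, a minimal sketch follows. It is not the code from the feature branch: the function names are hypothetical, the population standard deviation is an assumption (the comment above does not say which form was used), and pixel values are taken as flattened sequences. The `wrap_angle_difference` helper is included because a difference of about 6.28 radians is close to 2*pi, which could indicate an angle flipping between 0 and 2*pi; that interpretation is an assumption, not something confirmed in this thread.

```python
import math
import statistics


def exact_differences(output, reference):
    """Signed differences (output - reference) for pixels that are not exactly equal."""
    return [o - r for o, r in zip(output, reference) if o != r]


def wrap_angle_difference(diff):
    """Map an angular difference in radians into [-pi, pi).

    Assumption: the values are directions on a circle, so 0 and 2*pi coincide.
    """
    return (diff + math.pi) % (2.0 * math.pi) - math.pi


def summarize(diffs, thresholds=(1e-3, 1e-6)):
    """Summary statistics for a non-empty list of signed differences."""
    abs_diffs = [abs(d) for d in diffs]
    return {
        "count": len(diffs),
        "min": min(diffs),
        "max": max(diffs),
        "mean": statistics.fmean(diffs),
        "median": statistics.median(diffs),
        "stdev": statistics.pstdev(diffs),  # population form, an assumption
        # Number of absolute errors at or above each threshold.
        "exceed": {t: sum(a >= t for a in abs_diffs) for t in thresholds},
    }
```

Comparing `summarize(diffs)` with `summarize([wrap_angle_difference(d) for d in diffs])` would show whether the single large outlier disappears once angles are treated as circular.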

Please advise if you see any issues with my testing so far. I will continue to look into these discrepancies in more detail, especially the ones related to logan.tif. Step #5 in the README.md details how to run the experiment above. Everything should be reproducible.

Best, Fernando Aristizabal

dtarb / TauDEM

Integration of Accelerated Flow Direction Algorithms #222