Closed nialov closed 8 months ago
Ready for review. The main function, distance_to_anomaly, works on all platforms, while distance_to_anomaly_gdal only works on Linux and macOS. Based on initial tests I thought the GDAL utility functions would work on all platforms, but I found out later that they are inconsistent on Windows. The main problem on Windows is the lack of the osgeo_utils module in the conda environment.
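A minimal sketch of how the missing module could be detected at runtime, so the caller can fall back to the portable function. The helper names `gdal_utils_available` and `pick_backend` are hypothetical and not part of the toolkit; only the standard-library `importlib.util.find_spec` call is assumed.

```python
import importlib.util


def gdal_utils_available() -> bool:
    """Return True if the osgeo_utils package can be found in this environment."""
    # find_spec returns None for a top-level package that is not installed.
    return importlib.util.find_spec("osgeo_utils") is not None


def pick_backend() -> str:
    """Return "gdal" when osgeo_utils is present, otherwise "default"."""
    # Hypothetical labels: "gdal" stands for distance_to_anomaly_gdal,
    # "default" for the portable distance_to_anomaly.
    return "gdal" if gdal_utils_available() else "default"
```

A check like this only reports whether the package can be located; it does not say whether the GDAL utilities behave consistently once imported.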
Anyway, I thought it best to keep the GDAL method for later reference. I just marked the corresponding test as an xfail when run on Windows. The GDAL method is orders of magnitude faster and at least somewhat more memory-efficient (filtering to anomalous values is still done in-memory).
If distance_computation, which the main function relies on, requires optimization later, you should check out the original branch by @tomironkko here: https://github.com/GispoCoding/eis_toolkit/compare/master...209-add-distance-to-anomaly. The "raster" distance math approach is more efficient, but it can obviously only handle raster data, while the current distance_computation handles vector data as well. You could rasterize first and then use raster math. Alternatively, the missing osgeo_utils could be investigated.
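To illustrate what "raster math" means here, a brute-force sketch with NumPy follows. It is not the code from the linked branch; the function name `distance_to_mask`, the square-pixel assumption and the `pixel_size` parameter are all assumptions made for the example.

```python
import numpy as np


def distance_to_mask(mask: np.ndarray, pixel_size: float = 1.0) -> np.ndarray:
    """Distance from each cell centre to the nearest True cell, in map units.

    Assumes square pixels of side `pixel_size` (hypothetical parameter).
    """
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        raise ValueError("mask contains no anomalous cells")
    grid_r, grid_c = np.indices(mask.shape)
    # Broadcast (H, W, 1) against (N,) to get squared offsets of shape (H, W, N).
    d2 = (grid_r[..., None] - rows) ** 2 + (grid_c[..., None] - cols) ** 2
    # Keep the nearest anomalous cell per pixel and convert to map units.
    return np.sqrt(d2.min(axis=-1)) * pixel_size
```

This needs memory proportional to (number of cells) x (number of anomalous cells), so it is only suitable for small rasters; a proper distance transform such as `scipy.ndimage.distance_transform_edt` or `gdal_proximity` would be the scalable choice.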
I did some testing with the non-gdal version using various raster files and using a randomly generated anomaly profile. Everything has looked good so far 👍 I'm wondering, though, if the function should handle nodata values in some more sophisticated way, or is it ok to assume that the user has replaced nodata values with nan for the input raster?
Yes, you are correct that if np.nan is not the true nodata value, then the nodata values will be left among the possible anomaly values. I have personally tried to advocate for a centralized approach to the handling of nodata, but if there are no current plans for such an approach, I can implement nodata handling in this function as well (@nmaarnio)?
Ready for review, let me know about the nodata handling!
Let's implement nodata handling in this tool now. However, if you have time and interest to suggest/implement a centralized approach for nodata handling across the toolkit in the upcoming weeks/months, please go ahead.
Nodata handling implemented and ready for review again. Nodata only affects the definition of anomalous values in _fits_criteria. The output will never have nodata values in it.
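As a rough illustration of the idea (not the actual _fits_criteria, whose signature is not shown in this thread), nodata cells can be excluded before the threshold test. The function name `anomaly_mask`, its parameters and the "higher than threshold" criterion are assumptions.

```python
import numpy as np


def anomaly_mask(data, threshold, nodata=None):
    """Boolean mask of cells above `threshold`, never counting nodata or NaN cells."""
    data = np.asarray(data, dtype=float)
    # NaN is always treated as missing, whatever the declared nodata value is.
    valid = ~np.isnan(data)
    if nodata is not None:
        # Exclude the raster's declared nodata value as well.
        valid &= data != nodata
    # Silence possible invalid-value warnings from comparing NaN cells.
    with np.errstate(invalid="ignore"):
        return valid & (data > threshold)
```

With a mask like this, nodata only decides which cells count as anomalous; distances are still computed for every cell, so the output contains no nodata.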
Thanks for the review, @nmaarnio! Ready for review again 👍
Ready for review/merging 👍 @nmaarnio, @em-t
Looks good!
Merging!
Performance issues can be solved by using the alternative implementation, which uses the gdal_proximity function instead of the existing distance_computation implementation. The GDAL implementation is orders of magnitude faster.
Draft for now until I clean up the implementation.
@tomironkko made an initial draft of this function, which might have been better than the current shapely distance calculation that uses the existing distance_computation, but is more than likely much worse than the gdal_proximity method.