3dgeo-heidelberg / py4dgeo

py4dgeo - A Python library for change analysis in 4D point clouds
https://py4dgeo.readthedocs.io
MIT License
73 stars 10 forks source link

Output indices_distances of _py4dgeo.region_growing not equal to distances of neighborhood_similarity #223

Closed DCHulskemper closed 1 year ago

DCHulskemper commented 1 year ago

Issue: In segmentation.py the dtw distances within the initial neighborhood (similarities) computed in the seed_sorting_scorefunction should be equal to the objdata.indices_distances of the initial neighborhood points computed in _py4dgeo.region_growing. This is not the case. The objdata.indices_distances for the neighborhood points are all zero instead and the distances outside the neighborhood seem too small.

Solution: The problem could be that in the c++ function ( _py4dgeo.region_growing) the dtw distance computation is done using the full time series instead of only the temporal segment between end_epoch and start_epoch of seed. This could explain why the DTW distances in similarities and objdata.indices_distances are not equal, but this does not necessarily explain why the objdata.indices_distances in the neighborhood are zero.

dokempf commented 1 year ago

I read a bit through the code and indeed it seems like something seems wrong. I am unsure whether it is a regression that I introduced at some point or whether it always has been wrong (which would be slightly embarrassing). On the bright side: I am fairly confident that the problem is on the Python side. The restriction to (start_epoch, end_epoch) is not done on the C++ side, instead the required restriction is done on the calling Python side by slicing numpy arrays:

There might be more issues, but this immediately seemed dubious to me.

kathapand commented 1 year ago

I tested the additional slicing in https://github.com/3dgeo-heidelberg/py4dgeo/blob/main/src/py4dgeo/segmentation.py#L744. When added, the program crashed at this function, though.

Without the slicing, we would expect the distances between compared time series to be larger (than in the limited timespan). So I assume the return of 0.0 is a separate issue.