Closed ArbelAmir closed 4 years ago
Hi Arbel Amir,
Thank you for the code and comment. I think that people would be both interested in both optimizing memory useage and run time of Fréchet distance calculations. Any algorithms to do so would end up being a fairly significant contribution and probable a popular paper.
I don't get the RecursionError
with max_len=1000
on my system with Python 3.6. Part of the problem with this is the default limit in Python is small for these calculations.
If my memory recalls, the discrete Fréchet distance is O(nm), where n is the length of a, and m is the length of b. So you can adjust the default recursion limit to get around this https://docs.python.org/3/library/sys.html#sys.setrecursionlimit , but then the calculation will take sometime.
What were you thinking on trying?
Thanks, CJ
Is it easy to rewrite the algorithms iteratively?
The iteration code that solves this problem
def dp_frechet_dist(P, Q, pnorm=2):
""" dynamic programming for discrete frechet distance """
n = len(P)
m = len(Q)
ca = np.ones((n, m))
ca = np.multiply(ca, -1)
ca[0, 0] = minkowski_distance(P[0], Q[0], p=pnorm)
for i in range(1, n):
ca[i, 0] = max(ca[i-1, 0], minkowski_distance(P[i], Q[0], p=pnorm))
for j in range(1, m):
ca[0, j] = max(ca[0, j-1], minkowski_distance(P[0], Q[j], p=pnorm))
for i in range(1, n):
for j in range(1,m):
ca[i, j] = max(min(ca[i-1, j], ca[i, j-1], ca[i-1, j-1]), minkowski_distance(P[i], Q[j], p=pnorm))
return ca[n-1, m-1]
Discrete Frechet distance is essentially a DP problem that should best be coded with iteration. The original code uses ( if ca[i, j] > -1: return ca[i, j] ) in the recursive function to avoid the computation of finished nodes, which is a somewhat ok solution but leads to the stack-overflow problem for recursive functions. And in practice iteration is a little bit faster since we reduce the burden of funciton calling : )
Thanks @SenZHANG-GitHub for the simple function call that follows the current style of the library!
Okay so the iterative code and the recursive code are both O(nm).
You are welcome : ) Yes they are both O(nm) w.r.t. the number of distance calculation and max/min functions.
Quick study compared the performance of the recursion code to the dynamic programming (DP) code. Results showed that DP was about 1.2 times faster for five dimensional data with 1000 points on each curve (on my machine). Here's the notebook I used to come up with these numbers: https://github.com/cjekel/similarity_measures/blob/master/frechet_distance_recursion_vs_dp.ipynb
The DP algorithm also avoids Python's maximum recursion depth issues, so I think it makes sense to switch the Frechet implementation.
@SenZHANG-GitHub do you want to send a PR with this DP code replaces the recursive code (and you remove the _c function)? You can change the following docstring part to say thanks to yourself :)
"""
Python has a default limit to the amount of recursive calls a single
function can make. If you have a large dataset, you may need to increase
this limit. Check out the following resources.
https://docs.python.org/3/library/sys.html#sys.setrecursionlimit
https://stackoverflow.com/questions/3323001/what-is-the-maximum-recursion-depth-in-python-and-how-to-increase-it
Thanks to MaxBareiss
https://gist.github.com/MaxBareiss/ba2f9441d9455b56fbc9
This sets a global variable named pnorm, where pnorm = p.
"""
I think this issue will bring some nice changes, so thanks @ArbelAmir and @SenZHANG-GitHub !
Hi Charles, I'm indeed glad that it helps and yes I removed _c since it is used for the recursive function calling. Just a "thanks to" in the docstring part would be great. Thanks!
https://github.com/cjekel/similarity_measures/commit/c83c58d75725745373d52d3d487d3f5c5c32d68b now uses the iterative DP code for the Frechet distance thanks to @SenZHANG-GitHub
Going to push to PyPi shortly.
Consider the followings:
While running this code on a 32GB RAM machine it raises a stack-overflow error. I would suggest to switch the recursion based computations to iterative based computations using Queue's.
Is anyone currently working on optimizing the memory usage of frechet_dist ?
Thank you for your work, Arbel Amir