Open jonasvdd opened 2 years ago
Hi @jonasvdd, are you able to submit the PR for the numpy vectorized version?
Thank you.
Sure!
Just a few questions:
def largest_triangle_three_buckets(data, threshold):
    """
    Return a downsampled version of data.

    Parameters
    ----------
    data: list of lists/tuples
        data must be formatted this way: [[x,y], [x,y], [x,y], ...]
        or: [(x,y), (x,y), (x,y), ...]
    threshold: int
        threshold must be >= 2 and <= the length of data

    Returns
    -------
    data, but downsampled using threshold
    """
Why did you opt for a list of tuples as the datatype? (It is not very performant when operating on large arrays.) If you are considering numpy support, I would opt to just accept array-like arguments (and thus split the `data` argument into `x` and `y`), e.g.:
def largest_triangle_three_buckets(x, y, threshold):
    """
    ....
    Parameters
    ----------
    x: list | np.ndarray
    ....
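If the signature were split into `x` and `y` as suggested, callers holding the current list-of-pairs format could still be supported with a small adapter. The following is only a sketch of that idea: `split_xy` is a hypothetical helper name, not something that exists in the repo, and it assumes numeric data that fits a float array.

```python
import numpy as np


def split_xy(data):
    """Hypothetical helper: split [[x, y], ...] or [(x, y), ...] into two 1-D arrays."""
    arr = np.asarray(data, dtype=float)
    # Expect exactly two columns: one for x, one for y.
    if arr.ndim != 2 or arr.shape[1] != 2:
        raise ValueError("data must be a sequence of (x, y) pairs")
    return arr[:, 0], arr[:, 1]


# Existing list-of-tuples input, converted once at the boundary.
x, y = split_xy([(0, 1.0), (1, 3.5), (2, 2.0)])
```

Converting once at the function boundary keeps the old call style working while the downsampling itself operates on arrays.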
How should I add the numpy dependency? (via `requirements.txt` / `pyproject.toml`)
Hi, really cool repo 🚀
I am 100% sure that this code can be significantly sped up by using numpy vectorisation.
I was able to write a pure numpy-python implementation which could downsample 50,000,000 -> 2,000 data points in ~300ms (on a consumer PC).
If you're interested, I can quickly make a PR where I put the numpy-python version of that algorithm next to yours.
Cheers, Jonas
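To make the idea concrete, here is a rough sketch of what a numpy-based largest-triangle-three-buckets could look like. It is not the implementation mentioned above and makes no claim about matching its speed: it still loops over buckets in Python and only vectorizes the per-bucket triangle-area computation. The name `lttb_numpy`, the `(x, y, threshold)` signature, and the bucket-edge scheme using `np.linspace` are assumptions made for illustration.

```python
import numpy as np


def lttb_numpy(x, y, threshold):
    """Sketch of LTTB on 1-D arrays; returns the selected (x, y) points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D and of equal length")
    n = x.shape[0]
    if not 2 <= threshold <= n:
        raise ValueError("threshold must be >= 2 and <= len(x)")
    if threshold == n:
        return x.copy(), y.copy()
    if threshold == 2:
        return x[[0, -1]], y[[0, -1]]

    # Interior points 1..n-2 are split into threshold-2 buckets;
    # bucket i covers the index range [edges[i], edges[i + 1]).
    edges = np.linspace(1, n - 1, threshold - 1).astype(np.int64)
    selected = np.empty(threshold, dtype=np.int64)
    selected[0], selected[-1] = 0, n - 1  # first and last point are always kept

    a = 0  # index of the previously selected point
    for i in range(threshold - 2):
        lo, hi = edges[i], edges[i + 1]
        if i + 2 < edges.shape[0]:
            # Third triangle vertex: mean of the next bucket.
            cx = x[hi:edges[i + 2]].mean()
            cy = y[hi:edges[i + 2]].mean()
        else:
            # No next bucket left: use the final data point.
            cx, cy = x[-1], y[-1]
        # Twice the triangle area for every candidate in the bucket at once
        # (the constant factor 0.5 does not change the argmax).
        area = np.abs((x[a] - cx) * (y[lo:hi] - y[a])
                      - (x[a] - x[lo:hi]) * (cy - y[a]))
        a = lo + int(np.argmax(area))
        selected[i + 1] = a
    return x[selected], y[selected]


# A flat signal with one spike at x = 4; the spike survives the downsampling.
sx, sy = lttb_numpy(np.arange(10), [0, 0, 0, 0, 10, 0, 0, 0, 0, 0], 3)
```

The loop is needed here because each bucket's choice depends on the point picked in the previous bucket; removing it entirely requires relaxing or restructuring that dependency.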