Why choose to not use numpy?

jonasvdd commented 2 years ago

Hi, really cool repo 🚀

I am 100% sure that this code can be significantly sped up by using numpy vectorisation.

I was able to write a pure numpy-python implementation which could downsample 50,000,000->2,000 data points in ~300ms (on a consumer pc).
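For illustration, here is a minimal sketch of what a numpy-based LTTB could look like. It is not the implementation timed above and makes no claim to match its speed. The function name `lttb_numpy`, the `n_out` parameter, and the choice to return indices rather than points are all assumptions made for this example. Only the per-bucket triangle-area computation is vectorised; the loop over buckets stays in Python.

```python
import numpy as np


def lttb_numpy(x, y, n_out):
    """Hypothetical LTTB sketch: return the indices of the n_out selected points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D and of equal length")
    n = x.shape[0]
    if not 2 <= n_out <= n:
        raise ValueError("n_out must be >= 2 and <= len(x)")
    if n_out == n:
        return np.arange(n)
    if n_out == 2:
        return np.array([0, n - 1])

    # Bucket edges over the interior points; the first and last points are always kept.
    edges = np.linspace(1, n - 1, n_out - 1).astype(np.int64)

    selected = np.empty(n_out, dtype=np.int64)
    selected[0], selected[-1] = 0, n - 1
    a = 0  # index of the most recently selected point
    for i in range(n_out - 2):
        lo, hi = edges[i], edges[i + 1]
        # Third triangle vertex: mean of the next bucket, or the final point
        # when the current bucket is the last interior one.
        if i + 2 < n_out - 1:
            nxt = slice(edges[i + 1], edges[i + 2])
        else:
            nxt = slice(n - 1, n)
        cx, cy = x[nxt].mean(), y[nxt].mean()
        # Twice the triangle area for every candidate in the bucket, computed at once.
        areas = np.abs(
            (x[a] - cx) * (y[lo:hi] - y[a]) - (x[a] - x[lo:hi]) * (cy - y[a])
        )
        a = int(lo + np.argmax(areas))
        selected[i + 1] = a
    return selected
```

With this sketch, `idx = lttb_numpy(x, y, 2000)` followed by `x[idx], y[idx]` would give the downsampled series.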

If you're interested, I can quickly make a PR that puts the numpy version of the algorithm next to yours.

Cheers, Jonas

sfc-gh-puneet-lakhanpal commented 1 year ago

Hi @jonasvdd, are you able to submit the PR for the numpy vectorized version?

Thank you.

jonasvdd commented 1 year ago

Sure!

jonasvdd commented 1 year ago

Just a few questions:


```
def largest_triangle_three_buckets(data, threshold):
    """
    Return a downsampled version of data.
    Parameters
    ----------
    data: list of lists/tuples
        data must be formated this way: [[x,y], [x,y], [x,y], ...]
                                    or: [(x,y), (x,y), (x,y), ...]
    threshold: int
        threshold must be >= 2 and <= to the len of data
    Returns
    -------
    data, but downsampled using threshold
```

Why did you opt for a list of tuples as the data type? It is not very performant when operating on large arrays. If you are considering numpy support, I would instead accept array-like arguments, splitting the `data` argument into `x` and `y`, e.g.:
```
def largest_triangle_three_buckets(x, y, threshold):
    """
    ....

    Parameters
    ----------
    x: list | np.ndarray
        ....
    """
```
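If the signature were split this way, existing callers that pass a list of `[x, y]` pairs could still be supported with a thin conversion layer. The sketch below is only an assumption about how that might be done; `split_xy` and `join_xy` are hypothetical helper names, not part of the current package.

```python
import numpy as np


def split_xy(data):
    """Hypothetical helper: turn [[x, y], ...] or [(x, y), ...] into two 1-D arrays."""
    arr = np.asarray(data, dtype=float)
    if arr.ndim != 2 or arr.shape[1] != 2:
        raise ValueError("data must have shape (n, 2)")
    return arr[:, 0], arr[:, 1]


def join_xy(x, y):
    """Hypothetical inverse: rebuild the list-of-tuples format from x and y."""
    return list(zip(np.asarray(x).tolist(), np.asarray(y).tolist()))
```

The old entry point could then call `split_xy(data)`, run the array-based routine, and pass the result through `join_xy` so that its return type stays unchanged.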



How should I add the numpy dependency? (via `requirements.txt` / `pyproject.toml`)

devoxi / lttb-py
