devoxi / lttb-py

Largest-Triangle-Three-Buckets (LTTB) downsampling algorithm in Python
MIT License

Why choose to not use numpy? #4

Open jonasvdd opened 2 years ago

jonasvdd commented 2 years ago

Hi, really cool repo 🚀

I am 100% sure that this code can be significantly sped up by using numpy vectorisation.

I was able to write a pure numpy/Python implementation that downsamples 50,000,000 -> 2,000 data points in ~300 ms (on a consumer PC).
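For illustration only, here is a minimal sketch of how LTTB can lean on numpy. It is not the implementation timed above, and the name `lttb_numpy` is hypothetical. It assumes `data` is a sequence of `[x, y]` pairs and `threshold` is the number of output points. The loop still runs once per bucket, but the triangle areas inside each bucket are computed with array operations instead of a Python inner loop.

```python
import numpy as np


def lttb_numpy(data, threshold):
    """Hypothetical numpy-based LTTB sketch; returns an array of shape (threshold, 2)."""
    arr = np.asarray(data, dtype=float)
    if arr.ndim != 2 or arr.shape[1] != 2:
        raise ValueError("data must be a sequence of [x, y] pairs")
    n = arr.shape[0]
    if not 2 <= threshold <= n:
        raise ValueError("threshold must be >= 2 and <= len(data)")

    x, y = arr[:, 0], arr[:, 1]

    # Split the interior points 1 .. n-2 into (threshold - 2) buckets.
    # edges[i]:edges[i + 1] is the half-open index range of bucket i.
    edges = np.linspace(1, n - 1, threshold - 1).astype(np.int64)

    idx = np.empty(threshold, dtype=np.int64)
    idx[0], idx[-1] = 0, n - 1  # first and last points are always kept

    a = 0  # index of the most recently selected point
    for i in range(threshold - 2):
        lo, hi = int(edges[i]), int(edges[i + 1])

        # Average of the next bucket; for the final bucket this is the last point.
        if i + 2 < threshold - 1:
            nlo, nhi = int(edges[i + 1]), int(edges[i + 2])
        else:
            nlo, nhi = n - 1, n
        cx, cy = x[nlo:nhi].mean(), y[nlo:nhi].mean()

        # Twice the triangle area for every candidate in the bucket at once.
        area = np.abs(
            (x[a] - cx) * (y[lo:hi] - y[a]) - (x[a] - x[lo:hi]) * (cy - y[a])
        )
        a = lo + int(np.argmax(area))
        idx[i + 1] = a

    return arr[idx]
```

Calling `lttb_numpy(data, 2000)` on a large `(n, 2)` array returns a `(2000, 2)` array that always contains the first and last input points.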

If you're interested, I can quickly make a PR where I put the numpy-python version of that algorithm next to yours.

Cheers, Jonas

sfc-gh-puneet-lakhanpal commented 1 year ago

Hi @jonasvdd, are you able to submit the PR for the numpy vectorized version?

Thank you.

jonasvdd commented 1 year ago

Sure!

jonasvdd commented 1 year ago

Just a few questions:


```python
def largest_triangle_three_buckets(data, threshold):
    """
    Return a downsampled version of data.
    Parameters
    ----------
    data: list of lists/tuples
        data must be formatted this way: [[x,y], [x,y], [x,y], ...]
                                     or: [(x,y), (x,y), (x,y), ...]
    threshold: int
        threshold must be >= 2 and <= the length of data
    Returns
    -------
    data, but downsampled using threshold
    """
```
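One way a numpy version could keep accepting the list-of-pairs format described in this docstring is to convert at the boundaries. The sketch below is only an assumption about how that might look; `to_xy_array` and `to_pairs` are hypothetical helper names, not part of the repo.

```python
import numpy as np


def to_xy_array(data):
    """Convert [[x, y], ...] or [(x, y), ...] into a float array of shape (n, 2)."""
    arr = np.asarray(data, dtype=float)
    if arr.ndim != 2 or arr.shape[1] != 2:
        raise ValueError("data must be a sequence of [x, y] pairs")
    return arr


def to_pairs(arr):
    """Convert an (n, 2) array back into a list of (x, y) tuples of Python floats."""
    return [tuple(row) for row in np.asarray(arr).tolist()]
```

With helpers like these, callers that pass plain lists would get plain lists back, while the array form stays internal.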


How should I add the numpy dependency? (via `requirements.txt` / `pyproject.toml`)