Rewrite benchmarks with `perfplot` charts

ashvardanian / SimSIMD

Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for both AVX2, AVX-512, NEON, SVE, & SVE2 📐

https://ashvardanian.com/posts/simsimd-faster-scipy/

Apache License 2.0

Rewrite benchmarks with `perfplot` charts #193

Closed ashvardanian closed 1 month ago

ashvardanian commented 1 month ago

Describe what you are looking for

The current benchmarking suite is quite flexible and accurate, but its output could be improved by adding dynamic `perfplot` charts.

Can you contribute to the implementation?

[X] I can contribute

Is your feature request specific to a certain interface?

Python bindings

jimthompson5802 commented 1 month ago

@ashvardanian I started working on this. I'd like to confirm the scope of this work is to modify bench.py only.

It appears benchmark.py is affected by issue #194. As a temporary workaround, I've changed code like this:

            (
                "scipy.cosine",
                lambda A, B: spd.cdist(A, B, "cosine"),
                lambda A, B: simd.cdist(A, B, "cosine"),
                [np.float32, np.float16, np.int8],
            ),

to pass `metric` as a keyword argument in the `simd.cdist` call:

            (
                "scipy.cosine",
                lambda A, B: spd.cdist(A, B, "cosine"),
                lambda A, B: simd.cdist(A, B, metric="cosine"),
                [np.float32, np.float16, np.int8],
            ),

jimthompson5802 commented 1 month ago

@ashvardanian a few more clarifying questions.

1. Is the intent of the perfplot integration to generate a plot for each combination of datatype and method?
2. For each plot, what should be the x-axis? In the perfplot sample, the x-axis is the size of the vector. From the perspective of simsimd, there appear to be two possible variables that could serve as the x-axis:
```
parser.add_argument("--n", type=int, default=1000, help="Number of vectors (default: 1000)")
parser.add_argument("--ndim", type=int, default=1536, help="Number of dimensions (default: 1536)")
```
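
One way to make the x-axis choice explicit is sketched below. It is only an illustration: the `--sweep` flag and the `sweep_values` helper are hypothetical and not part of the existing script, while `--n` and `--ndim` are copied from the snippet above. The chosen variable's value is treated as the upper bound of a range that doubles at every step.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Benchmark sweep sketch")
    # Flags copied from the existing benchmark script
    parser.add_argument("--n", type=int, default=1000, help="Number of vectors (default: 1000)")
    parser.add_argument("--ndim", type=int, default=1536, help="Number of dimensions (default: 1536)")
    # Hypothetical flag: selects which variable becomes the x-axis
    parser.add_argument("--sweep", choices=["n", "ndim"], default="ndim",
                        help="Variable to sweep along the x-axis (default: ndim)")
    return parser.parse_args(argv)


def sweep_values(args, steps=5):
    """Hypothetical helper: x-axis values obtained by halving the upper bound."""
    upper = args.n if args.sweep == "n" else args.ndim
    values = [max(1, upper >> shift) for shift in range(steps - 1, -1, -1)]
    return sorted(set(values))  # drop duplicates that appear for tiny upper bounds


args = parse_args([])  # defaults: sweep over ndim up to 1536
print(args.sweep, sweep_values(args))
```

With a flag like this, the variable that is not swept stays fixed at its command-line value, so every chart remains two-dimensional.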

ashvardanian commented 1 month ago

Hi @jimthompson5802! Frankly, I am not sure what the cleanest way to implement this is. The current benchmark seems quite convoluted and poorly documented, but it accomplishes many things and covers a lot of functionality.

Maybe we should split it into multiple parts. You probably have no less experience with charts and visualizations than I do, so I trust your judgement on various approaches and we can iterate together once a draft is ready 🤗

As for `--ndim` vs `--n`, having two variables is a clear path to 3D charts... Most of the time those are unreadable, but maybe this can be an exception. Alternatively, we could plot only the dependence on `--ndim`, which is much more important than the dependence on `--n`.
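
A rough sketch of the `--ndim`-only option: the number of vectors stays fixed and each kernel is timed across a range of dimensions, giving one series per kernel for a 2D chart. Everything here is an assumption made for illustration. The two pure-Python kernels are stand-ins for the SciPy and SimSIMD calls, and `sweep_ndim` is a hypothetical helper that uses only the standard library.

```python
import math
import random
import timeit


def cosine_loop(a, b):
    """Stand-in baseline: cosine distance with an explicit loop."""
    dot = norm_a = norm_b = 0.0
    for x, y in zip(a, b):
        dot += x * y
        norm_a += x * x
        norm_b += y * y
    return 1.0 - dot / math.sqrt(norm_a * norm_b)


def cosine_builtin(a, b):
    """Stand-in alternative: the same metric using sum() over generators."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


def sweep_ndim(kernels, ndim_range, n=4, repeat=3):
    """Return {kernel name: [best seconds for each ndim]} with n held fixed."""
    rng = random.Random(0)  # fixed seed so every run sees the same inputs
    results = {name: [] for name in kernels}
    for ndim in ndim_range:
        vectors = [[rng.random() for _ in range(ndim)] for _ in range(n)]
        for name, kernel in kernels.items():
            def run(kernel=kernel):
                # All-pairs distances, similar in spirit to cdist(A, A)
                return [kernel(a, b) for a in vectors for b in vectors]
            results[name].append(min(timeit.repeat(run, number=1, repeat=repeat)))
    return results


kernels = {"loop": cosine_loop, "builtin": cosine_builtin}
ndim_range = [16, 64, 256]
timings = sweep_ndim(kernels, ndim_range)
for name, series in timings.items():
    print(name, [f"{t * 1e6:.0f} us" for t in series])
```

In `perfplot` terms, `ndim_range` would correspond to the `n_range` argument and the vector construction to the `setup` callback, with one entry in `kernels` per implementation being compared.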

Thanks for the help!