Open nucccc opened 1 year ago
Very nice! It's going to be until the weekend until I can comment in detail.
I see this is a very minimal cython port. That's great!
I'm curious what happens if you drop
c = distance.cdist(exp_data, num_data, metric='minkowski', p=p)
and use cython to create this matrix? It might be slightly faster to calculate the distance when we need it, rather than creating the large matrix?
Also, we may want to show performance improvements with respect to both the number of data, and the number of dimensions in the data.
After your suggestion, I even modified the cython code for a hypothetized improvement by eliminating not the distances matrix
c = distance.cdist(exp_data, num_data, metric='minkowski', p=p)
but rather the ca
matrix of the coefficients. That didn't look to me as a great speed improvement, but may remove from ram an additional matrix. All calculations are done inplace on the original distance matrix c
, thus modifying it. That one can be seem as a procedural change, and I still to reflect on that, but may lead to a decrease in RAM occupation. I hope it to be possibly useful.
I forgot to mention that as a case I'm actually working just on the frechet distance, but of course the cythonization can be then extended on all the other functions.
Thank you for the attention and the work, Nuc
Just a bump. I'm still very interested in this comparison, I've just been swamped lately!
Nice bump adding dtw! I need to test this out asap.
Ah yes, i just added a dtw comparison. Then i think the benchmark could be better organized, I would put some more work in it.
I'm swamped too and thank you for the attention!
Also if you ever need a quick voice discussion just let me know and we can keep in touch via google meet or something at a point, I'm based in Italy and right now I'm flexible with my schedule.
I seem to be getting about a 40% improvement gain, that is not sensitive to the total amount of data. This is good! I think I should consider porting the entire library to cython. It would be nice to have both a cython and python version side by side, since the python version is slightly easier to read. Not sure what is the best way to have both...
1.0000510139013212
1.0000510139013212
True
[0.981130201, 1.0351228799999999, 1.0592268329999999, 1.0405540450000004, 1.066770408]
[0.594745455, 0.5694208449999998, 0.5682200880000003, 0.5850636710000003, 0.5942479000000009]
average execution time for non cythonized version: 1.036561
average execution time for cythonized version: 0.582340
improvement: 43 %
100.00171783203773
100.00171783203773
True
[0.9917487250000008, 1.0106956060000005, 0.984660250000001, 1.0517639240000012, 0.9957187320000003]
[0.6305159309999997, 0.6164782420000012, 0.6147694729999991, 0.6011433679999989, 0.6316231439999989]
average execution time for non cythonized version: 1.006917
average execution time for cythonized version: 0.618906
improvement: 38 %
Hi, in this period I could be a little bit busy, but I could find some time. Recently I red that Cython 3.0 was released, which uses heavily type hinting, and one could resort to type hints such as cython.int or cython.long to actually have types, I would give a look if one could could write pythonic code and then let Cython 3.0 transpile it with that performance gain.
Good evening, I provide in here a folder with an example of cython code and a benchmark, so that if you want to think about it you can find some material from which to start.
Inside the
simylaritymeasures
folder, there will be acy_simylaritymeasures
, in which i left only the cython code, scripts and quick documentation inreadme.md
to build it and run it.Hoping it can turn out to be of any utility, Nuc