SciNim / scinim

The core types and functions of the SciNim ecosystem
MIT License
37 stars 3 forks source link

Avoid LAPACK dependency for Savitzky Golay filter #11

Open Vindaar opened 2 years ago

Vindaar commented 2 years ago

In ggplotnim we've seen multiple times already that users have issues with the added LAPACK dependency introduced from geom_smooth (the fact that it shouldn't end up as a dep. in the first place if the user doesn't use geom_smooth is a different issue of course).

Ideally, we would use an alternative implementation to solve the least squares problem here: https://github.com/SciNim/scinim/blob/main/scinim/signals/filters.nim#L29

(If performance allows, then fully replace it, otherwise fall back if LAPACK is not specified)

Investigate if we can use @c-blake's fitl: https://github.com/c-blake/fitl for this purpose.

c-blake commented 2 years ago

Happy to help use it, if I can. The command line driver is pretty fancy. To just get estimated coefs you should be able to just set up the SVD and do the b[] loop - something like:

discard svdx(u, s, v, n, m)     # SVD LHS of X.b=y design eq
for k in 0 ..< m:               # Calc best fit coefs `b`
  b[k] = sum0(j,m, sum0(i,n, v[k + m*j]*s[j]*u[i + n*j]*y[i]))

(EDIT: update a per https://github.com/c-blake/fitl/commit/268d3b4b9543f215a17f9f8c0379e5ebb691cf12 and https://github.com/c-blake/fitl/commit/8f03e5cee25954050220de723a7536bb7aa319d7)

I was leaving that not some separate API call a) partially due to not having any users and b) to remind myself to maybe do a double-sum/triple-sum/.. template/macro - regular one was finicky enough for now. :-) This kind of sum notation is only slightly more explicit than the Einstein summation convention, but that is probably the right balance for many actual programs (maybe not all, though).

More debatable in terms of clarity balance might be index calculations vs [k,j]. Even there, the explicitness makes it easier (for me, anyway) to reason about cache performance which works much better (10X?) over dense back-to-back striding vs. large scale striding (in e.g. the dot product hot loop of svdx). BW use of cache line loads alone might be 64B/4B=16X for float32, though that may only matter for out of L1 problems which may not matter for your Savitzky-Golay. Personally, I find it not so mind-melting if you use consistent letter conventions for i 0..<n, j 0..<m|p, etc., but it's definitely more thought than just tossing on a cap-Sigma_{i,k} equivalent.

This code has a legacy in C dating back to circa y2k. If you think LAPACK/BLAS integration is a pain now...oh boy. Lol. Well, maybe diversity breeds complexity and it's even worse today. If "integration pain" were easy to quantify then a lot of things would be different. ;-)

Vindaar commented 2 years ago

Thanks for your input!

I had a look a few days ago, but couldn't fully get it to work. I tried on a simple example with a few points and got the correct result. But once I tried to use it for one of the examples used in the ggplotnim repo, I only got a bunch of NaNs. Didn't have time to fully investigate that yet. Essentially I need to compute the SVD of a big Vandermonde matrix (in the example O(6 x ~2400).

I'll get back to you once I've done some more testing.