TEOS-10 / GSW-C

C implementation of the Thermodynamic Equation Of Seawater - 2010 (TEOS-10)

Vectorized version of library #49

Closed by egecetin 2 years ago

egecetin commented 2 years ago

GSW-C currently accepts only scalar inputs. Array inputs were discussed in #29. I could work on vectorization using an external library (OpenBLAS). Vectorization is mainly about performance: OpenBLAS would become a dependency, but the library would be better optimized. I think I can add it as an optional dependency, so anyone who wants the vectorized version would compile with OpenBLAS.

So compiling GSW-C could create two different shared objects and two different API libraries, with functions such as:

void gsw_alpha(double *sa, double *ct, double *p, double *output, const uint64_t length)
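For illustration, here is a minimal sketch of what a loop-based wrapper of this kind might look like; the gsw_alpha_vec name and its vectorized signature are hypothetical and not part of GSW-C, while the scalar gsw_alpha(sa, ct, p) it calls is the existing library function.

    /* Sketch only: a hypothetical vectorized wrapper around the existing
     * scalar gsw_alpha() from GSW-C. Neither the name gsw_alpha_vec nor
     * this signature is part of the current library. */
    #include <stdint.h>
    #include "gswteos-10.h"

    void gsw_alpha_vec(const double *sa, const double *ct, const double *p,
                       double *output, const uint64_t length)
    {
        for (uint64_t i = 0; i < length; ++i)
            output[i] = gsw_alpha(sa[i], ct[i], p[i]);
    }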

It will take quite some time, so would you consider adding such a feature and an external (optional) dependency to the library? Additionally, I'm not familiar with the other implementations of GSW (Fortran, MATLAB, etc.): would this cause any problem or conflict with them?

richardsc commented 2 years ago

I don't have the C chops to really weigh in here, but to make sure this gets in front of the right eyes I'll tag @dankelley @ocefpaf @efiring @Alexander-Barth, who are the primary developers of the R/Python/Julia packages that wrap this repo.

dankelley commented 2 years ago

Function calls are cheap in C, so I don't see much advantage in the proposed vectorization, given that the wrappers for R (and, I think, both Julia and Python) already vectorize.

The cheapness of function calls explains why the math library provides, for example, sin() in unvectorized form. I think the same carries over to the gsw-c functions.

Another reason I am against this proposal is that changing code can introduce bugs. And even if that were not the case, code changes can make users wonder whether there are problems. If my model acts up, I'd rather be able to assume the problem is in my code, not in the code of a library I'm using.

Also, changing the function arguments of, e.g., gsw_rho() would necessitate rewriting the existing wrappers, and also rewriting any user code written in C. That's unless the plan is to create all new functions with non-conflicting names. (I am not sure of the exact plan, so take this comment with a grain of salt.)

These are some of the reasons why my vote would be not to do this vectorization. But I am not the author of gsw-c, and so my opinion ought to be down-weighted.

ocefpaf commented 2 years ago

I agree with @dankelley. I don't see a lot of advantage to that in the C lib, especially because (I may be wrong) folks are not really using the C lib directly but rather the Python, Julia, and R wrappers. If one requires that kind of speed specialization, it might be worth focusing on a "pure" Julia version and/or bringing the Fortran one up to speed with the C lib.

Rambling a bit: we also have Haskell, C++, and Pascal implementations. Again, I may be wrong, but I don't see the point unless we (those who are wrapping the core GSW for the languages scientists are using) adopt a new "base language" for GSW. Maybe the C++ one, since that would be relatively easy to swap in and seems to be "officially" developed.

PS: Now I am really rambling and you should stop reading here, but I'd love to see a Rust core that R, Python, and Julia could all use.

efiring commented 2 years ago

I think the vectorization inside the Python wrapper, based on the numpy ufunc machinery, is reasonably efficient, so it's not clear to me that much can be gained by vectorizing the C functions directly. Would you be replacing each "+", "*", etc. with a BLAS function call? That would be a huge amount of work, and might even slow things down. The easiest vectorization would be to put an explicit loop inside each function, but even that would be considerable work for little gain, I expect. And I don't find the prospect of rewriting the Python wrappers, and maintaining the additional code, appealing.

Fortran 90 has vectorization built in; unfortunately, it is much harder to wrap in other languages, and simply impractical to use for multi-platform libraries.

@egecetin, I am wondering what your motivation is. Have you run into applications where speeding up the C library would eliminate a major bottleneck?

egecetin commented 2 years ago

@efiring BLAS function calls have templates like a*x + y or alpha*A + beta*Y, so most of the operations would basically be replaced.

Vectorization libraries like BLAS are usually faster than plain loops because the operations can be mapped to wider SIMD instructions (SSE, AVX), but if this has no benefit for the other implementations, I agree with you: too much work for the gain.
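As a rough illustration (not from the thread), a single a*x + y step over arrays would map onto one CBLAS call; the scale_and_add helper below is hypothetical and only shows the pattern.

    /* Sketch: mapping the BLAS "a*x + y" template onto the CBLAS interface
     * provided by OpenBLAS. cblas_daxpy computes y := alpha*x + y over n
     * elements. The wrapper function is hypothetical, not part of GSW-C. */
    #include <cblas.h>

    void scale_and_add(const int n, const double alpha, const double *x, double *y)
    {
        /* y[i] = alpha * x[i] + y[i], with unit stride for both arrays */
        cblas_daxpy(n, alpha, x, 1, y, 1);
    }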

castelao commented 2 years ago

You might enjoy knowing that there is a Rust implementation of GSW in progress. The initial motivation was to use it directly on microcontrollers to support autonomous decisions and ML onboard Argo and Spray underwater gliders. On top of unit tests, it is also validated against Matlab's reference dataset (v3.06.12) to guarantee consistency. Any library or application based on GSW-C can switch over to GSW-rs without much effort through Foreign Function Interface (FFI) bindings; we actually test that using GSW-Python.

We welcome comments and questions over there.