flame / blis

BLAS-like Library Instantiation Software Framework

Feature request for BLIS error handling... #479

Open jdiamondGitHub opened 3 years ago

jdiamondGitHub commented 3 years ago

BLIS currently handles user errors by printing an error message to the console and then aborting. This is reasonable when a programmer is calling BLIS directly. However, we use BLIS inside our own libraries, which are then part of applications, some of which are called from other languages and frameworks as behind-the-scenes acceleration. So when a BLIS error happens, (1) there is no console on which to see the error, and (2) we can't hook the abort, so not only does BLIS go down, but the entire application framework crashes, since it is all part of the same user process. Since this is a mission-critical business application, we can't allow the server to potentially crash.

Our feature request is for an alternate error mode in BLIS that returns an error to the calling code, so that the higher-level code using BLIS can deal with it and the system can continue running. A poor man's approach would be to provide an error callback function that we could use to implement our own global error checking system. This would put the most work on us, but would have the advantage that everyone could define whatever error reporting works best for them.
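To make the callback idea concrete, here is a minimal sketch of how such a hook could look. Every name in it (`demo_set_error_handler`, `demo_scale`, `app_handler`, and so on) is hypothetical and chosen for illustration only; none of it is part of the BLIS API. It assumes the library calls a user-installable handler in place of its built-in print-and-abort, and returns early if the handler itself returns.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical names throughout: none of these are part of the BLIS API. */
typedef void (*demo_err_handler_t)(int err_code, const char *msg);

/* Default handler mirrors the behavior described above: print, then abort. */
static void demo_default_handler(int err_code, const char *msg)
{
    fprintf(stderr, "demo error %d: %s\n", err_code, msg);
    abort();
}

static demo_err_handler_t demo_handler = demo_default_handler;

/* Install a user handler; passing NULL restores the default. */
void demo_set_error_handler(demo_err_handler_t h)
{
    demo_handler = h ? h : demo_default_handler;
}

/* Stand-in for a library routine that validates its arguments. */
void demo_scale(double alpha, double *x, int n)
{
    if (x == NULL || n < 0) {
        demo_handler(-1, "invalid vector argument");
        return; /* reached only if the handler chose not to abort */
    }
    for (int i = 0; i < n; ++i)
        x[i] *= alpha;
}

/* Application side: record the error instead of crashing the process. */
static int app_last_error = 0;

static void app_handler(int err_code, const char *msg)
{
    (void)msg; /* a real application might log this somewhere */
    app_last_error = err_code;
}
```

An application would call `demo_set_error_handler(app_handler)` once at startup and then inspect `app_last_error` after each call. Note that the global handler and the global error variable in this sketch are not thread-safe; a real design would need to decide how handlers interact with threads.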

An even better solution (from our point of view) would be to provide some kind of standard, non-blocking way for an application to check after a BLIS call whether an error occurred and, if so, what the error was. The cleanest way would be to return an error code from the BLIS call, but that would change the API. A more awkward alternative would be a second function that you call afterwards to determine whether the BLIS call succeeded and, if not, what went wrong.
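The "second function" variant could be sketched roughly as follows, in the style of `errno`. All identifiers here (`demo_err_t`, `demo_axpy`, `demo_get_last_error`) are hypothetical and not taken from BLIS. The sketch also assumes the error state is kept per thread using C11 `_Thread_local`, which is one possible design choice rather than anything proposed in this thread.

```c
#include <stddef.h>

/* Hypothetical error codes; not BLIS's actual error enumeration. */
typedef enum {
    DEMO_SUCCESS          =  0,
    DEMO_ERR_NULL_POINTER = -1,
    DEMO_ERR_NEGATIVE_DIM = -2
} demo_err_t;

/* Per-thread error state (assumes a C11 compiler with _Thread_local). */
static _Thread_local demo_err_t demo_last_error = DEMO_SUCCESS;

/* The "second function": reports the outcome of the most recent call. */
demo_err_t demo_get_last_error(void)
{
    return demo_last_error;
}

/* Stand-in routine computing y := alpha * x + y. The signature still
   returns void, so existing callers would not need to change. */
void demo_axpy(int n, double alpha, const double *x, double *y)
{
    demo_last_error = DEMO_SUCCESS;

    if (n < 0) {
        demo_last_error = DEMO_ERR_NEGATIVE_DIM;
        return;
    }
    if (x == NULL || y == NULL) {
        demo_last_error = DEMO_ERR_NULL_POINTER;
        return;
    }
    for (int i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}
```

A caller would invoke `demo_axpy(...)` and then test `demo_get_last_error() != DEMO_SUCCESS` before using the result. The drawback noted above still applies: nothing forces the caller to make the second call.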

Thanks for your thoughts on what kind of alternate error system would be most generally useful to the community.

fgvanzee commented 3 years ago

Thanks for your input, Jeff. I'll start thinking seriously about how best to accommodate your application's needs.

So far, I'm partial to using proper error return codes from user-level functions, even though (a) it will take more work and (b) it will break the API. The breakage doesn't scare me because it only affects the return values. But practically speaking, this may not be the best route. I'll begin assessing how much work it would take.

@devinamatthews What happens when a program that calls func() is compiled according to a prototype that suggests it returns void, but is then linked to an implementation that actually returns an integer? Is the integer return value merely ignored?

devinamatthews commented 3 years ago

@fgvanzee it's harmless. The return value will either be in a (callee-owned) register or on the callee stack, which the calling code will ignore either way. The ABI is also unaffected.

fgvanzee commented 3 years ago

I think I've come up with a feasible plan to overhaul the way errors are handled that will both preserve the status quo (as an option) and provide Jeff with his preferred solution.

Under the changes I envision, BLIS will provide two options, both of which can be changed at runtime (with the initial default for each set at configure-time):