Should errors be reported programatically instead of/as well as being logged

0x00002a commented 3 years ago

This is more of a design question that I just wanted to put out there. Currently the library generally deals with errors by logging them with a user-provided logger and moving on. The use of user-provided loggers allows suppressing or custom reporting of the errors in a customisable way. However, it does not let the user of the library take action if a problem occures, such as being unable to connect (in which case the user may want to try again, for example).

One way to report errors would be through C++'s built-in method of exceptions. I like exceptions personally but I think this is not a great place for them because A) They do not mesh well with async stuff yet (although coroutines will hopefully improve that) and B) Some things are not neccarsarily exceptional for the user code, a server might frequently timeout for example.

Personally I very much like the spdlog intergration and I don't think that should be removed. But I also think there needs to be some way the user can handle errors. The future/promise route works well imo, but it is a single-use channel, so for stuff like server::router thats no good. Another option would be a C# like event that would allow registering multiple callbacks be fired on call. Then we could attach the logging code to said event, thus allowing the replacement of code which currently logs to instead fire the event, and have the user able to add their own handlers independently.

As I said, I'm hoping for this to be somewhere we can discuss this, if there is already a policy in place I'm not aware of I apologise :p

Stuff that could benifit from this:

It is worth noting that the client-side currently has quite a bit of error feedback currently. http_request gives an future<error_code> and the callback for make_websocket_connection may get invoked on an error (but some are just logged), however the use of std::future for the reporting mechanism is somewhat problematic because it can't be waited on asynchronously afaik. I think boost::asio::awaitable could solve this and I will be investigating at some point

Tectu commented 3 years ago

I have been thinking about this for a long while - pretty much when starting out conceiving this library. I did not come up with a definitive design yet as for the various problems you mentioned.

I agree that the spdlog integration is neat. However, for the future I'd prefer to have a generic logging interface defined by malloy and then provide spdlog as a built-in sink for that interface.

I don't mind using exceptions as much as some other C++ developers do. But I do see problems with using exceptions to handle problems within this library. You outlined it pretty well. Based on that so far my favorite solution would be implementing an error-code & message interface, similar to boost::beast::error_code. This seems like the most flexible approach. The thing most important for me here is consistency. At this point I'm not even sure whether we should roll our own error reporting tooling from scratch or whether we should build something on top of std::error_code. If the latter is interesting, we'd have to figure out how to relay boost::beast::error_code as from the top of my head I seem to recall that it is not based on std::error_code.

0x00002a commented 3 years ago

std::error_code compatability was one of my pain points with beast and inherited from asio iirc. I think part of the issue is with the categories they use, which are not compatable with std::error_code (but thats mostly a guess).

I like the way beast and asio handles them, with the codes passed to the callback functions or as out params for sync versions, but it still leaves the issue of how to report on the server side. I think we also need to question if we want to report (setup/routing) errors on the server side. iirc neither flask nor django offer anything but logging for errors that occure before the handler is invoked, I assume because theres nothing that could be done about it from the users perspective. It might be worth considering adding an add_error_page(http::status, std::function<response<string_body>()>) or something in the future though if we go this route.

0x00002a commented 3 years ago

Just some more info for this discussion, after some further coroutine research. The asio docs have this to say about coroutine versions of async functions (emphasis mine):

Where an asynchronous operation's handler signature has the form:
void handler(boost::system::error_code ec, result_type result);
the resulting type of the co_await expression is result_type. In the async_read_some example above, this is size_t. If the asynchronous operation fails, the error_code is converted into a system_error exception and thrown. [..]

(docs)

Based on my understanding of coroutines, this will mean the exception is propagated upward until it hits a non-coroutine function at which point it is effectively "unwrapped" (like how future/promise pairs work with promise::set_exception). This seems rather tidy, and solves the issue of exceptions + async code. However, it does not address the issue of forcing the user to treat what may be expected behaviour as "exceptional", or handle error reporting on the server side (if we care about that).

However, if we went the error code route it would also prevent switching to coroutines as currently supported by asio without either some exception catching layer or some major breaking changes. Also compiler support is still patchy (iirc latest msvc only has the ts, and clang hasn't implemented it fully. gcc has though).

Going with this would also require a reshuffle of both connection types too, and it would lock out the option of going with error_code in the future or providing overloads for them easily. The reshuffle hopefully wouldn't be too major since most of the handler methods can still be invoked "manually" though.

Finally, yet another option could be to catch the exception thrown by asio and wrap it in our own type like either<error_code, ValueT> but that is rather messy imo.

Tectu commented 3 years ago

Personally, I'd want to wait with migrating "everyting" to co-routines anyway. I looked into this when starting to write this library and compiler support seemed less-than-ideal.

I am still thinking about this pretty much every day. I still haven't made up my mind but I do think that I'd tend towards creating a custom error reporting interface (something similar to std::error_code) and then provide the necessary glue to convert from beast::error_code.

What is your current, overall "feeling" about this situation?

0x00002a commented 3 years ago

I agree with the waiting on coroutines. Compiler support is soso and it would be a pretty major change.

I like the custom error reporting interface in theory, but I'm not clear on what form that would take. My general feeling though is that we should be using callbacks or overwise chainable things for error reporting, and try and avoid stuff that requires blocking like std::future. Also if we use error codes it needs to be really hard to ignore them, if it can't be impossible.

Tectu commented 3 years ago

I like the custom error reporting interface in theory, but I'm not clear on what form that would take.

Me neither - otherwise this would have been done long before you joined in :p

My general feeling though is that we should be using callbacks or overwise chainable things for error reporting, and try and avoid stuff that requires blocking like std::future.

I fully agree.

Also if we use error codes it needs to be really hard to ignore them, if it can't be impossible.

C++ is not the language where a library author necessarily has to prevent the user from shooting himself in the foot. I like to quote:

As we all know, the First Amendment to the C++ Standard states: "The committee shall make no rule that prevents C++ programmers from shooting themselves in the foot." Speaking less facetiously, when it comes to choosing between giving programmers more control and saving them from their own carelessness, C++ tends to err on the side of giving more control.

Anyway, I have been thinking about this some more. I think that we should design the error interface in a way that it's also extendible. For example, I have two applications (consuming malloy) which implement some wrappers around http::request and http::response to facilitate easy working with RestAPIs (this will hopefully end up in malloy at one point). As these wrappers do some extra parsing (eg. to parse the JSON bodies, handling API error codes (part of the JSON responses) etc) they also need a way of reporting errors. It would be very beneficial if the upcoming error reporting/handling interface would allow for these use cases to extend on those types.

0x00002a commented 3 years ago

I've been thinking about this some more, and I think it would be helpful to categorise the types of errors we are going to have to have to communicate, based on how they can/could be handled by user code.

1. Stuff which the user cannot actually do anything about, apart from maybe log it themselves

imo this is stuff like errors with the server side before it hits the handler. Even those might fall into (3) though. This should just be logged in a user-customisable fashion and dealt with internally by malloy.

2. General issues which may or may not be classed as errors by the user

e.g. failing to close a websocket because the other end did it already, or failing to resolve an endpoint. These are probably going to be most of the errors and should be the focus when thinking about how to report imo.

My current thoughts on this is we should use error codes and callbacks. This should cover most bases, with the exceptions (scuse the pun) being ctors and dtors. Alternatively it might be possible to roll our own asio chainable std::future which could be used instead and only pass error codes via callbacks when there is already a callback for taking a result (e.g. websocket::send would keep callback, but websocket::accept would just return a chainable future with an error code). I believe there is a standard library proposal for this but I don't know the status of it.

3. Critical/fatal errors which the user may be able to deal with/work around

These are errors which stop malloy in its tracks and/or would cause wonky behaviour if ignored. e.g. failing to open the keyfile for an ssl keychain which the user expects the server to use. If it fails, the user may be left wondering why malloy won't accept connections. imo these should be straight up impossible to ignore or at least very difficult. I agree with this:

C++ is not the language where a library author necessarily has to prevent the user from shooting himself in the foot

but I would prefer to have a system that at least warned the user that the gun was live in the first place :p (to strech the metaphore a little).

Reporting of these should be done via exceptions imo. They are the built-in mechanism for reporting hard errors and are difficult/impossible to ignore unless one wants to. One of the issues with this is using it with async code, however I think the current use cases for this type of error are entirely synchronous code (startup stuff and preconditon checking). Examples of stuff which should throw (and currently return bool): controller::init, controller::init_tls, router::add_*. The tls stuff already throws if the context has not been inited

4. Stuff that may bubble up through malloy from user code

Since we use callbacks for a lot of things, it seems likely that malloy will have to handle some exceptions that escape these. We could just say "no" to them escaping and do an std::terminate if they do, or wrap it in a response of internal server error or any number of things really. This is less of a prioritory imo since it is entirely in the users control to stop it happening.

All of this is of course just my opinion :p.

Thoughts?

Tectu commented 3 years ago

Well, I couldn't agree more. This is a very good write-up of my thoughts of the last few weeks/months. Thank you for going through the trouble of writing this out :p

One thing I would like to specifically mention tho: I am strongly against calling std::terminate anywhere from within the library - at least not without giving the user the option to catch the error before (as a default fallback that might be fine). But then again: This aligns exactly with what you stated - so I am all good on this :)

0x00002a commented 3 years ago

So can I start work on converting some of the current bool functions to exceptions? I think the error code stuff needs more thinking about, and the more I look into a chainable future the more I keep hitting coroutines :p. It should be possible to define our own CompletionToken for asio that creates a coroutine but also bundles an error code rather than throwing, but documentation on CompletionToken is sparse to none (as is standard for asio *sigh*)

Tectu / malloy