fpagliughi / sockpp

Modern C++ socket library.
BSD 3-Clause "New" or "Revised" License

EINTR and interrupted system calls #24

Closed snej closed 1 year ago

snej commented 4 years ago

A characteristic of earlier UNIX systems was that if a process caught a signal while the process was blocked in a "slow" system call, the system call was interrupted. The system call returned an error and errno was set to EINTR. ... The problem with interrupted system calls is that we now have to handle the error return explicitly.

—Stevens & Rago, Advanced Programming In The UNIX Environment, 3rd ed., sec. 10.5

This is annoying because you have to wrap some calls in a do...while loop, and confusing because the behavior has changed over time, is different on different platforms, and there are workarounds to make it less annoying that don't (to me) seem to help.

It appears that the calls used by sockpp that are affected are connect, accept, send, recv. The man pages for these all list EINTR among the possible errors. All these need to be wrapped like:

int result;
do {
    result = SYSTEMCALL();
} while (result < 0 && last_error_code() == EINTR);
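As a minimal sketch of how that loop could be factored out once, assuming a POSIX platform where the error code is plain errno: the names retry_on_eintr and read_retry below are hypothetical and are not part of sockpp.

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not sockpp API): re-issues a call for as long as it
// fails with EINTR. `fn` must return a negative value on failure and set errno.
template <typename Fn>
auto retry_on_eintr(Fn&& fn) -> decltype(fn()) {
    decltype(fn()) result;
    do {
        result = fn();
    } while (result < 0 && errno == EINTR);
    return result;
}

// Example use: a read() that is restarted if a caught signal interrupts it.
inline ssize_t read_retry(int fd, void* buf, size_t n) {
    return retry_on_eintr([&] { return ::read(fd, buf, n); });
}
```

One caveat for this pattern: POSIX specifies that a connect() interrupted by a signal keeps establishing the connection asynchronously, so blindly calling connect() again may not behave like a simple restart. The loop is a safer fit for accept, send, and recv.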

I have no idea whether any of this applies to Windows. I don't think Windows even has the notion of "signals" in the same sense as Unix. (@borrrden ?)

fpagliughi commented 4 years ago

If you did this, then a ^C wouldn't be able to get you out of the application, right? Nor a SIGTERM from an orderly shutdown. Or the Linux timeout command (which is very handy in an embedded Linux system). You would be forced to do a SIGKILL, I'm guessing.

We can investigate, but I can't help but notice the lead to that paragraph... "A characteristic of earlier UNIX systems..."

Both "earlier" and "Unix". I wonder how many people would use this lib on UNIX (as opposed to Linux) and an early Unix system at that. Maybe circa 1984? :-)

snej commented 4 years ago

I am absolutely not an expert on signals, but I'm pretty sure that a signal that terminates the process will still terminate it even if a thread is blocked in a system call. So SIGINT and SIGTERM should still stop a process.

I think the reason not to loop on EINTR is that a signal seems to be the only reliable way to interrupt a blocking I/O call. So there might be clients who want to use signals for that purpose, even though it seems pretty heavy-handed to me, kind of like stomping on the floor to get a record player un-stuck.

So maybe this could be a per-socket flag, i.e. have a stream_socket::set_interruptible(bool) method to configure this behavior, with the default being false? And there'd be an internal method that wraps the do...while loop above after checking that flag.
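A rough sketch of what such a flag might look like, purely for illustration: the class interruptible_fd and its members are hypothetical, wrap a raw file descriptor rather than sockpp's stream_socket, and assume POSIX read/write.

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical wrapper (not sockpp's stream_socket) showing a per-object
// switch between "retry on EINTR" and "report EINTR to the caller".
class interruptible_fd {
    int fd_;
    bool interruptible_ = false;  // default: swallow EINTR and retry

    // Internal helper: runs the do...while loop only when the flag is off.
    template <typename Fn>
    ssize_t check_retry(Fn&& fn) const {
        ssize_t result;
        do {
            result = fn();
        } while (result < 0 && errno == EINTR && !interruptible_);
        return result;
    }

public:
    explicit interruptible_fd(int fd) : fd_(fd) {}

    void set_interruptible(bool on) { interruptible_ = on; }
    bool interruptible() const { return interruptible_; }

    ssize_t read(void* buf, size_t n) const {
        return check_retry([&] { return ::read(fd_, buf, n); });
    }
    ssize_t write(const void* buf, size_t n) const {
        return check_retry([&] { return ::write(fd_, buf, n); });
    }
};
```

With the flag left at its default, a caught signal is invisible to the caller; after set_interruptible(true), the call returns -1 with errno set to EINTR and the caller decides what to do.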

snej commented 4 years ago

BTW, I just remembered that I've seen this interruption behavior in our project: I once hit a breakpoint on one thread while a different thread was connecting to a [slow] server, and when I resumed from the breakpoint, the connect call immediately failed with EINTR.

fpagliughi commented 4 years ago

Yeah. Maybe if you don't catch SIGINT/SIGTERM they still terminate? Can't remember; would need to run a test.
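For reference, the default action for an uncaught SIGINT or SIGTERM is to terminate the process, so a retry loop never sees them; EINTR only shows up once a handler is installed and returns. A small test along those lines might look like the sketch below. It assumes a POSIX system and a single-threaded process, uses SIGALRM with a short timer as a stand-in for any caught signal, and the function name is hypothetical.

```cpp
#include <cerrno>
#include <signal.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

// Empty handler: its only job is to make SIGALRM "caught" instead of fatal.
extern "C" void on_alarm(int) {}

// Blocks in read() on an empty pipe until a caught SIGALRM interrupts it.
// Returns the errno left by read(), 0 if read() did not fail, -1 on setup error.
int errno_after_interrupted_read() {
    int fds[2];
    if (::pipe(fds) != 0) return -1;

    struct sigaction sa {};
    struct sigaction old {};
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;  // deliberately no SA_RESTART, so read() is not restarted
    if (::sigaction(SIGALRM, &sa, &old) != 0) return -1;

    // Fire every 50 ms; repeating, in case a tick lands before read() blocks.
    itimerval tick{};
    tick.it_value.tv_usec = 50 * 1000;
    tick.it_interval.tv_usec = 50 * 1000;
    ::setitimer(ITIMER_REAL, &tick, nullptr);

    char c;
    ssize_t r = ::read(fds[0], &c, 1);  // nothing is ever written to the pipe
    int err = (r < 0) ? errno : 0;

    // Disarm the timer before restoring the old (possibly fatal) disposition.
    itimerval off{};
    ::setitimer(ITIMER_REAL, &off, nullptr);
    ::sigaction(SIGALRM, &old, nullptr);
    ::close(fds[0]);
    ::close(fds[1]);
    return err;
}
```

Setting sa_flags to SA_RESTART instead would ask the kernel to restart the read() itself, which is one of the workarounds that behaves differently across calls and platforms.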

But then the decision would be... should we handle this in the library, or should the library retain the lower-level behavior? If nothing else, if we add a flag, should the low-level behavior be the default?

You know, if we do add this, someone, at some point, will post an issue claiming that they're sending their process a SIGTERM, but it's not returning from a system call!

Definitely a deciding factor for me would be if this helped make application code more portable between different targets. I honestly have no idea how non *nix systems act.

borrrden commented 4 years ago

You are correct in your analysis that Windows does not use signals in C. I am not an expert in this area so I don't know by what mechanism they handle this.

fpagliughi commented 4 years ago

So... I was going to give the speech about how this library is intended to be mostly low-level and efficient... nearly as efficient as the C API in aggregate... and that a *nix programmer should know about EINTR and handle it properly...

And then I looked at the implementation of readn() and writen() in the library, and I totally forgot to handle EINTR!

D'Oh!
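For illustration, a full-length read/write loop that tolerates EINTR could be sketched roughly as follows. The functions read_n and write_n are hypothetical stand-ins operating on a raw POSIX descriptor; they are not the library's actual readn() and writen().

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical stand-in for a readn()-style helper: reads until `n` bytes
// have arrived, EOF is hit, or a real error occurs. Returns bytes read or -1.
ssize_t read_n(int fd, void* buf, size_t n) {
    char* p = static_cast<char*>(buf);
    size_t nread = 0;
    while (nread < n) {
        ssize_t r = ::read(fd, p + nread, n - nread);
        if (r < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal: retry
            return -1;                     // genuine error
        }
        if (r == 0) break;                 // EOF: return the short count
        nread += static_cast<size_t>(r);
    }
    return static_cast<ssize_t>(nread);
}

// Hypothetical stand-in for a writen()-style helper: writes all `n` bytes
// unless a real error occurs. Returns bytes written or -1.
ssize_t write_n(int fd, const void* buf, size_t n) {
    const char* p = static_cast<const char*>(buf);
    size_t nwritten = 0;
    while (nwritten < n) {
        ssize_t r = ::write(fd, p + nwritten, n - nwritten);
        if (r < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal: retry
            return -1;                     // genuine error
        }
        nwritten += static_cast<size_t>(r);
    }
    return static_cast<ssize_t>(nwritten);
}
```

Because these loops already track progress, retrying on EINTR costs one extra branch and never re-sends or re-reads bytes that were already transferred.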