last_error property is not thread-safe

snej commented 5 years ago

The implementation of the socket class's last_error property is not thread-safe. The error code is copied from errno into a member variable after a system call. However, the error really needs to be tracked per thread, as errno itself is.

This is significant because it's not uncommon to have one thread writing to a socket and another thread reading from it — this is a typical way to implement WebSockets, for example. What can happen in this case is:

Thread A calls write.
The underlying send system call fails, and the error is copied from errno into the socket's lastErr_.
Thread B calls read.
The underlying recv system call succeeds, so lastErr_ is set to 0.
Thread A sees that the write call returned -1, and calls get_last_error.
get_last_error returns 0, not the actual error from the write.

snej commented 5 years ago

I believe the appropriate fix is to declare lastErr_ as thread_local. But I'm not sure thread local is supported on all platforms.

fpagliughi commented 5 years ago

Yes, you're correct: socket objects are not thread-safe. But I never claimed they were! ...merely that it was safe to transfer them across threads using move semantics.

But you're also right in that it's a very common use case to share a client socket in separate read and write threads, which is perfectly legal with the underlying OS socket handle. That's the main use case I see for using this library myself in the future, so I'm invested in making it work.

First, though, we must assume that errno is thread-safe across all platforms. That's true in Linux/Posix, but I have no idea about other platforms. It would be very unfortunate if it weren't.

The way I assumed that we would handle sharing a socket handle across threads would be by making a clone of the socket for other thread: see socket::clone(). It uses the OS dup() call (or similar) on the handle but creates a new socket object for each thread. Then each one would have it's own lifetime and its own copy of lastErr_.

The immediate problems with this are:

I didn't mention it anywhere or provide an example
I didn't implement it for stream_socket which is probably what you would normally want to split.
It has some overhead and different behavior if you're used to sharing the handle.

That said, with the existing implementation of socket, you're idea is interesting: making lastErr_ thread-local would solve the problem. But would that lock us into a promise that socket objects would have to be thread safe now and forever? My original goal was that this library remain nearly as performant as using the C sockets API directly, at least in the happy path (reads and writes succeeding). So we should consider the options.

On a side note: In an ancient version of this library (pre C++-11), I used the assignment operator to create a copy of the socket object with a non-owning copy of the handle. This way, the socket would still only be closed once by the original owner. The problem was that the owner could disappear and close the handle. If the other threads didn't notice in time, another thread could open a new socket and get the same handle which the OS recycled. Then, later on, the other threads could wake up and start sending data out to the wrong connection! Yeah, I did that.

fpagliughi commented 5 years ago

Argh. I thought I would try the thread_local idea for lastErr_.

First I wrote a unit test to show that the call to socket::last_error() is not thread safe. I share a socket between two threads, force an error in one thread and see that last_error() is affected in the other thread.

Then I went to add thread_local to the lastErr_ declaration... but it appears that you can't have a class member variable be declared thread_local. It would need to be static thread_local. But that certainly isn't right, as it would be shared across all the sockets in a thread.

So I'm not sure it's worth pursuing this further.

fpagliughi commented 5 years ago

BTW, this is in the thread_safe_sock branch.

fpagliughi commented 5 years ago

Perhaps it would be worth considering getting rid of lastErr_ altogether. We could have socket::last_error() just return errno. That would change its behavior in that the value wouldn't be "sticky". Calling it twice in a row would return different results.

I don't really like that idea because in every other use case, lastErr_ is beneficial. But it's worth considering.

The real problem is that the errno concept as a whole just stinks. We could consider breaking with the concept and just having each sockpp method return the error code as a negative number in the case of failure. This is probably how the entire Unix/Posix API should have been coded 40 years ago.

This sounds like a viable solution, but it would entirely change the sockpp API.

snej commented 5 years ago

I also just realized that object instance members can't be thread_local. Sigh!

Another possibility is to create last_read_error and last_write_error properties, but that's a bit ugly.

The way I assumed that we would handle sharing a socket handle across threads would be by making a clone of the socket for other thread

Interesting ... but that does change the semantics a bit, in that both sockets need to be closed for the underlying socket to close. I'll think about it.

snej commented 5 years ago

Other possibilities:

Return a {byte-count, error} pair from the read and write methods. (This gets less awkward in C++17...)
Use C++ iostreams for reading and writing. Then the read and write are separate objects with their own error state.
Both of the above, with the iostreams as an optional wrapper around the primitive methods.

fpagliughi commented 5 years ago

Yes, the problem with dup() is that one handle doesn't detect that the other was closed. So you need a separate signaling mechanism, and there isn't a portable way to do that easily.

I would suspect that the iostream suffers from this same problem. They would probably need separate dup'ed handles otherwise there would be the problem of a double close of the underlying handle.

fpagliughi commented 5 years ago

You can have two socket objects use the same handle if you do it manually. Just create a second socket using the handle() from the first. If you carefully manage the lifetime of the second one, make sure it goes out of scope before the first, and you have one of them reset() before destruction, this would work.

It's a little ugly, but as a pattern for this particular use case, maybe it's workable.

I pushed an example to the _threrad_safesock branch as examples/tcp/tcpechomt.cpp. Have a look.

fpagliughi commented 5 years ago

In adding this sample, I realized I had forgotten to implement socket::shutdown(how). Don't know how I missed that one!

fpagliughi commented 5 years ago

Actually...

I tried the multi-threaded client example with clone'd (dup'd) sockets, and it behaves exactly the same, without the worry of the shared handle! Calling shutdown() on the write handle knocks the duplicated handle off the blocking read.

At least it works on Linux. I can try it on Windows later.

So... I've gone full circle back to my original recommendation: Duplicate the socket and move the duplicate to the other thread.

Have a look at the updated examples/tcp/tcpechomt.cpp

snej commented 5 years ago

Duplicate the socket and move the duplicate to the other thread.

Makes sense, thanks.

I do have some problems adopting this approach, but they relate to a feature I'm adding, not the current version of the library, so I'll bring them up elsewhere.

fpagliughi / sockpp

last_error property is not thread-safe #14