Ping/Pong v2 - Githubissues

Theldus / wsServer

wsServer - a tiny WebSocket server library written in C

https://theldus.github.io/wsServer

GNU General Public License v3.0

427 stars 86 forks source link

Ping/Pong v2 #53

Closed Theldus closed 2 years ago

Theldus commented 2 years ago

Description

In PR #51 an early version of Ping/Pong was introduced by @gloveboxes.

As discussed earlier in issue #50, some modifications would be made for a final version of wsServer PING support. This PR addresses those changes.

This 'v2' changes:

Removal of using snprintf/strtol to send the PING id as a string: the PING id is now sent as a number, always encoded as a big-endian 32-bit signed integer. This brings an interesting advantage:
- RFC 6455 specifies that an unsolicited PONG frame can be sent, as a unidirectional heartbeat: sending a non-fixed-length string makes it difficult to know whether the received PONG was in response to a PING or not. On the other hand, sending a number with known encoding and size makes it easier to know if the PONG received is in response to a PING or not, either by the size (always 4 bytes), or by the content itself.
Use of locks for reading/writing the PING and PONG id, since there may be race conditions between receiving a PONG and sending a PING.
Sending pings to a client or broadcast: the 'client' parameter defines this. In addition, PING and PONG id is now managed by the server itself, which maintains a private copy for each connected client. This also allows the broadcast to send different IDs accordingly to the client counter itself.

Also, this PR adds a man page (doc/man/man3/ws_ping.3) and an example file, located at examples/ping.

@gloveboxes This PR basically brings together everything we talked about earlier and that's what I can imagine for the final version. Since the initial idea was yours, I would like to hear from you: what do you think? is there anything that could be improved/removed/etc or it seems ok to you?

Hope this eases your problem with clients disconnecting while sleeping =). Again, thank you very much for your contribution.

gloveboxes commented 2 years ago

hey @Theldus there are a couple of deadlock issues

If the server is sending messages when the client silently disconnects the server.

When sending, I can see that send doesn't return at line (and I can see that locks have been taken)

https://github.com/Theldus/wsServer/blob/d81ea43c7c650a0e052e1929b8c72d27d42082d8/src/ws.c#L262

I don't 100% understand your locking strategy, but I can see that a number of locks are taken and ping hangs trying to take a mutex at line https://github.com/Theldus/wsServer/blob/d81ea43c7c650a0e052e1929b8c72d27d42082d8/src/ws.c#L661

reading the header https://github.com/Theldus/wsServer/blob/d81ea43c7c650a0e052e1929b8c72d27d42082d8/src/ws.c#L243

I see it's using blocking io - so what i think is happening is ret = send(client->client_sock, p, len, flags); is blocking which is causing deadlocks.

Cheers Dave

Theldus commented 2 years ago

Hey @gloveboxes, thanks for taking a look and testing this PR.

How can I reproduce this? in my tests I had no problems with locks...

I don't 100% understand your locking strategy, but I can see that a number of locks are taken and ping hangs trying to take a mutex at line

My lock 'strategy' boils down to:

mutex - Global lock, used whenever iterating over the client list.
mtx_state - Per-client lock, used to ensure that the client state is atomically read/set throughout the code.
mtx_send - Per-client lock, used to ensure that messages sent to a client will always be serialized, fixes #43.
mtx_ping - Per-client lock, used to prevent simultaneous access to current_ping_id and last_pong_id.

So you're saying the send() function doesn't return? that's interesting...

Ok, reading the (Linux) man-pages:

When the message does not fit into the send buffer of the socket, send() normally blocks, unless the socket has been placed in nonblocking I/O mode.

I could do non-blocking sends, but since I want the client to receive the content, I would have to be in some loop trying to send anyway...

What should I do?... I believe send() will not be blocked forever (or is there such possibility?), so at some point all 'send_ping_close()' will be executed and the lock released again... and then new pings can be sent.

In a way this even makes some sense: new pings are only sent when the previous ones manage to be sent... or does that not make much sense?

gloveboxes commented 2 years ago

What should I do?... I believe send() will not be blocked forever (or is there such possibility?), so at some point all 'send_ping_close()' will be executed and the lock released again... and then new pings can be sent.

From some limited searching I did it appears the socket will timeout after 20 minutes. (https://ubuntuforums.org/showthread.php?t=1565651)

But you can override with

new_sock = accept(sock, (struct sockaddr *)&client, (socklen_t *)&len);
setsockopt(new_sock, SOL_SOCKET, SO_SNDTIMEO, &(struct timeval){1, 0}, sizeof(struct timeval));

In this case a 1 second timeout.

I had a hack and this does solve the problem. The socket send timeouts, app continues, ping will be called and then it will call close socket when the pong becomes stale.

Perhaps the default timeout could be set in the struct ws_events evs; struct...

Theldus commented 2 years ago

From some limited searching I did it appears the socket will timeout after 20 minutes. (https://ubuntuforums.org/showthread.php?t=1565651)

I see... far from ideal...

But you can override with
new_sock = accept(sock, (struct sockaddr *)&client, (socklen_t *)&len);
setsockopt(new_sock, SOL_SOCKET, SO_SNDTIMEO, &(struct timeval){1, 0}, sizeof(struct timeval));
In this case a 1 second timeout.

I had a hack and this does solve the problem. The socket send timeouts, app continues, ping will be called and then it will call close socket when the pong becomes stale.

Very good, on a quick search here, SO_SNDTIMEO seems to be compatible with Linux, Windows, FreeBSD and MacOS too, which is exactly what I'm looking for.

Perhaps the default timeout could be set in the struct ws_events evs; struct...

Yes... but ws_events in theory should only store the events, but I really plan on adding a structure to the server itself soon (to get rid of global variables), so that's the least important thing.

The only problem I see is that with this approach, the timeout would apply to all send(). Thinking that the user can send large frames (several MB), send() blocking for a few seconds would be normal and even expected...

Ideally the timeout should only apply to sending PINGs, not everything else... Anyway, being able to choose the timeout is better than nothing =).

Idea: Maybe I can add a parameter in ws_sendframe() (not the _txt,_bin versions) to specify the timeout (in milliseconds, if any, 0 disables) and use it in 'send_all()'.

Also, modify 'send_all()' in such a way that if a timeout is specified, send() will be invoked with MSG_DONTWAIT and in case of EAGAIN/EWOULDBLOCK, a loop (with a small sleep) repeats sending until the time expires.

In this way, I can guarantee the 'timeout' for sending only to PING frames (or from a specific frame).

The 'ws_sendframe_[txt, bin]' functions would still be blocking as usual.

What do you think?

gloveboxes commented 2 years ago

Ideally the timeout should only apply to sending PINGs, not everything else... Anyway, being able to choose the timeout is better than nothing =).

Hi there - timeout should apply to all sends.

I'm using libevent to drive the event driven architecture for my app. Event driven and tasks give the illusion of multithreaded but without the overhead of threads. Event tasks are typically small units of work, but the important thing is they must complete else other events wont have a chance to execute. If a socket send be it a data frame of a ping frame doesn't timeout then an event task doesn't complete and the app hangs.

gloveboxes commented 2 years ago

The only problem I see is that with this approach, the timeout would apply to all send(). Thinking that the user can send large frames (several MB), send() blocking for a few seconds would be normal and even expected...

reading the https://linux.die.net/man/3/setsockopt docs for SO_SNDTIMEO

Sets the timeout value specifying the amount of time that an output function blocks because flow control prevents data from being sent. If a send operation has blocked for this time, it shall return with a partial count or with errno set to [EAGAIN] or [EWOULDBLOCK] if no data is sent. The default for this option is zero, which indicates that a send operation shall not time out. This option stores a timeval structure. Note that not all implementations allow this option to be set.

The important thing here is

amount of time that an output function blocks because flow control prevents data from being sent.

To me this reads that if everything is operating as normal then SO_SNDTIMEO won't timeout long multi megabyte sends. The SO_SNDTIMEO is used when flow control prevents data from being sent.

gloveboxes commented 2 years ago

I had a go at implementing timeout. I pass in a global timeout, I'm really not convinced you need to be more granular than that...

extern int ws_socket(struct ws_events *evs, uint16_t port, int thread_loop, uint32_t timeout_ms);

see https://github.com/gloveboxes/wsServer/blob/6705a591f891e24b3d17cd9e920f829ad95adb37/include/ws.h#L270

and I set the timeout on the new socket

https://github.com/gloveboxes/wsServer/blob/6705a591f891e24b3d17cd9e920f829ad95adb37/src/ws.c#L1541

Cheers Dave

Theldus commented 2 years ago

@gloveboxes

The important thing here is

amount of time that an output function blocks because flow control prevents data from being sent.

To me this reads that if everything is operating as normal then SO_SNDTIMEO won't timeout long multi megabyte sends. The SO_SNDTIMEO is used when flow control prevents data from being sent.

You're right, again =).

I made a small program to test this theory:

I wrote a small server that sends 100 MiB to a connected client
Set a timeout of 1s as suggested by you
Added a latency in my network of varying values, between 100ms to 1100ms.
Up to 1000ms (fake latency) all bytes were sent perfectly, although it took about 1 minute. For 1100ms latency, the send aborted with just under 9 seconds.

If you want to reproduce:

(You can add latency either on localhost or on the network adapter, just remember that on localhost the latency will be for both sending and receiving. Here I will add it to my network card and test from another computer (with Gigabit adapter) ).

# Download server
$ wget https://gist.github.com/Theldus/19dad093630c33a637872b0bb1f74f5f/raw/\
ed382bcb590337b40b450dcb2f248a8ebc0e91cc/server.c

# Build
$ gcc server.c -o server

# Add 900ms of latency on eth1 (remove with: sudo tc qdisc del dev eth1 root)
$ sudo tc qdisc add dev eth1 root netem delay 900ms

# Run server
./server

# On another computer
$ nc <your-server-address> 8080 | tail -c 10 -

#### After some time you should see on server and client, respectively: ####

# Server output
$ time ./server 
Connected!
Sending buffer...
Sent: Success...

real    1m31,500s
user    0m0,011s
sys 0m0,191s

# Client output
$ nc <your-server-address> 8080 | tail -c 10 -
aaaaaaaaab

tc (Netem) is part of the iproute2 package and should be available by default on distributions like Ubuntu.

I think this is the solution then, thank goodness =).

If you do not mind, I can take your commit and add it to my 'pingpong_v2' branch, so I can credit you for the solution. Otherwise I can add it manually, no worries =).