Theldus / wsServer

wsServer - a tiny WebSocket server library written in C
https://theldus.github.io/wsServer
GNU General Public License v3.0
427 stars 86 forks source link

Ping/Pong v2 #53

Closed Theldus closed 2 years ago

Theldus commented 2 years ago

Description

In PR #51 an early version of Ping/Pong was introduced by @gloveboxes.

As discussed earlier in issue #50, some modifications would be made for a final version of wsServer PING support. This PR addresses those changes.

This 'v2' changes:

Also, this PR adds a man page (doc/man/man3/ws_ping.3) and an example file, located at examples/ping.


@gloveboxes This PR basically brings together everything we talked about earlier and that's what I can imagine for the final version. Since the initial idea was yours, I would like to hear from you: what do you think? is there anything that could be improved/removed/etc or it seems ok to you?

Hope this eases your problem with clients disconnecting while sleeping =). Again, thank you very much for your contribution.

gloveboxes commented 2 years ago

hey @Theldus there are a couple of deadlock issues

If the server is sending messages when the client silently disconnects the server.

When sending, I can see that send doesn't return at line (and I can see that locks have been taken)

https://github.com/Theldus/wsServer/blob/d81ea43c7c650a0e052e1929b8c72d27d42082d8/src/ws.c#L262

I don't 100% understand your locking strategy, but I can see that a number of locks are taken and ping hangs trying to take a mutex at line https://github.com/Theldus/wsServer/blob/d81ea43c7c650a0e052e1929b8c72d27d42082d8/src/ws.c#L661

reading the header https://github.com/Theldus/wsServer/blob/d81ea43c7c650a0e052e1929b8c72d27d42082d8/src/ws.c#L243

I see it's using blocking io - so what i think is happening is ret = send(client->client_sock, p, len, flags); is blocking which is causing deadlocks.

Cheers Dave

Theldus commented 2 years ago

Hey @gloveboxes, thanks for taking a look and testing this PR.

How can I reproduce this? in my tests I had no problems with locks...

I don't 100% understand your locking strategy, but I can see that a number of locks are taken and ping hangs trying to take a mutex at line

My lock 'strategy' boils down to:

So you're saying the send() function doesn't return? that's interesting...

Ok, reading the (Linux) man-pages:

When the message does not fit into the send buffer of the socket, send() normally blocks, unless the socket has been placed in nonblocking I/O mode.

I could do non-blocking sends, but since I want the client to receive the content, I would have to be in some loop trying to send anyway...

What should I do?... I believe send() will not be blocked forever (or is there such possibility?), so at some point all 'send_ping_close()' will be executed and the lock released again... and then new pings can be sent.

In a way this even makes some sense: new pings are only sent when the previous ones manage to be sent... or does that not make much sense?

gloveboxes commented 2 years ago

What should I do?... I believe send() will not be blocked forever (or is there such possibility?), so at some point all 'send_ping_close()' will be executed and the lock released again... and then new pings can be sent.

From some limited searching I did it appears the socket will timeout after 20 minutes. (https://ubuntuforums.org/showthread.php?t=1565651)

But you can override with

new_sock = accept(sock, (struct sockaddr *)&client, (socklen_t *)&len);
setsockopt(new_sock, SOL_SOCKET, SO_SNDTIMEO, &(struct timeval){1, 0}, sizeof(struct timeval));

In this case a 1 second timeout.

I had a hack and this does solve the problem. The socket send timeouts, app continues, ping will be called and then it will call close socket when the pong becomes stale.

Perhaps the default timeout could be set in the struct ws_events evs; struct...

Theldus commented 2 years ago

From some limited searching I did it appears the socket will timeout after 20 minutes. (https://ubuntuforums.org/showthread.php?t=1565651)

I see... far from ideal...

But you can override with

new_sock = accept(sock, (struct sockaddr *)&client, (socklen_t *)&len);
setsockopt(new_sock, SOL_SOCKET, SO_SNDTIMEO, &(struct timeval){1, 0}, sizeof(struct timeval));

In this case a 1 second timeout.

I had a hack and this does solve the problem. The socket send timeouts, app continues, ping will be called and then it will call close socket when the pong becomes stale.

Very good, on a quick search here, SO_SNDTIMEO seems to be compatible with Linux, Windows, FreeBSD and MacOS too, which is exactly what I'm looking for.

Perhaps the default timeout could be set in the struct ws_events evs; struct...

Yes... but ws_events in theory should only store the events, but I really plan on adding a structure to the server itself soon (to get rid of global variables), so that's the least important thing.

The only problem I see is that with this approach, the timeout would apply to all send(). Thinking that the user can send large frames (several MB), send() blocking for a few seconds would be normal and even expected...

Ideally the timeout should only apply to sending PINGs, not everything else... Anyway, being able to choose the timeout is better than nothing =).


Idea: Maybe I can add a parameter in ws_sendframe() (not the _txt,_bin versions) to specify the timeout (in milliseconds, if any, 0 disables) and use it in 'send_all()'.

Also, modify 'send_all()' in such a way that if a timeout is specified, send() will be invoked with MSG_DONTWAIT and in case of EAGAIN/EWOULDBLOCK, a loop (with a small sleep) repeats sending until the time expires.

In this way, I can guarantee the 'timeout' for sending only to PING frames (or from a specific frame).

The 'ws_sendframe_[txt, bin]' functions would still be blocking as usual.

What do you think?

gloveboxes commented 2 years ago

Ideally the timeout should only apply to sending PINGs, not everything else... Anyway, being able to choose the timeout is better than nothing =).

Hi there - timeout should apply to all sends.

I'm using libevent to drive the event driven architecture for my app. Event driven and tasks give the illusion of multithreaded but without the overhead of threads. Event tasks are typically small units of work, but the important thing is they must complete else other events wont have a chance to execute. If a socket send be it a data frame of a ping frame doesn't timeout then an event task doesn't complete and the app hangs.

gloveboxes commented 2 years ago

The only problem I see is that with this approach, the timeout would apply to all send(). Thinking that the user can send large frames (several MB), send() blocking for a few seconds would be normal and even expected...

reading the https://linux.die.net/man/3/setsockopt docs for SO_SNDTIMEO

Sets the timeout value specifying the amount of time that an output function blocks because flow control prevents data from being sent. If a send operation has blocked for this time, it shall return with a partial count or with errno set to [EAGAIN] or [EWOULDBLOCK] if no data is sent. The default for this option is zero, which indicates that a send operation shall not time out. This option stores a timeval structure. Note that not all implementations allow this option to be set.

The important thing here is

amount of time that an output function blocks because flow control prevents data from being sent.

To me this reads that if everything is operating as normal then SO_SNDTIMEO won't timeout long multi megabyte sends. The SO_SNDTIMEO is used when flow control prevents data from being sent.

gloveboxes commented 2 years ago

I had a go at implementing timeout. I pass in a global timeout, I'm really not convinced you need to be more granular than that...

extern int ws_socket(struct ws_events *evs, uint16_t port, int thread_loop, uint32_t timeout_ms);

see https://github.com/gloveboxes/wsServer/blob/6705a591f891e24b3d17cd9e920f829ad95adb37/include/ws.h#L270

and I set the timeout on the new socket

https://github.com/gloveboxes/wsServer/blob/6705a591f891e24b3d17cd9e920f829ad95adb37/src/ws.c#L1541

Cheers Dave

Theldus commented 2 years ago

@gloveboxes

The important thing here is

amount of time that an output function blocks because flow control prevents data from being sent.

To me this reads that if everything is operating as normal then SO_SNDTIMEO won't timeout long multi megabyte sends. The SO_SNDTIMEO is used when flow control prevents data from being sent.

You're right, again =).

I made a small program to test this theory:

If you want to reproduce:

(You can add latency either on localhost or on the network adapter, just remember that on localhost the latency will be for both sending and receiving. Here I will add it to my network card and test from another computer (with Gigabit adapter) ).

# Download server
$ wget https://gist.github.com/Theldus/19dad093630c33a637872b0bb1f74f5f/raw/\
ed382bcb590337b40b450dcb2f248a8ebc0e91cc/server.c

# Build
$ gcc server.c -o server

# Add 900ms of latency on eth1 (remove with: sudo tc qdisc del dev eth1 root)
$ sudo tc qdisc add dev eth1 root netem delay 900ms

# Run server
./server

# On another computer
$ nc <your-server-address> 8080 | tail -c 10 -

#### After some time you should see on server and client, respectively: ####

# Server output
$ time ./server 
Connected!
Sending buffer...
Sent: Success...

real    1m31,500s
user    0m0,011s
sys 0m0,191s

# Client output
$ nc <your-server-address> 8080 | tail -c 10 -
aaaaaaaaab

tc (Netem) is part of the iproute2 package and should be available by default on distributions like Ubuntu.


I think this is the solution then, thank goodness =).

If you do not mind, I can take your commit and add it to my 'pingpong_v2' branch, so I can credit you for the solution. Otherwise I can add it manually, no worries =).