Open caiiiycuk opened 1 month ago
Hi @caiiiycuk, Wow, this is really very interesting! I never imagined that wsServer would be used in an Android app published on the Play Store, let alone have 10k active users. That makes me really happy. Could you share the name of this app? I'm really curious.
Unfortunately, I can't extract much information from this stack trace... do you know how to reproduce this issue? My hope is that a build with debug symbols might provide a bit more information, so that's something I suggest to you.
Maybe if I can run the app on my phone and reproduce the issue, I can give you more details... the only thing I can infer from your stack trace is that it seems the crashes are coming from ws_establishconnection().
Other information would also be helpful:
The name of the app is DOS Browser. This is an WebView that provides acceleration for dosbox/dosbox-x backend of js-dos. wsServer is used as transport for a dosbox/dosbox-x backend. So WebView connects to emulation using WebSocket protocol.
dosbox/dosbox-x is a huge and uses asm, so it can be possible that some UB happens and I see errors on wsServer side. This is a reason why I created it as question but not as bug. On other hand the lifecycle of wsServer is following:
info:
Some additonal new info, have following text in errors:
invalid pthread_t 0x8279f970 passed to libc
invalid pthread_t 0x<sanitized> passed to pthread_join
Also, I have another question. In my case I always have only one active connection, can I get some performance improve if I remove threading from library (drop pthread)?
Thank you for your detailed response.
The execution flow is approximately as follows:
ws_socket()
ws_accept() // on a new thread or not
ws_establishconnection() // handles client messages, always on a new thread
do_handshake()
while (next_complete_frame())
handle_frame()
The ws_establishconnection()
function performs the initial handshake with the client and remains active throughout the connection's duration.
Some additional new information includes the following error messages:
invalid pthread_t 0x8279f970 passed to libc
invalid pthread_t 0x<sanitized> passed to pthread_join
This is interesting... for some reason, there seems to be an issue with pthread_join()
, which only occurs at a specific point in my code, at ws_establishconnection() line 1712.
The call to pthread_join()
is a rare code path and should only occur when the ws_close_client()
function is invoked on the server side. Do you happen to call it at any point?
The idea is that when ws_close_client()
is invoked, the server starts a 'timeout' thread, waiting for a specified time until the client responds with a close message. If this doesn't happen, the server closes the connection abruptly. The join of the threads occurs when the client thread waits for the timeout thread to finish, but again, this should only happen when ws_close_client()
is invoked.
Furthermore, based on the information provided, it seems that the thread referenced by pthread_join()
is invalid, leading to the reported errors. So, it appears there might be either a memory corruption issue or some other unexpected behavior... but I need to investigate this further.
I will try to create a minimal example for Android (via Termux) and attempt to call ws_close_client()
to see if the issue occurs.
What I can suggest is to use a more recent version of the library, such as the current version in the master branch. The latest PR (#96) restructured how clients are handled, potentially avoiding some bugs that may have existed in the client management. Perhaps this has resolved the bug you're encountering, but it's difficult to say without a more detailed investigation.
Also, I have another question. In my case I always have only one active connection, can I get some performance improve if I remove threading from library (drop pthread)?
Dropping pthread support is complicated, as it is used not only for handling multiple clients but also for other things, such as the asynchronous timeout that wsServer implements, as explained earlier.
I genuinely plan to eventually expose a low-level layer of the library, allowing the user to choose what they want to use. Obviously, the low-level layer would not include threads, but this is something that will require some time to develop.
Hi @caiiiycuk, Just to let you know, I haven’t been able to reproduce the issue yet.
I downloaded your app to my phone, and it works without any problems. I also ran a test with a build of wsServer
for Android (via Termux), and again, no issues, even when building with ASan and UBSan enabled (for Linux - x64).
I'm really running out of ideas. As I mentioned before, the problem seems to occur here:
/* Wait for timeout thread if necessary. */
if (clse_thrd)
{
pthread_cond_signal(&client->cnd_state_close);
pthread_join(client->thrd_tout, NULL);
}
But for this join to happen with an invalid thread, the only possibility I can think of is that clse_thrd
is being set incorrectly, either due to an error in my code or yours. It would be great if Android builds supported ASan or something similar to catch illegal memory accesses, etc.
Thank you very much. I also never seen it by myself. Anyway I preparing new release with latest wsServer and also I fixed coupel UB in my code. So I hope it will work well. I will keep you posted.
Hi. Thank you for great library, I integrated it in my app that published in Google Play store. User base is near 10_000 active users. The policy of Google regarding crashes is very strict. You must fit into <= 1.09% affected users. In my case crash rate is 1.61%. Most of crashes are "related" to wsServer. It's hard to tell if it really from wsServer or not, because app is native, and maybe something wrong happens in my codebase. Anyway there is a list of crashes from Google:
SIGBUS: [split_config.arm64_v8a.apk!libandroid-ws.so] ws.c - ws_establishconnection
SIGABRT: [split_config.armeabi_v7a.apk!libandroid-ws.so] ws.c - ws_establishconnection