Theldus / wsServer

wsServer - a tiny WebSocket server library written in C
https://theldus.github.io/wsServer
GNU General Public License v3.0
427 stars 85 forks source link

[question] Android crashes #102

Open caiiiycuk opened 1 month ago

caiiiycuk commented 1 month ago

Hi. Thank you for great library, I integrated it in my app that published in Google Play store. User base is near 10_000 active users. The policy of Google regarding crashes is very strict. You must fit into <= 1.09% affected users. In my case crash rate is 1.61%. Most of crashes are "related" to wsServer. It's hard to tell if it really from wsServer or not, because app is native, and maybe something wrong happens in my codebase. Anyway there is a list of crashes from Google:

SIGBUS: [split_config.arm64_v8a.apk!libandroid-ws.so] ws.c - ws_establishconnection

backtrace:
  Unknown
  #01  pc 0x0000000000083ee4  /data/app/~~YHA58jvEPov4riIWo_Ftsg==/zone.dos.app-awJNrztH5GhsJe5gjQgZ7g==/split_config.arm64_v8a.apk!libandroid-ws.so (ws_establishconnection+1672) (BuildId: 737ba6189788c3192968c7121b8a9a57d89e5012)
  #02  pc 0x00000000000fd134  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+208)
  #03  pc 0x0000000000096ae4  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+68)

SIGABRT: [split_config.armeabi_v7a.apk!libandroid-ws.so] ws.c - ws_establishconnection


backtrace:
  #00  pc 0x000000000001a6fa  /system/lib/libc.so (abort+63)
  #01  pc 0x0000000000049821  /system/lib/libc.so (__pthread_internal_find(long)+96)
  #02  pc 0x00000000000498a1  /system/lib/libc.so (pthread_join+24)
  #03  pc 0x0000000000060b5d  /data/app/zone.dos.app-fkSfkSo2r2Se2IOJIYUPIw==/split_config.armeabi_v7a.apk!libandroid-ws.so (ws_establishconnection+1712) (BuildId: de2f661a399174ec821d97f47d25163ef2eb0ba2)
  #04  pc 0x0000000000049557  /system/lib/libc.so (__pthread_start(void*)+22)
  #05  pc 0x000000000001b2c9  /system/lib/libc.so (__start_thread+32)
Theldus commented 1 month ago

Hi @caiiiycuk, Wow, this is really very interesting! I never imagined that wsServer would be used in an Android app published on the Play Store, let alone have 10k active users. That makes me really happy. Could you share the name of this app? I'm really curious.

Unfortunately, I can't extract much information from this stack trace... do you know how to reproduce this issue? My hope is that a build with debug symbols might provide a bit more information, so that's something I suggest to you.

Maybe if I can run the app on my phone and reproduce the issue, I can give you more details... the only thing I can infer from your stack trace is that it seems the crashes are coming from ws_establishconnection().

Other information would also be helpful:

caiiiycuk commented 1 month ago

The name of the app is DOS Browser. This is an WebView that provides acceleration for dosbox/dosbox-x backend of js-dos. wsServer is used as transport for a dosbox/dosbox-x backend. So WebView connects to emulation using WebSocket protocol.

dosbox/dosbox-x is a huge and uses asm, so it can be possible that some UB happens and I see errors on wsServer side. This is a reason why I created it as question but not as bug. On other hand the lifecycle of wsServer is following:

  1. server started
  2. browser connects and asks to start some backend (dosbox/dosbox-x). Is ws_establishconnection happens here? If so, then UB theory is wrong, cause dosbox/dosbox-x is not started at this moment.
  3. backend started
  4. browser and backend speak with each other using websocket
  5. browser sends a close message (everything is stopped after that)

info:

  1. wsServer commit is fbc1547a1a361594c72dd0cf897c937c905032f6
  2. Android 8.1, 9, 10, 13, 14 (many different phones)

Some additonal new info, have following text in errors:

invalid pthread_t 0x8279f970 passed to libc
invalid pthread_t 0x<sanitized> passed to pthread_join
  1. armeabi_v7a, arm64_v8a

Also, I have another question. In my case I always have only one active connection, can I get some performance improve if I remove threading from library (drop pthread)?

Theldus commented 1 month ago

Thank you for your detailed response.

The execution flow is approximately as follows:

ws_socket()
  ws_accept()                 // on a new thread or not
    ws_establishconnection()  // handles client messages, always on a new thread
      do_handshake()
      while (next_complete_frame())
          handle_frame()

The ws_establishconnection() function performs the initial handshake with the client and remains active throughout the connection's duration.

Some additional new information includes the following error messages: invalid pthread_t 0x8279f970 passed to libc invalid pthread_t 0x<sanitized> passed to pthread_join

This is interesting... for some reason, there seems to be an issue with pthread_join(), which only occurs at a specific point in my code, at ws_establishconnection() line 1712.

The call to pthread_join() is a rare code path and should only occur when the ws_close_client() function is invoked on the server side. Do you happen to call it at any point?

The idea is that when ws_close_client() is invoked, the server starts a 'timeout' thread, waiting for a specified time until the client responds with a close message. If this doesn't happen, the server closes the connection abruptly. The join of the threads occurs when the client thread waits for the timeout thread to finish, but again, this should only happen when ws_close_client() is invoked.

Furthermore, based on the information provided, it seems that the thread referenced by pthread_join() is invalid, leading to the reported errors. So, it appears there might be either a memory corruption issue or some other unexpected behavior... but I need to investigate this further.


I will try to create a minimal example for Android (via Termux) and attempt to call ws_close_client() to see if the issue occurs.

What I can suggest is to use a more recent version of the library, such as the current version in the master branch. The latest PR (#96) restructured how clients are handled, potentially avoiding some bugs that may have existed in the client management. Perhaps this has resolved the bug you're encountering, but it's difficult to say without a more detailed investigation.


Also, I have another question. In my case I always have only one active connection, can I get some performance improve if I remove threading from library (drop pthread)?

Dropping pthread support is complicated, as it is used not only for handling multiple clients but also for other things, such as the asynchronous timeout that wsServer implements, as explained earlier.

I genuinely plan to eventually expose a low-level layer of the library, allowing the user to choose what they want to use. Obviously, the low-level layer would not include threads, but this is something that will require some time to develop.

Theldus commented 2 weeks ago

Hi @caiiiycuk, Just to let you know, I haven’t been able to reproduce the issue yet.

I downloaded your app to my phone, and it works without any problems. I also ran a test with a build of wsServer for Android (via Termux), and again, no issues, even when building with ASan and UBSan enabled (for Linux - x64).

I'm really running out of ideas. As I mentioned before, the problem seems to occur here:

/* Wait for timeout thread if necessary. */
if (clse_thrd)
{
    pthread_cond_signal(&client->cnd_state_close);
    pthread_join(client->thrd_tout, NULL);
}

But for this join to happen with an invalid thread, the only possibility I can think of is that clse_thrd is being set incorrectly, either due to an error in my code or yours. It would be great if Android builds supported ASan or something similar to catch illegal memory accesses, etc.

caiiiycuk commented 2 weeks ago

Thank you very much. I also never seen it by myself. Anyway I preparing new release with latest wsServer and also I fixed coupel UB in my code. So I hope it will work well. I will keep you posted.