EionRobb / purple-discord

A libpurple/Pidgin plugin for Discord
GNU General Public License v3.0
382 stars 43 forks

random crash (SIGABRT) in ssl_nss_read called from discord_socket_got_data #339

Closed pabs3 closed 3 years ago

pabs3 commented 3 years ago

I got a random crash in pidgin due to purple-discord. I was using commit 001dc28 and ssl_nss_read raised a SIGABRT when called from discord_socket_got_data. I've attached the full backtrace and included the summary below.

The core dump for this crash is available in case you need more information, but it will be automatically deleted after one week unless the time is extended. If the information submitted in this bug is not useful, please close this bug report.

Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7ffb881fc6c0 (LWP 211771))]
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffb8b557537 in __GI_abort () at abort.c:79
#2  0x000055a092cde7cc in sighandler (sig=<optimized out>) at ././pidgin/gtkmain.c:182
#3  0x00007ffb8b70b140 in <signal handler called> () at /lib/x86_64-linux-gnu/libpthread.so.0
#4  0x00007ffb869a073a in ssl_nss_read (gsc=0x55a0950bc990, data=0x55a0957842e8, len=1) at ././libpurple/plugins/ssl/ssl-nss.c:535
#5  0x00007ffb851c8455 in discord_socket_got_data (userdata=0x55a095784270, conn=0x55a0950bc990, cond=<optimized out>) at libdiscord.c:4580
#6  0x000055a092cc40e2 in pidgin_io_invoke (source=<optimized out>, condition=<optimized out>, data=0x55a095e6dbb0) at ././pidgin/gtkeventloop.c:73
#7  0x00007ffb8b9f8d6f in g_main_dispatch (context=0x55a09464f050) at ../../../glib/gmain.c:3325
#8  g_main_context_dispatch (context=0x55a09464f050) at ../../../glib/gmain.c:4043
#9  0x00007ffb8b9f9118 in g_main_context_iterate (context=0x55a09464f050, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../../../glib/gmain.c:4119
#10 0x00007ffb8b9f940b in g_main_loop_run (loop=loop@entry=0x55a095784990) at ../../../glib/gmain.c:4317
#11 0x00007ffb8c0c9b2a in IA__gtk_main () at ../../../../gtk/gtkmain.c:1270
#12 0x000055a092c87d81 in main (argc=<optimized out>, argv=<optimized out>) at ././pidgin/gtkmain.c:947
EionRobb commented 3 years ago

Don't suppose you happen to have the --debug output from just before the crash?

pabs3 commented 3 years ago

Unfortunately I didn't have the debug window open at the time. Would any of that info be available in the core dump?

-- bye, pabs

https://bonedaddy.net/pabs3/

pabs3 commented 3 years ago

I got another identical crash. I now have pidgin --debug running under gdb and will let you know when the crash next occurs.

EionRobb commented 3 years ago

Awesome, thanks. Keen to get to the bottom of it, but struggling for answers at the moment.

EionRobb commented 3 years ago

Just had a crash that looks similar to yours, using current master, after running for a few days without any crash.

(14:50:14) discord: got frame data: {"t":null,"s":null,"op":7,"d":null}
(14:50:14) dnsquery: Performing DNS lookup for gateway.discord.gg

Program received signal SIGSEGV, Segmentation fault.
0x089b2a7a in ?? () from C:\Program Files (x86)\Pidgin\plugins\ssl-nss.dll
(gdb) bt
#0  0x089b2a7a in ?? () from C:\Program Files (x86)\Pidgin\plugins\ssl-nss.dll
#1  0x687cfe33 in discord_socket_got_data (userdata=0x9e6a038, conn=0x50488b98, cond=PURPLE_INPUT_READ)
    at libdiscord.c:4702
#2  0x62e445bd in ?? () from C:\Program Files (x86)\Pidgin\pidgin.dll
#3  0x685eb167 in ?? () from C:\Program Files (x86)\Pidgin\Gtk\bin\libglib-2.0-0.dll
#4  0x685eb90d in ?? () from C:\Program Files (x86)\Pidgin\Gtk\bin\libglib-2.0-0.dll
#5  0x685ebd9d in ?? () from C:\Program Files (x86)\Pidgin\Gtk\bin\libglib-2.0-0.dll
#6  0x61854260 in ?? () from C:\Program Files (x86)\Pidgin\Gtk\bin\libgtk-win32-2.0-0.dll
#7  0x62e6360d in ?? () from C:\Program Files (x86)\Pidgin\pidgin.dll
#8  0x00e128b1 in ?? ()
#9  0x01d9b148 in ?? ()
#10 0x72676f72 in ?? ()
#11 0x46206d61 in ?? ()
#12 0x00000000 in ?? ()
(gdb) print *conn
$3 = {host = 0x66726550 <error: Cannot access memory at address 0x66726550>, port = 1768780399,
  connect_cb_data = 0x4420676e, connect_cb = 0x6c20534e, error_cb = 0x756b6f6f, recv_cb_data = 0x6f662070,
  recv_cb = 0x61672072, fd = 1635214708, inpa = 1768173177, connect_data = 0x726f6373, private_data = 0x67672e64,
  verifier = 0x62d8000a}

It looks like the conn has been free'd and a new websocket has been created without the eventloop for the old one being cleared?

(gdb) print ya->websocket
$5 = (PurpleSslConnection *) 0x50488968
(gdb) print conn
$6 = (PurpleSslConnection *) 0x50488b98
(gdb)

I'll be keen to find out whether you get a similar --debug message, if it's the same crash.
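To make the hypothesis above concrete (an input watch that outlives the connection it was registered for), here is a minimal standalone model. It does not use libpurple. `MockWatch`, `MockConn`, `MockAccount`, `mock_conn_open()`, `mock_conn_close()` and `mock_dispatch()` are hypothetical stand-ins for an input watch registered with `purple_input_add()`, a `PurpleSslConnection`, and `purple_ssl_close()`. The sketch shows one teardown order under which the old callback can never fire: unregister the watch, free the connection, then clear the owner's pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not libpurple types. */
typedef struct MockConn MockConn;
typedef void (*ReadCb)(void *user_data, MockConn *conn);

typedef struct {
    bool active;      /* is the watch still registered? */
    ReadCb cb;        /* invoked when the fd is "readable" */
    void *user_data;
    MockConn *conn;   /* connection the watch was registered for */
} MockWatch;

struct MockConn {
    MockWatch *watch; /* stands in for the conn's inpa watch handle */
};

typedef struct {
    MockConn *websocket; /* like ya->websocket */
    int reads;           /* how often the read callback ran */
} MockAccount;

static void got_data(void *user_data, MockConn *conn)
{
    MockAccount *acct = user_data;
    assert(conn != NULL); /* a registered watch always carries a live conn */
    acct->reads++;
}

static MockConn *mock_conn_open(MockWatch *w, ReadCb cb, void *user_data)
{
    MockConn *conn = calloc(1, sizeof *conn);
    if (conn == NULL)
        return NULL;
    *w = (MockWatch){ true, cb, user_data, conn };
    conn->watch = w;
    return conn;
}

/* Teardown order that cannot leave a dangling callback:
 * 1. unregister the watch, 2. free the connection,
 * 3. clear the owner's pointer. */
static void mock_conn_close(MockAccount *acct)
{
    MockConn *conn = acct->websocket;
    if (conn == NULL)
        return;
    conn->watch->active = false;
    conn->watch->conn = NULL;
    free(conn);
    acct->websocket = NULL;
}

/* One event-loop iteration: only registered watches are dispatched. */
static void mock_dispatch(MockWatch *w)
{
    if (w->active)
        w->cb(w->user_data, w->conn);
}
```

If `mock_conn_close()` freed the connection but left the watch active, the next `mock_dispatch()` would hand `got_data()` a dangling pointer. That is the kind of garbage `*conn` contents seen in the gdb output above.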

pabs3 commented 3 years ago

I just got another crash and it looks the same as yours with --debug. Unfortunately gdb exited because I forgot to drop the -batch option, so I couldn't check the variables you mention. I am going to update the Debian package on my system to the latest version in master, which includes the commit mitigating this, and run pidgin in gdb again, this time without the -batch option.

(21:59:23) discord: got frame data: {"t":null,"s":null,"op":11,"d":null}
(22:00:04) discord: sending frame: {"op":1,"d":106}
(22:00:04) discord: got frame data: {"t":null,"s":null,"op":11,"d":null}
(22:00:45) discord: sending frame: {"op":1,"d":106}
(22:00:45) discord: got frame data: {"t":null,"s":null,"op":11,"d":null}
(22:01:26) discord: sending frame: {"op":1,"d":106}
(22:01:27) discord: got frame data: {"t":null,"s":null,"op":11,"d":null}
(22:02:08) discord: sending frame: {"op":1,"d":106}
(22:02:08) discord: got frame data: {"t":null,"s":null,"op":11,"d":null}
(22:02:49) discord: sending frame: {"op":1,"d":106}
(22:02:49) discord: got frame data: {"t":null,"s":null,"op":11,"d":null}
(22:03:15) discord: got frame data: {"t":null,"s":null,"op":7,"d":null}
(22:03:15) dnsquery: Performing DNS lookup for gateway.discord.gg

Thread 1 "pidgin" received signal SIGSEGV, Segmentation fault.
pabs3 commented 3 years ago

Running with the new version, I haven't seen any crashes yet.

EionRobb commented 3 years ago

> Running with the new version, I haven't seen any crashes yet.

Fingers crossed! :) It only happened once in a blue moon for me, and it seemed to be triggered by a slightly unstable WiFi connection.

pabs3 commented 3 years ago

Hmm, I'll re-enable the thing that makes my WiFi unstable and see what happens :)

pabs3 commented 3 years ago

I think we can call this one fixed or worked around, thanks!

I didn't get a crash during an unstable connection event; here is the --debug log:

(07:50:28) discord: sending frame: {"op":1,"d":8676}
(07:51:01) discord: got errno 104, read_len -1 from websocket thread
(07:51:01) dnsquery: Performing DNS lookup for gateway.discord.gg
(07:51:01) dns: Wait for DNS child 902732 failed: No child processes
[Detaching after fork from child process 905512]
(07:51:01) dns: Created new DNS child 905512, there are now 1 children.
(07:51:01) dns: Successfully sent DNS request to child 905512
(07:51:01) dns: Got response for 'gateway.discord.gg'
(07:51:01) dnsquery: IP resolved for gateway.discord.gg
(07:51:01) proxy: Attempting connection to 162.159.134.234
(07:51:01) proxy: Connecting to gateway.discord.gg:443 with no proxy
(07:51:01) proxy: Connection in progress
(07:51:02) proxy: Connecting to gateway.discord.gg:443.
(07:51:02) proxy: Connected to gateway.discord.gg:443.
(07:51:02) nss: SSL version 3.3 using 128-bit AES-GCM with 128-bit AEAD MAC
Server Auth: 256-bit ECDSA, Key Exchange: 255-bit ECDHE, Compression: NULL
Cipher Suite Name: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
(07:51:02) nss: subject=CN=discord.gg,O="Cloudflare, Inc.",L=San Francisco,ST=CA,C=US issuer=CN=Cloudflare Inc ECC CA-3,O="Cloudflare, Inc.",C=US
(07:51:02) nss: subject=CN=Cloudflare Inc ECC CA-3,O="Cloudflare, Inc.",C=US issuer=CN=Baltimore CyberTrust Root,OU=CyberTrust,O=Baltimore,C=IE
(07:51:02) nss: partial certificate chain
(07:51:02) certificate/x509/tls_cached: Starting verify for gateway.discord.gg
(07:51:02) certificate/x509/tls_cached: Checking for cached cert...
(07:51:02) certificate/x509/tls_cached: ...Found cached cert
(07:51:02) nss/x509: Loading certificate from /home/pabs/.purple/certificates/x509/tls_peers/gateway.discord.gg
(07:51:02) certificate/x509/tls_cached: Peer cert matched cached
(07:51:02) nss/x509: Exporting certificate to /home/pabs/.purple/certificates/x509/tls_peers/gateway.discord.gg
(07:51:02) util: Writing file /home/pabs/.purple/certificates/x509/tls_peers/gateway.discord.gg
(07:51:02) nss: Trusting CN=discord.gg,O="Cloudflare, Inc.",L=San Francisco,ST=CA,C=US
(07:51:02) certificate: Successfully verified certificate for gateway.discord.gg
(07:51:02) discord: got frame data: {"t":null,"s":null,"op":10,"d":{"heartbeat_interval":41250,"_trace":["[\"gateway-prd-main-ktx2\",{\"micros\":0.0}]"]}}
(07:51:02) discord: sending frame: {"op":6,"d":{"token":"<removed>","session_id":"<removed>","seq":8676}}
(07:51:02) GLib: Source ID 85451 was not found when attempting to remove it
(07:51:02) discord: got frame data: {"t":"RESUMED","s":8677,"op":0,"d":{"_trace":["[\"gateway-prd-main-ktx2\",{\"micros\":3117,\"calls\":[\"discord-sessions-green-prd-2-99\",{\"micros\":16}]}]"]}}
dns[905512]: nobody needs me... =(
(07:51:43) discord: sending frame: {"op":1,"d":8677}
(07:51:43) discord: got frame data: {"t":null,"s":null,"op":11,"d":null}
pabs3 commented 3 years ago

A bit later I got a websocket disconnection event:

(08:00:35) discord: websocket closed
(08:00:35) discord: error code 1000
(08:00:35) dnsquery: Performing DNS lookup for gateway.discord.gg
(08:00:35) dns: Wait for DNS child 906055 failed: No child processes
[Detaching after fork from child process 906869]
(08:00:35) dns: Created new DNS child 906869, there are now 1 children.
(08:00:35) dns: Successfully sent DNS request to child 906869
(08:00:35) dns: Got response for 'gateway.discord.gg'
(08:00:35) dnsquery: IP resolved for gateway.discord.gg
(08:00:35) proxy: Attempting connection to 162.159.134.234
(08:00:35) proxy: Connecting to gateway.discord.gg:443 with no proxy
(08:00:35) proxy: Connection in progress
(08:00:35) proxy: Connecting to gateway.discord.gg:443.
(08:00:35) proxy: Connected to gateway.discord.gg:443.
(08:00:35) nss: SSL version 3.3 using 128-bit AES-GCM with 128-bit AEAD MAC
Server Auth: 256-bit ECDSA, Key Exchange: 255-bit ECDHE, Compression: NULL
Cipher Suite Name: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
(08:00:35) nss: subject=CN=discord.gg,O="Cloudflare, Inc.",L=San Francisco,ST=CA,C=US issuer=CN=Cloudflare Inc ECC CA-3,O="Cloudflare, Inc.",C=US
(08:00:35) nss: subject=CN=Cloudflare Inc ECC CA-3,O="Cloudflare, Inc.",C=US issuer=CN=Baltimore CyberTrust Root,OU=CyberTrust,O=Baltimore,C=IE
(08:00:35) nss: partial certificate chain
(08:00:35) certificate/x509/tls_cached: Starting verify for gateway.discord.gg
(08:00:35) certificate/x509/tls_cached: Checking for cached cert...
(08:00:35) certificate/x509/tls_cached: ...Found cached cert
(08:00:35) nss/x509: Loading certificate from /home/pabs/.purple/certificates/x509/tls_peers/gateway.discord.gg
(08:00:35) certificate/x509/tls_cached: Peer cert matched cached
(08:00:35) nss/x509: Exporting certificate to /home/pabs/.purple/certificates/x509/tls_peers/gateway.discord.gg
(08:00:35) util: Writing file /home/pabs/.purple/certificates/x509/tls_peers/gateway.discord.gg
(08:00:36) nss: Trusting CN=discord.gg,O="Cloudflare, Inc.",L=San Francisco,ST=CA,C=US
(08:00:36) certificate: Successfully verified certificate for gateway.discord.gg
(08:00:36) discord: got frame data: {"t":null,"s":null,"op":10,"d":{"heartbeat_interval":41250,"_trace":["[\"gateway-prd-main-52rx\",{\"micros\":0.0}]"]}}
(08:00:36) discord: sending frame: {"op":6,"d":{"token":"<removed>","session_id":"<removed>","seq":8677}}
(08:00:36) GLib: Source ID 85519 was not found when attempting to remove it
(08:00:37) discord: got frame data: {"t":null,"s":null,"op":9,"d":false}
(08:00:37) discord: sending frame: {"op":2,"d":{"token":"<removed>","capabilities":61,"properties":{"os":"Windows","browser":"Chrome","device":"","browser_user_agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.116 Safari/537.36","browser_version":"51.0.2704.103","os_version":"10","referrer":"https://discord.com/channels/@me","referring_domain":"discord.com","referrer_current":"","referring_domain_current":"","release_channel":"stable","client_build_number":83364,"client_event_source":null},"presence":{"status":"online","since":0,"activities":[],"afk":false},"compress":false,"client_state":{"guild_hashes":{},"highest_last_message_id":"0","read_state_version":0,"user_guild_settings_version":-1}}}
(08:00:37) discord: got frame data: {"t":"READY","s":1,"op":0,"d":{"v":6,"users": <removed>
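The two logs above show the Discord gateway opcodes involved in a reconnect:

* 10 (Hello, carrying `heartbeat_interval`)
* 6 (Resume, carrying the last `seq`)
* 9 (Invalid Session; `"d":false` means the session cannot be resumed)
* 2 (Identify)
* 0 (Dispatch, e.g. `RESUMED` or `READY`)
* 7 (Reconnect)
* 1/11 (Heartbeat and Heartbeat ACK)

Heartbeats go out on a timer. The logged interval of 41250 ms matches the roughly 41-second gaps between the `{"op":1,...}` lines.

The sketch below maps an incoming opcode to the client's next step. It is a hypothetical simplification, not purple-discord's `discord_process_frame()`. The `Action` enum, `GatewayState` struct and `gateway_next_action()` are made up for illustration, and timing details such as any delay before re-identifying are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the client's next step. */
typedef enum {
    ACTION_NONE,          /* nothing to send */
    ACTION_SEND_RESUME,   /* op 6: session_id plus last seq */
    ACTION_SEND_IDENTIFY, /* op 2: start a fresh session */
    ACTION_RECONNECT      /* close the socket and connect again */
} Action;

typedef struct {
    bool have_session; /* session_id from an earlier READY */
    int seq;           /* last "s" seen; also sent in op 1 heartbeats */
} GatewayState;

/* op: the frame's "op"; d_true: for op 9, whether "d" was true;
 * s: the frame's "s", or -1 when it was null. */
static Action gateway_next_action(GatewayState *st, int op, bool d_true, int s)
{
    if (s >= 0)
        st->seq = s;
    switch (op) {
    case 10: /* Hello: resume if possible, else identify */
        return st->have_session ? ACTION_SEND_RESUME : ACTION_SEND_IDENTIFY;
    case 9:  /* Invalid Session: "d":false means it cannot be resumed */
        if (!d_true)
            st->have_session = false;
        return st->have_session ? ACTION_SEND_RESUME : ACTION_SEND_IDENTIFY;
    case 7:  /* Reconnect requested by the server */
        return ACTION_RECONNECT;
    default: /* 0 Dispatch (READY, RESUMED, ...), 11 Heartbeat ACK, ... */
        return ACTION_NONE;
    }
}
```

Replaying the second log through the sketch gives the following sequence:

1. Hello yields a resume with seq 8677.
2. The `"op":9,"d":false` reply forces an identify.
3. The `READY` dispatch restarts the sequence at `"s":1`.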
EionRobb commented 3 years ago

Nice! :)

pabs3 commented 3 years ago

Bad news: I got another crash with commit 414b63b, and this was not near any sort of WiFi issue AFAICT. I still have gdb open if you want any more information. It seems to be DNS related; I'm using the unbound recursive DNS resolver, btw.

(07:52:48) discord: sending frame: {"op":1,"d":132}
(07:52:48) discord: got frame data: {"t":null,"s":null,"op":11,"d":null}
(07:53:14) discord: got frame data: {"t":null,"s":null,"op":7,"d":null}
(07:53:14) dnsquery: Performing DNS lookup for gateway.discord.gg

Thread 1 "pidgin" received signal SIGSEGV, Segmentation fault.
0x00007f53443d273a in ssl_nss_read (gsc=0x5603f3baa9a0, data=0x5603f3baa528, len=1) at ././libpurple/plugins/ssl/ssl-nss.c:535
535     ././libpurple/plugins/ssl/ssl-nss.c: No such file or directory.
#0  0x00007f53443d273a in ssl_nss_read (gsc=0x5603f3baa9a0, data=0x5603f3baa528, len=1) at ././libpurple/plugins/ssl/ssl-nss.c:535
#1  0x00007f5342bf9705 in discord_socket_got_data (userdata=0x5603f3baa4b0, conn=0x5603f3baa9a0, cond=<optimized out>) at libdiscord.c:4870
#2  0x00005603f23830e2 in pidgin_io_invoke (source=<optimized out>, condition=<optimized out>, data=0x5603f3225d50) at ././pidgin/gtkeventloop.c:73
#3  0x00007f534e43175f in g_main_dispatch (context=0x5603f2aedfe0) at ../../../glib/gmain.c:3337
#4  g_main_context_dispatch (context=0x5603f2aedfe0) at ../../../glib/gmain.c:4055
#5  0x00007f534e431b08 in g_main_context_iterate (context=0x5603f2aedfe0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../../../glib/gmain.c:4131
#6  0x00007f534e431dfb in g_main_loop_run (loop=loop@entry=0x5603f3baa1b0) at ../../../glib/gmain.c:4329
#7  0x00007f534eb07b2a in IA__gtk_main () at ../../../../gtk/gtkmain.c:1270
#8  0x00005603f2346d81 in main (argc=<optimized out>, argv=<optimized out>) at ././pidgin/gtkmain.c:947
(gdb) frame 1
#1  0x00007f5342bf9705 in discord_socket_got_data (userdata=0x5603f3baa4b0, conn=0x5603f3baa9a0, cond=<optimized out>) at libdiscord.c:4870
4870    libdiscord.c: No such file or directory.
(gdb) print ya->websocket
$1 = (PurpleSslConnection *) 0x5603f3d62f90
(gdb) print conn
$2 = (PurpleSslConnection *) 0x5603f3baa9a0
(gdb) print *(ya->websocket)
$3 = {host = 0x5603f322d520 "gateway.discord.gg", port = 443, connect_cb_data = 0x5603f3baa4b0, connect_cb = 0x7f5342be9420 <discord_socket_connected>, error_cb = 0x7f5342beb430 <discord_socket_failed>, recv_cb_data = 0x0, recv_cb = 0x0, fd = -1, inpa = 0, connect_data = 0x5603f3daad60, private_data = 0x0, verifier = 0x7f534e3d3020 <x509_tls_cached>}
(gdb) print *conn
$4 = {host = 0x5603f3c493a0 "gateway.discord.gg", port = 443, connect_cb_data = 0x7f534e323100 <connection_host_resolved>, connect_cb = 0x5603f3daad60, error_cb = 0x1755, recv_cb_data = 0x5603f407f8e0, recv_cb = 0x0, fd = 724249447, inpa = 724249387, connect_data = 0xfd4d4d498000a67, private_data = 0x21, verifier = 0x0}
EionRobb commented 3 years ago

Oh no :(

Can you print out the DiscordAccount contents?

EionRobb commented 3 years ago

Wait a sec... ya->websocket and conn don't match? The code should have bailed out before then?

pabs3 commented 3 years ago
(gdb) p *ya
$6 = {account = 0x5603f407f8e0, pc = 0x5603f3baa330, cookie_table = 0x5603f3688de0 = {[0x5603f3dc8b60 "__dcfduid"] = 0x5603f3ef8780}, session_token = 0x0, channel = 0x0, self_user_id = <removed>, self_username = 0x5603f3bc4b60 <removed>, last_message_id = <removed>, last_load_last_message_id = <removed>, token = 0x5603f3baa440 <removed>, session_id = 0x5603f3bc33e0 <removed>, mfa_ticket = 0x0, ack_token = 0x5603f3da50c0 <removed>, websocket = 0x5603f3d62f90, websocket_header_received = 0, sync_complete = 0, packet_code = 0 '\000', frame = 0x0, frame_len = 0, frame_len_progress = 11, seq = 132, heartbeat_timeout = 4822, one_to_ones = 0x5603f3688ea0 = {[0x5603f3c9b240 <removed>] = 0x5603f3c9b220, [0x5603f3c9a2b0 <removed>] = 0x5603f3c9a290, [0x5603f3c9a720 <removed>] = 0x5603f3c9a700, [0x5603f3c9aa10 <removed>] = 0x5603f3c9a9f0, [0x5603f3c9a370 <removed>] = 0x5603f3c9a350, [0x5603f3c9ae80 <removed>] = 0x5603f3c9ae60, [0x5603f3c9b3c0 <removed>] = 0x5603f3c9b3a0, [0x5603f3c9a1f0 <removed>] = 0x5603f3c9a1d0, [0x5603f3c9b300 <removed>] = 0x5603f3c9b2e0, [0x5603f3c9b180 <removed>] = 0x5603f3c9b160, [0x5603f3c9b000 <removed>] = 0x5603f3c9afe0, [0x5603f3c9a130 <removed>] = 0x5603f3c9a110, [0x5603f3c9a660 <removed>] = 0x5603f3c9a640, [0x5603f3c9b0c0 <removed>] = 0x5603f3c9b0a0, [0x5603f3c99a10 <removed>] = 0x5603f3c999f0, [0x5603f3c9af40 <removed>] = 0x5603f3c9af20}, one_to_ones_rev = 0x5603f3688f00 = {[0x5603f3c9b340 <removed>] = 0x5603f3c9b320, [0x5603f3c9a2f0 <removed>] = 0x5603f3c9a2d0, [0x5603f3c9af80 <removed>] = 0x5603f3c9af60, [0x5603f3c9aec0 <removed>] = 0x5603f3c9aea0, [0x5603f3c99b80 <removed>] = 0x5603f3c99b60, [0x5603f3c9aaa0 <removed>] = 0x5603f3c9aa80, [0x5603f3c9b040 <removed>] = 0x5603f3c9b020, [0x5603f3c9b100 <removed>] = 0x5603f3c9b0e0, [0x5603f3c9a170 <removed>] = 0x5603f3c9a150, [0x5603f3c9b1c0 <removed>] = 0x5603f3c9b1a0, [0x5603f3c9a760 <removed>] = 0x5603f3c9a740, [0x5603f3c9a3b0 <removed>] = 0x5603f3c9a390, [0x5603f3c9a6a0 <removed>] = 
0x5603f3c9a680, [0x5603f3c9abc0 <removed>] = 0x5603f3c9aba0, [0x5603f3c9b280 <removed>] = 0x5603f3c9b260, [0x5603f3c9a230 <removed>] = 0x5603f3c9a210}, last_message_id_dm = 0x5603f3688f60 = {[0x5603f3c9b2c0 <removed>] = 0x5603f3c9b2a0, [0x5603f3c9a330 <removed>] = 0x5603f3c9a310, [0x5603f3c9a7a0 <removed>] = 0x5603f3c9a780, [0x5603f3c9ad20 <removed>] = 0x5603f3c9ad00, [0x5603f3c9a3f0 <removed>] = 0x5603f3c9a3d0, [0x5603f3c9af00 <removed>] = 0x5603f3c9aee0, [0x5603f3c9ab70 <removed>] = 0x5603f3c9ab50, [0x5603f3c9a270 <removed>] = 0x5603f3c9a250, [0x5603f3c9b380 <removed>] = 0x5603f3c9b360, [0x5603f3c9b200 <removed>] = 0x5603f3c9b1e0, [0x5603f3c9b080 <removed>] = 0x5603f3c9b060, [0x5603f3c9a1b0 <removed>] = 0x5603f3c9a190, [0x5603f3c9a6e0 <removed>] = 0x5603f3c9a6c0, [0x5603f3c9b140 <removed>] = 0x5603f3c9b120, [0x5603f3c99c60 <removed>] = 0x5603f3c99c40, [0x5603f3c9afc0 <removed>] = 0x5603f3c9afa0}, sent_message_ids = 0x5603f368bc00, result_callbacks = 0x5603f368bc60, received_message_queue = 0x5603f3086b20, new_users = 0x5603f368bcc0 = {[0x5603f3ca07c0] = 0x5603f3ca07c0, [0x5603f3dc24b0] = 0x5603f3dc24b0, [0x5603f3c8a7b0] = 0x5603f3c8a7b0, [0x5603f3c99190] = 0x5603f3c99190, [0x5603f3c8ebb0] = 0x5603f3c8ebb0, [0x5603f3c99660] = 0x5603f3c99660, [0x5603f3d7d450] = 0x5603f3d7d450, [0x5603f3c98bd0] = 0x5603f3c98bd0, [0x5603f3c99560] = 0x5603f3c99560, [0x5603f3dfd870] = 0x5603f3dfd870, [0x5603f3c98af0] = 0x5603f3c98af0, [0x5603f3c8c5b0] = 0x5603f3c8c5b0, [0x5603f3c99860] = 0x5603f3c99860, [0x5603f3c98c80] = 0x5603f3c98c80, [0x5603f3c990f0] = 0x5603f3c990f0, [0x5603f3c99760] = 0x5603f3c99760, [0x5603f3c8bbb0] = 0x5603f3c8bbb0, [0x5603f3ca08f0] = 0x5603f3ca08f0, [0x5603f3c95bb0] = 0x5603f3c95bb0, [0x5603f3dceaa0] = 0x5603f3dceaa0, [0x5603f3c8b3b0] = 0x5603f3c8b3b0, [0x5603f3c87d50] = 0x5603f3c87d50, [0x5603f3c98d30] = 0x5603f3c98d30, [0x5603f3c929b0] = 0x5603f3c929b0, [0x5603f3bea770] = 0x5603f3bea770, [0x5603f3be12f0] = 0x5603f3be12f0, [0x5603f3c99290] = 0x5603f3c99290, 
[0x5603f3c92030] = 0x5603f3c92030}, new_guilds = 0x5603f368bd20 = {[0x5603f3c9ad40] = 0x5603f3c9ad40}, group_dms = 0x5603f368bd80 = {[0x5603f3c9a410] = 0x5603f3c9a410, [0x5603f3c99f10] = 0x5603f3c99f10, [0x5603f3c99d20] = 0x5603f3c99d20, [0x5603f3c9a7c0] = 0x5603f3c9a7c0}, frames_since_reconnect = 1, pending_writes = 0x0, roomlist_guild_count = 0, compress = 1, zstream = 0x0, http_keepalive_pool = 0x5603f3baa610}
pabs3 commented 3 years ago

Only thing I can think of is that one of them changed between the g_return_if_fail and the crash?

pabs3 commented 3 years ago

Hmm, this crash is different: SIGSEGV here vs SIGABRT in the original crash.

EionRobb commented 3 years ago

I'm not 100% sure, but maaaybe the discord_socket_connected() callback is being called twice: once by an expired websocket and once by a new one. The _connected code there isn't checking whether the conn it receives is the same as da->websocket either, so I'll add in a similar check.

libpurple is single-threaded though, so I'm really confused how the conn == ya->websocket check would suddenly pass, unless g_return_if_fail is a no-op on your system? (It's possible to compile glib without debug support, which removes those error checks.)
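Here is a minimal sketch of the extra check described for the `_connected` callback. It assumes the callback receives the account and the connection that fired, and it does nothing unless that connection is still the account's current websocket. `Account`, `Conn`, `on_connected()` and the `handshakes_sent` counter are hypothetical stand-ins for `DiscordAccount`, `PurpleSslConnection`, `discord_socket_connected()` and whatever work the real callback does. The check actually committed upstream may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for PurpleSslConnection and DiscordAccount. */
typedef struct { int id; } Conn;

typedef struct {
    Conn *websocket;     /* the connection we currently care about */
    int handshakes_sent; /* counts the real work done in on_connected */
} Account;

/* Connected callback: a stale connection (one that is no longer
 * da->websocket) must not drive the handshake for the new one. */
static void on_connected(Account *da, Conn *conn)
{
    if (da == NULL || conn == NULL || conn != da->websocket)
        return;              /* expired socket: ignore it */
    da->handshakes_sent++;   /* e.g. send the websocket upgrade request */
}
```

In this sketch both connections are still allocated when the callback runs, so comparing the two pointers is well-defined.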

EionRobb commented 3 years ago

I added the check in fa70784952b9448809a6ab957b80a9fb3893e7fc for good measure, but I don't know if it'll help your situation :/

It's like the read input event is being called using free'd memory or something?

pabs3 commented 3 years ago

Looking at the glib header files, if the G_DISABLE_CHECKS define is enabled at purple-discord compile time, then g_return_if_fail should be a no-op, but AFAICT G_DISABLE_CHECKS isn't enabled anywhere.
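The compile-time behaviour described here can be mimicked without glib. In glib, `g_return_if_fail()` is a macro. Whether it expands to a real check therefore depends on whether `G_DISABLE_CHECKS` is defined when the *calling* code is compiled, not when glib itself is built. The sketch below uses a hypothetical `MY_RETURN_IF_FAIL` / `MY_DISABLE_CHECKS` pair to show the two expansions. It is a simplification: glib's real macro reports through its own logging (a critical warning) rather than `fprintf`.

```c
#include <assert.h>
#include <stdio.h>

/* Simplified, hypothetical imitation of glib's g_return_if_fail().
 * Defining MY_DISABLE_CHECKS compiles the check out entirely, just as
 * G_DISABLE_CHECKS does for the real macro. */
#ifdef MY_DISABLE_CHECKS
#define MY_RETURN_IF_FAIL(expr) do { (void)0; } while (0)
#else
#define MY_RETURN_IF_FAIL(expr)                                      \
    do {                                                             \
        if (!(expr)) {                                               \
            fprintf(stderr, "CRITICAL: %s: assertion '%s' failed\n", \
                    __func__, #expr);                                \
            return;                                                  \
        }                                                            \
    } while (0)
#endif

static int reads_done = 0;

/* Only "reads" when conn still matches the current websocket. */
static void on_readable(const void *current, const void *conn)
{
    MY_RETURN_IF_FAIL(conn == current);
    reads_done++; /* stands in for the SSL read */
}
```

Compiled as-is, a mismatched call logs and returns early. Compiled with `-DMY_DISABLE_CHECKS`, the same call would fall through to the read, which is the scenario being ruled out here.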

I'll update to the new commit and restart pidgin under gdb.

pabs3 commented 3 years ago

I got another crash with commit d2de9af.

When I looked at frame 0, where the crash happens in libpurple's nss plugin, the problem is that conn->private_data is 0x21 instead of either NULL or a valid pointer. Reading through the libpurple nss plugin code, private_data is only ever set to NULL or to a valid pointer returned by g_new0, so I think the *conn structure is getting corrupted. There are other signs of corruption too. conn->connect_cb_data points at the libpurple function connection_host_resolved instead of a structure. conn->connect_cb points at a structure instead of a function. conn->error_cb holds an implausibly small value (0x1755) for a pointer. I'm going to try running pidgin under valgrind instead of gdb, as it might be able to identify the memory corruption.

EionRobb commented 3 years ago

That sounds sensible. Thanks for persisting to get to the bottom of it. It's acting like it's using free'd memory, but tracing through by hand I haven't found any case of things getting closed or freed without the input event being disabled first.

pabs3 commented 3 years ago

Looks like you were right about the use-after-free being the cause of this crash. While running in valgrind I got the output below.

(03:30:59) discord: sending frame: {"op":1,"d":10}
(03:30:59) discord: got frame data: {"t":null,"s":null,"op":11,"d":null}
(03:31:19) discord: got frame data: {"t":null,"s":null,"op":7,"d":null}
(03:31:19) dnsquery: Performing DNS lookup for gateway.discord.gg
==253885== Invalid read of size 8
==253885==    at 0xC68FB83: discord_socket_got_data (libdiscord.c:4952)
==253885==    by 0x1820E1: pidgin_io_invoke (gtkeventloop.c:73)
==253885==    by 0x55C975E: g_main_dispatch (gmain.c:3337)
==253885==    by 0x55C975E: g_main_context_dispatch (gmain.c:4055)
==253885==    by 0x55C9B07: g_main_context_iterate.constprop.0 (gmain.c:4131)
==253885==    by 0x55C9DFA: g_main_loop_run (gmain.c:4329)
==253885==    by 0x4D8CB29: gtk_main (gtkmain.c:1270)
==253885==    by 0x145D80: main (gtkmain.c:947)
==253885==  Address 0xd2e1068 is 72 bytes inside a block of size 88 free'd
==253885==    at 0x48399AB: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==253885==    by 0xC67EB2F: discord_start_socket (libdiscord.c:5136)
==253885==    by 0xC68FED3: discord_process_frame (libdiscord.c:4724)
==253885==    by 0xC68FED3: discord_socket_got_data (libdiscord.c:5052)
==253885==    by 0x1820E1: pidgin_io_invoke (gtkeventloop.c:73)
==253885==    by 0x55C975E: g_main_dispatch (gmain.c:3337)
==253885==    by 0x55C975E: g_main_context_dispatch (gmain.c:4055)
==253885==    by 0x55C9B07: g_main_context_iterate.constprop.0 (gmain.c:4131)
==253885==    by 0x55C9DFA: g_main_loop_run (gmain.c:4329)
==253885==    by 0x4D8CB29: gtk_main (gtkmain.c:1270)
==253885==    by 0x145D80: main (gtkmain.c:947)
==253885==  Block was alloc'd at
==253885==    at 0x483AB65: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==253885==    by 0x55CF790: g_malloc0 (gmem.c:136)
==253885==    by 0x574B88D: purple_ssl_connect_with_ssl_cn (sslconn.c:125)
==253885==    by 0x574B9DD: purple_ssl_connect (sslconn.c:103)
==253885==    by 0xC67EBBB: discord_start_socket (libdiscord.c:5152)
==253885==    by 0xC67EF0B: discord_login (libdiscord.c:4597)
==253885==    by 0x56F1997: purple_accounts_restore_current_statuses (account.c:3152)
==253885==    by 0x145D5D: main (gtkmain.c:926)
==253885==
(03:31:19) dns: Wait for DNS child 254205 failed: No child processes
(03:31:19) dns: Wait for DNS child 254203 failed: No child processes
(03:31:19) dns: Wait for DNS child 254204 failed: No child processes
(03:31:19) dns: Created new DNS child 257404, there are now 1 children.
(03:31:19) dns: Successfully sent DNS request to child 257404
(03:31:19) dns: Got response for 'gateway.discord.gg'
(03:31:19) dnsquery: IP resolved for gateway.discord.gg
(03:31:19) proxy: Attempting connection to 162.159.136.234
(03:31:19) proxy: Connecting to gateway.discord.gg:443 with no proxy
(03:31:19) proxy: Connection in progress
(03:31:19) proxy: Connecting to gateway.discord.gg:443.
(03:31:19) proxy: Connected to gateway.discord.gg:443.
pabs3 commented 3 years ago

That is weird though: discord_start_socket does set da->websocket = NULL. The only difference from discord_close that I can see is ordering. discord_close sets da->websocket = NULL immediately after purple_ssl_close(da->websocket), whereas discord_start_socket sets it after freeing da->zstream.

EionRobb commented 3 years ago

Oh, I see. If the op code is 7 (https://github.com/EionRobb/purple-discord/blob/d2de9afe6bdb22ab49ed09c330f29d1e11a3af97/libdiscord.c#L4723-L4726), the server is asking us to reconnect. We close and renew the ssl socket, and then we loop around and keep trying to read from the old connection, which has already been freed.

I'm thinking that the correct fix is to check in https://github.com/EionRobb/purple-discord/blob/d2de9afe6bdb22ab49ed09c330f29d1e11a3af97/libdiscord.c#L5059 that websocket == conn still, as there might be a brand new connection?
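The sequence described here can be reproduced in a small standalone model. Handling an op 7 frame replaces the socket mid-loop, and the loop then reads from the old one. Everything below is hypothetical: `Account`, `Conn`, `start_socket()`, `process_frame()`, `got_data()` and the pre-queued `frames` array only imitate the shape of `discord_start_socket()`, `discord_process_frame()` and `discord_socket_got_data()`, with the queued opcodes standing in for real SSL reads.

The fix proposed above compares `websocket` with `conn` after each frame. This sketch swaps in a generation counter that is bumped on every reconnect. In ISO C, even comparing a pointer after the object it points to has been freed is technically undefined behaviour, because the pointer's value becomes indeterminate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    const int *frames; /* opcodes this connection will yield, in order */
    size_t n, pos;
} Conn;

typedef struct {
    Conn *websocket;     /* like da->websocket */
    unsigned generation; /* bumped whenever websocket is replaced */
    int frames_handled;
} Account;

/* Reconnect: free the old socket and open a fresh, empty one. */
static void start_socket(Account *da)
{
    free(da->websocket);  /* any Conn* still held elsewhere now dangles */
    da->websocket = calloc(1, sizeof *da->websocket);
    da->generation++;
}

static void process_frame(Account *da, int op)
{
    da->frames_handled++;
    if (op == 7)          /* server asked us to reconnect */
        start_socket(da);
}

/* Read callback: drain frames from conn, but stop as soon as handling a
 * frame replaced the socket, because conn may have been freed. */
static void got_data(Account *da, Conn *conn)
{
    const unsigned gen = da->generation;
    while (conn->pos < conn->n) {
        process_frame(da, conn->frames[conn->pos++]);
        if (da->generation != gen)
            return;       /* do not touch conn again */
    }
}
```

Without the `generation` check, the next `while` test would read `conn->pos` from freed memory. That is the same kind of invalid read valgrind reported above.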

EionRobb commented 3 years ago

Can you try with https://github.com/EionRobb/purple-discord/commit/fbbe0f9a6fea2b64bdcf89a6b83da99128028964 and see if it behaves?

pabs3 commented 3 years ago

I've built, installed and run it under valgrind, and will report results.

pabs3 commented 3 years ago

Looks like that fixed it: I saw a few "op":7 frames and there were no valgrind warnings.

EionRobb commented 3 years ago

Brilliant! Thanks for helping resolve that.