Closed gholms closed 9 years ago
I was suffering from the same connection issues earlier, apparently Steam was burping a bit. I did not suffer from any such crash, but I am running bitlbee with libevent/GnuTLS, rather than GLib/NSS. I can only assume this crash is the result of a double free, or an event is remaining registered when it should have been yanked.
I will poke around with this when I get some more time (I am currently travelling). I will also see if @dequis has any insight(s) on the matter.
This is the problem with all these damn crypto and event libraries being "supported."
Yeah, "supported", and 99% of users use gnutls, 1% use nss, and 0% use openssl. I think coverity doesn't even consider ssl_nss.c because it's not part of the usual compilation method.
But yeah highly likely that this is some use after free, a problem on the nss side. Can't see anything obvious right now though (guessing this is the proxy_connect callback being called after ssl_disconnect was called?). Maybe nss' ssl_disconnect is missing closesocket( conn->fd );? How do we even free the phb stuff there?
I guess I should figure out how to reproduce this one reliably, maybe with a tcp connection killer.
I bet you could reproduce it by blackholing api.steampowered.com. When I encountered this issue it seemed to be completely unresponsive.
How exactly was this issue triggered? Was it taking a while to connect, and you ran account <acc> off? Or did it just happen on its own?
17:14 < jgeboski> I wish gholms was in IRC
17:15 < dx> jgeboski: tell him to get the fuck in here
Looks like I didn't reconnect after I upgraded a few weeks ago. Sorry.
I ran across that issue shortly after Steam began having another one of its many connectivity, erm, issues. I don't recall triggering it with an action of my own.
Decided to try reproducing this one, copied some of the iptables commands from the comcast README and then disconnected/connected the account a bunch of times until it happened:
About to send HTTP request:
POST /ISteamWebUserPresenceOAuth/Logoff/v0001 HTTP/1.1
User-Agent: Steam App / BitlBee / 1.2.0
Content-Length: 61
Connection: Close
Accept: */*
Cookie:
Host: api.steampowered.com
Content-Type: application/x-www-form-urlencoded
access_token=XXXXXXXXXX&umqid=XXXXXXXXXX
About to send HTTP request:
POST /ISteamWebUserPresenceOAuth/Logon/v0001 HTTP/1.1
User-Agent: Steam App / BitlBee / 1.2.0
Content-Length: 73
Connection: Close
Accept: */*
Cookie:
Host: api.steampowered.com
Content-Type: application/x-www-form-urlencoded
ui_mode=web&access_token=XXXXXXXXXX&umqid=XXXXXXXXXX
==28981== Invalid write of size 8
==28981== at 0x56995A4: gnutls_init (in /usr/lib/libgnutls.so.28.41.4)
==28981== by 0x13FEFE: ssl_connected (ssl_gnutls.c:322)
==28981== by 0x13DED5: gaim_io_connected (proxy.c:105)
==28981== by 0x1362E7: gaim_io_invoke (events_glib.c:86)
==28981== by 0x538891C: g_main_context_dispatch (in /usr/lib/libglib-2.0.so.0.4200.1)
==28981== by 0x5388CF7: ??? (in /usr/lib/libglib-2.0.so.0.4200.1)
==28981== by 0x5389021: g_main_loop_run (in /usr/lib/libglib-2.0.so.0.4200.1)
==28981== by 0x136259: b_main_run (events_glib.c:59)
==28981== by 0x134052: main (unix.c:170)
==28981== Address 0x7cf6d90 is 48 bytes inside a block of size 56 free'd
==28981== at 0x4C2B200: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==28981== by 0x140373: ssl_disconnect (ssl_gnutls.c:467)
==28981== by 0x1383BE: http_close (http_client.c:696)
==28981== by 0x829AEB7: steam_http_req_close (steam-http.c:364)
==28981== by 0x829B092: steam_http_req_free (steam-http.c:386)
==28981== by 0x829B119: steam_http_free_reqs (steam-http.c:78)
==28981== by 0x8297858: steam_logout (steam.c:895)
==28981== by 0x145945: imc_logout (nogaim.c:401)
==28981== by 0x143502: bee_free (bee.c:57)
==28981== by 0x12098C: irc_free (irc.c:254)
==28981== by 0x53893C2: ??? (in /usr/lib/libglib-2.0.so.0.4200.1)
==28981== by 0x538891C: g_main_context_dispatch (in /usr/lib/libglib-2.0.so.0.4200.1)
==28981==
So it's definitely not NSS specific.
The rest of the valgrind crap: http://dump.dequis.org/PznWT.txt
And then this one appeared when I closed that bitlbee, lol:
==28981== 21 bytes in 1 blocks are definitely lost in loss record 41 of 158
==28981== at 0x4C29F90: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==28981== by 0x538E579: g_malloc (in /usr/lib/libglib-2.0.so.0.4200.1)
==28981== by 0x53A6E5E: g_strdup (in /usr/lib/libglib-2.0.so.0.4200.1)
==28981== by 0x13F972: ssl_connect (ssl_gnutls.c:121)
==28981== by 0x136A55: http_dorequest (http_client.c:49)
==28981== by 0x829B8C9: steam_http_req_send (steam-http.c:691)
==28981== by 0x8298ECD: steam_api_req_logoff (steam-api.c:801)
==28981== by 0x145945: imc_logout (nogaim.c:401)
==28981== by 0x143502: bee_free (bee.c:57)
==28981== by 0x12098C: irc_free (irc.c:254)
==28981== by 0x53893C2: ??? (in /usr/lib/libglib-2.0.so.0.4200.1)
==28981== by 0x538891C: g_main_context_dispatch (in /usr/lib/libglib-2.0.so.0.4200.1)
(I decided to test this because I was considering doing http_close in the msn longpolling code, but then realized something could break.)
Okay, main difference between @gholms' error and mine is that mine has a fd != -1 and his is -1.
The problem is the same though, it just fails at different points. ssl_connected gets called when it shouldn't, and then hits one of:
- gnutls_init
- conn->func(conn->data, 0, NULL, cond), signaling that the connection is dead (because fd == -1)
- conn->func, as if data was received and is ready to process.
Any of those conditions is going to segfault sooner or later because conn and its members are already freed at that point.
In my case, it's because a fd was reused because I reconnected before ssl_connected got called. In gholms', getsockopt probably detected the error condition.
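To make the fd-reuse case concrete, here is a minimal toy model in C. It is only a sketch: toy_conn, watches and on_connected are hypothetical names, not bitlbee code, and a freed flag stands in for the real free() so the demo itself has no undefined behavior. It relies on open() returning the lowest unused descriptor, which is why the "reconnect" gets the number the "disconnect" just closed.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_FD 1024

/* Hypothetical stand-in for per-connection state such as conn. */
struct toy_conn {
    int fd;
    int freed; /* set instead of calling free(), so the demo has no real UB */
};

typedef void (*toy_cb)(struct toy_conn *conn);

/* Toy event loop: at most one watch per descriptor number. */
static struct {
    toy_cb cb;
    struct toy_conn *conn;
} watches[MAX_FD];

static int stale_calls;

static void on_connected(struct toy_conn *conn)
{
    assert(conn != NULL);
    if (conn->freed)
        stale_calls++; /* real code would be touching freed memory here */
}

/* Returns the number of times a watch fired for already-released state
 * (1 when the pattern triggers), or -1 if the demo could not be set up. */
int demo_stale_watch_after_fd_reuse(void)
{
    struct toy_conn old_conn = { -1, 0 };
    int old_fd, new_fd;

    old_fd = open("/dev/null", O_RDONLY);
    if (old_fd < 0)
        return -1;
    if (old_fd >= MAX_FD) {
        close(old_fd);
        return -1;
    }
    old_conn.fd = old_fd;
    watches[old_fd].cb = on_connected;
    watches[old_fd].conn = &old_conn;

    /* "Disconnect": the fd is closed and the state released,
     * but the watch is left registered. */
    close(old_fd);
    old_conn.freed = 1;

    /* "Reconnect": the lowest unused descriptor is the one just closed. */
    new_fd = open("/dev/null", O_RDONLY);
    if (new_fd != old_fd) {
        if (new_fd >= 0)
            close(new_fd);
        watches[old_fd].cb = NULL;
        watches[old_fd].conn = NULL;
        return -1;
    }

    /* The loop reports activity on new_fd and finds the old watch. */
    stale_calls = 0;
    if (watches[new_fd].cb != NULL)
        watches[new_fd].cb(watches[new_fd].conn);

    watches[new_fd].cb = NULL;
    watches[new_fd].conn = NULL;
    close(new_fd);
    return stale_calls;
}
```

Calling demo_stale_watch_after_fd_reuse() from a single-threaded program should report one stale call. In the real code the callback would be ssl_connected running against an already freed conn.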
The initial connection flow with no proxy is:
1. ssl_connect(), passing callback ssl_connected
2. proxy_connect(), creating struct PHB for internal connection use
3. proxy_connect_none()
4. b_input_add of gaim_io_connected, setting input tag to phb->inpa
5. gaim_io_connected()
6. phb->func(phb->data, source, B_EV_IO_READ);, where phb->func is ssl_connected
7. ssl_connected()
Disconnect and/or replace the connection between steps 4 and 5 to get this bug.
The struct PHB is internal, created in step 2 and deleted in step 6, never exposed. I see no easy way to clean it up.
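One possible direction, sketched below in plain C, is to keep the pending connects in a table keyed by fd so that the disconnect path has something to cancel. Everything here (pending_add, pending_cancel, pending_fire, the fixed-size table) is hypothetical and for illustration only. It is not bitlbee's API and not necessarily how an upstream fix would look.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 64

typedef void (*connect_cb)(void *data, int fd);

/* Hypothetical registry entry; loosely plays the role of struct PHB. */
struct pending {
    int in_use;
    int fd;
    connect_cb func;
    void *data;
};

static struct pending table[MAX_PENDING];

static struct pending *pending_find(int fd)
{
    size_t i;

    for (i = 0; i < MAX_PENDING; i++)
        if (table[i].in_use && table[i].fd == fd)
            return &table[i];
    return NULL;
}

/* Register a pending connect. Returns 0 on success, -1 if the table is full. */
int pending_add(int fd, connect_cb func, void *data)
{
    size_t i;

    assert(func != NULL);
    for (i = 0; i < MAX_PENDING; i++) {
        if (!table[i].in_use) {
            table[i].in_use = 1;
            table[i].fd = fd;
            table[i].func = func;
            table[i].data = data;
            return 0;
        }
    }
    return -1;
}

/* Disconnect path. Returns 1 if a pending connect was dropped, else 0. */
int pending_cancel(int fd)
{
    struct pending *p = pending_find(fd);

    if (p == NULL)
        return 0;
    p->in_use = 0;
    return 1;
}

/* Event-loop path. Returns 1 if the callback ran, 0 if nothing was pending. */
int pending_fire(int fd)
{
    struct pending *p = pending_find(fd);
    connect_cb func;
    void *data;

    if (p == NULL)
        return 0;
    func = p->func;
    data = p->data;
    p->in_use = 0; /* one-shot: release the slot before calling out */
    func(data, fd);
    return 1;
}

static int calls;

static void count_cb(void *data, int fd)
{
    (void)data;
    (void)fd;
    calls++;
}

/* Cancel between "step 4" and "step 5": the late event becomes a no-op. */
int demo_cancel_before_fire(void)
{
    calls = 0;
    if (pending_add(7, count_cb, NULL) != 0)
        return -1;
    pending_cancel(7);
    pending_fire(7);
    return calls;
}

/* Without a cancel, the callback runs exactly once. */
int demo_fire_without_cancel(void)
{
    calls = 0;
    if (pending_add(7, count_cb, NULL) != 0)
        return -1;
    pending_fire(7);
    pending_fire(7);
    return calls;
}
```

The point of the sketch is only that the owner of the connection needs some handle it can use to withdraw the callback before freeing its state. A real implementation would also have to remove the input tag from the event loop.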
This is most likely easier to reproduce with the glib event loop. With libevent it may not crash but it might leak memory (just guessing though), see this comment in closesocket() (the glib counterpart just calls close(fd); also, event_debug is a no-op macro).
tl;dr everything is terrible
Created bitlbee bug here: http://bugs.bitlbee.org/bitlbee/ticket/1198
I am closing this for now, as this issue is well out of the scope of this plugin.
That bug is now marked as fixed, since https://github.com/bitlbee/bitlbee/pull/54 was merged. I didn't specifically test bitlbee-steam but it should be covered by it, since the relevant changes happen in the ssl and http code.
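I happened to have a connection failure for whatever reason and then wound up with a segfault in bitlbee's ssl_connected function that I presume was triggered by steam_http_req_send's call to http_dorequest when it sends a POST to log in. Debug output and a traceback follow:
I'm not particularly familiar with the code involved here, but conn->hostname appears to point out of bounds and conn->func doesn't point anywhere near the address of the steam_http_req_cb function, leading me to believe something is getting corrupted here. I just can't tell if the cause is something in bitlbee-steam or something internal to bitlbee.
I'm using the following:
bitlbee-3.2.2-3.el7.x86_64
bitlbee-steam-1.1.1-0.4.20141218gitd8e939e.el7.x86_64
glib2-2.36.3-5.el7.x86_64
libgcrypt-1.5.3-4.el7.x86_64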
function, leading me to believe something is getting corrupted here. I just can't tell if the cause is something in bitlbee-steam or something internal to bitlbee.I'm using the following: bitlbee-3.2.2-3.el7.x86_64 bitlbee-steam-1.1.1-0.4.20141218gitd8e939e.el7.x86_64 glib2-2.36.3-5.el7.x86_64 libgcrypt-1.5.3-4.el7.x86_64