bitlbee / bitlbee-steam

Steam protocol plugin for BitlBee
GNU General Public License v2.0
126 stars 12 forks source link

Connection failures result in segfaults #71

Closed gholms closed 9 years ago

gholms commented 9 years ago

I happened to have a connection failure for whatever reason and then wound up with a segfault in bitlbee's ssl_connected function that I presume was triggered by steam_http_req_send's call to http_dorequest when it sends a POST to log in. Debug output and a traceback follow:

[bitlbee-steam] POST Request (0x5555558142a0): https://api.steampowered.com:443/ISteamWebUserPresenceOAuth/Logon/v0001
[bitlbee-steam]   User-Agent: Steam App / BitlBee / 1.1.1
[bitlbee-steam]   Content-Length: 74
[bitlbee-steam]   Connection: Close
[bitlbee-steam]   Accept: */*
[bitlbee-steam]   Cookie: steamLogin=XXXX; sessionid=XXXX
[bitlbee-steam]   Host: api.steampowered.com
[bitlbee-steam]   Content-Type: application/x-www-form-urlencoded
[bitlbee-steam]   
[bitlbee-steam]   ui_mode=web&access_token=XXXX&umqid=XXXX
About to send HTTP request:
POST /ISteamWebUserPresenceOAuth/Logon/v0001 HTTP/1.1
User-Agent: Steam App / BitlBee / 1.1.1
Content-Length: 74
Connection: Close
Accept: */*
Cookie: steamLogin=XXXX; sessionid=XXXX
Host: api.steampowered.com
Content-Type: application/x-www-form-urlencoded

ui_mode=web&access_token=XXXX&umqid=XXXX

Program received signal SIGSEGV, Segmentation fault.
0x000055555558343a in ssl_connected (data=0x555555812fa0, source=-1, cond=B_EV_IO_READ) at ssl_nss.c:220
220     conn->func(conn->data, 0, NULL, cond);
#0  0x000055555558343a in ssl_connected (data=0x555555812fa0, source=-1, cond=B_EV_IO_READ) at ssl_nss.c:220
#1  0x0000555555582780 in gaim_io_connected (data=0x5555557ea2e0, source=12, cond=cond@entry=(B_EV_IO_READ | B_EV_IO_WRITE)) at proxy.c:98
#2  0x000055555557aa6e in gaim_io_invoke (source=<optimized out>, condition=<optimized out>, data=0x555555813100) at events_glib.c:88
#3  0x00007ffff75f5ac6 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#4  0x00007ffff75f5e48 in g_main_context_iterate.isra.22 () from /lib64/libglib-2.0.so.0
#5  0x00007ffff75f625a in g_main_loop_run () from /lib64/libglib-2.0.so.0
#6  0x000055555557aadc in b_main_run () at events_glib.c:64
#7  0x00005555555669ef in main (argc=<optimized out>, argv=0x7fffffffe4f8) at unix.c:174

I'm not particularly familiar with the code involved here, but conn->hostname appears to point out of bounds and conn->func doesn't point anywhere near the address of the steam_http_req_cb function, leading me to believe something is getting corrupted here. I just can't tell if the cause is something in bitlbee-steam or something internal to bitlbee.

I'm using the following: bitlbee-3.2.2-3.el7.x86_64 bitlbee-steam-1.1.1-0.4.20141218gitd8e939e.el7.x86_64 glib2-2.36.3-5.el7.x86_64 libgcrypt-1.5.3-4.el7.x86_64

jgeboski commented 9 years ago

I was suffering from the same connection issues earlier, apparently Steam was burping a bit. I did not suffer from any such crash, but I am running bitlbee with libevent/GnuTLS, rather than GLib/NSS. I can only assume this crash is the result of a double free, or an event is remaining registered when it should have been yanked.

I will poke around with this when I get some more time (I am currently travelling). I will also see if @dequis has any insight(s) on the matter.

This is the problem with all these damn crypto and event libraries being "supported."

dequis commented 9 years ago

Yeah, "supported", and 99% of users use gnutls, 1% use nss, and 0% use openssl. I think coverity doesn't even consider ssl_nss.c because it's not part of the usual compilation method.

But yeah highly likely that this is some use after free, a problem on the nss side. Can't see anything obvious right now though (guessing this is the proxy_connect callback being called after ssl_disconnect was called?). Maybe nss' ssl_disconnect missing closesocket( conn->fd );? How do we even free the phb stuff there?

I guess I should figure out how to reproduce this one reliably, maybe with a tcp connection killer.

gholms commented 9 years ago

I bet you could reproduce it by blackholing api.steampowered.com. When I encountered this issue it seemed to be completely unresponsive.

jgeboski commented 9 years ago

How exactly was this issue triggered? Was it taking a while to connect, and you ran account <acc> off? Or did it just happen on its own?

dequis commented 9 years ago
17:14 < jgeboski> I wish gholms was in IRC
17:15 < dx> jgeboski: tell him to get the fuck in here
gholms commented 9 years ago

Looks like I didn't reconnect after I upgraded a few weeks ago. Sorry.

I ran across that issue shortly after Steam began having another one of its many connectivity, erm, issues. I don't recall triggering it with an action of my own.

dequis commented 9 years ago

Decided to try reproducing this one, copied some of the iptables commands from the comcast README and then disconnected/connected the account a bunch of times until it happened:

About to send HTTP request:
POST /ISteamWebUserPresenceOAuth/Logoff/v0001 HTTP/1.1
User-Agent: Steam App / BitlBee / 1.2.0
Content-Length: 61
Connection: Close
Accept: */*
Cookie:
Host: api.steampowered.com
Content-Type: application/x-www-form-urlencoded

access_token=XXXXXXXXXX&umqid=XXXXXXXXXX
About to send HTTP request:
POST /ISteamWebUserPresenceOAuth/Logon/v0001 HTTP/1.1
User-Agent: Steam App / BitlBee / 1.2.0
Content-Length: 73
Connection: Close
Accept: */*
Cookie:
Host: api.steampowered.com
Content-Type: application/x-www-form-urlencoded

ui_mode=web&access_token=XXXXXXXXXX&umqid=XXXXXXXXXX
==28981== Invalid write of size 8
==28981==    at 0x56995A4: gnutls_init (in /usr/lib/libgnutls.so.28.41.4)
==28981==    by 0x13FEFE: ssl_connected (ssl_gnutls.c:322)
==28981==    by 0x13DED5: gaim_io_connected (proxy.c:105)
==28981==    by 0x1362E7: gaim_io_invoke (events_glib.c:86)
==28981==    by 0x538891C: g_main_context_dispatch (in /usr/lib/libglib-2.0.so.0.4200.1)
==28981==    by 0x5388CF7: ??? (in /usr/lib/libglib-2.0.so.0.4200.1)
==28981==    by 0x5389021: g_main_loop_run (in /usr/lib/libglib-2.0.so.0.4200.1)
==28981==    by 0x136259: b_main_run (events_glib.c:59)
==28981==    by 0x134052: main (unix.c:170)
==28981==  Address 0x7cf6d90 is 48 bytes inside a block of size 56 free'd
==28981==    at 0x4C2B200: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==28981==    by 0x140373: ssl_disconnect (ssl_gnutls.c:467)
==28981==    by 0x1383BE: http_close (http_client.c:696)
==28981==    by 0x829AEB7: steam_http_req_close (steam-http.c:364)
==28981==    by 0x829B092: steam_http_req_free (steam-http.c:386)
==28981==    by 0x829B119: steam_http_free_reqs (steam-http.c:78)
==28981==    by 0x8297858: steam_logout (steam.c:895)
==28981==    by 0x145945: imc_logout (nogaim.c:401)
==28981==    by 0x143502: bee_free (bee.c:57)
==28981==    by 0x12098C: irc_free (irc.c:254)
==28981==    by 0x53893C2: ??? (in /usr/lib/libglib-2.0.so.0.4200.1)
==28981==    by 0x538891C: g_main_context_dispatch (in /usr/lib/libglib-2.0.so.0.4200.1)
==28981==

So it's definitely not NSS specific.

The rest of the valgrind crap: http://dump.dequis.org/PznWT.txt

And then this one appeared when I closed that bitlbee, lol:

==28981== 21 bytes in 1 blocks are definitely lost in loss record 41 of 158
==28981==    at 0x4C29F90: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==28981==    by 0x538E579: g_malloc (in /usr/lib/libglib-2.0.so.0.4200.1)
==28981==    by 0x53A6E5E: g_strdup (in /usr/lib/libglib-2.0.so.0.4200.1)
==28981==    by 0x13F972: ssl_connect (ssl_gnutls.c:121)
==28981==    by 0x136A55: http_dorequest (http_client.c:49)
==28981==    by 0x829B8C9: steam_http_req_send (steam-http.c:691)
==28981==    by 0x8298ECD: steam_api_req_logoff (steam-api.c:801)
==28981==    by 0x145945: imc_logout (nogaim.c:401)
==28981==    by 0x143502: bee_free (bee.c:57)
==28981==    by 0x12098C: irc_free (irc.c:254)
==28981==    by 0x53893C2: ??? (in /usr/lib/libglib-2.0.so.0.4200.1)
==28981==    by 0x538891C: g_main_context_dispatch (in /usr/lib/libglib-2.0.so.0.4200.1)

(I decided to test this because I was considering doing http_close in the msn longpolling code, but then realized something could break.)

dequis commented 9 years ago

Okay, main difference between @gholms' error and mine is that mine has a fd != -1 and his is -1.

The problem is the same though, it just fails at different points ssl_connected gets called when it shouldn't, and:

Any of those conditions is going to segfault sooner or later because conn and its members are already freed at that point.

In my case, it's because a fd was reused because i reconnected before ssl_connected got called. In gholms', getsockopt probably detected the error condition.

The initial connection flow with no proxy is:

  1. ssl_connect(), passing callback ssl_connected
  2. proxy_connect(), creating struct PHB for internal connection use
  3. proxy_connect_none()
  4. b_input_add of gaim_io_connected, setting input tag to phb->inpa
  5. gaim_io_connected()
  6. phb->func(phb->data, source, B_EV_IO_READ);, where phb->func is ssl_connected
  7. ssl_connected()

Disconnect and/or replace the connection between steps 4 and 5 to get this bug.

The struct PHB is internal, created in step 2 and deleted in step 6, never exposed. I see no easy way to clean it up.

This is most likely easier to reproduce with the glib event loop. With libevent it may not crash but it might leak memory (just guessing though), see this comment in closesocket() (the glib counterpart just calls close(fd). also event_debug is a no-op macro)

dequis commented 9 years ago

tl;dr everything is terrible

dequis commented 9 years ago

Created bitlbee bug here: http://bugs.bitlbee.org/bitlbee/ticket/1198

jgeboski commented 9 years ago

I am closing this for now, as this issue is well out of the scope of this plugin.

dequis commented 8 years ago

That bug is now marked as fixed, since https://github.com/bitlbee/bitlbee/pull/54 was merged. I didn't specifically test bitlbee-steam but it should be covered by it, since the relevant changes happen in the ssl and http code.