EionRobb / skype4pidgin

Skype Plugin for Pidgin, libpurple and more
https://github.com/EionRobb/skype4pidgin/tree/master/skypeweb#skypeweb-plugin-for-pidgin
625 stars 88 forks source link

Crash in skypeweb_check_authrequests during heavy load #667

Closed bodqhrohro closed 2 years ago

bodqhrohro commented 3 years ago
Thread 1 "pidgin" received signal SIGSEGV, Segmentation fault.
0x00007ffff09bbdff in purple_http_keepalive_pool_ref () from /usr/lib/purple-2/libskypeweb.so
@(gdb) bt
#0  0x00007ffff09bbdff in purple_http_keepalive_pool_ref () at /usr/lib/purple-2/libskypeweb.so
#1  0x00007ffff09bcc7a in purple_http_request_set_keepalive_pool () at /usr/lib/purple-2/libskypeweb.so
#2  0x00007ffff09a7f69 in skypeweb_post_or_get () at /usr/lib/purple-2/libskypeweb.so
#3  0x00007ffff09ad52e in skypeweb_check_authrequests () at /usr/lib/purple-2/libskypeweb.so
#4  0x00007ffff7274614 in g_timeout_dispatch (source=0x55555cb5df70, callback=<optimized out>, user_data=<optimized out>)
    at ../../../glib/gmain.c:4849
#5  0x00007ffff7273a8f in g_main_dispatch (context=0x555555684860) at ../../../glib/gmain.c:3325
#6  g_main_context_dispatch (context=0x555555684860) at ../../../glib/gmain.c:4016
#7  0x00007ffff7273e38 in g_main_context_iterate (context=0x555555684860, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>)
    at ../../../glib/gmain.c:4092
#8  0x00007ffff727412b in g_main_loop_run (loop=0x555556748890) at ../../../glib/gmain.c:4290
#9  0x00007ffff775812a in gtk_main () at /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0
#10 0x0000555555590970 in main (argc=<optimized out>, argv=<optimized out>) at ././pidgin/gtkmain.c:939
bodqhrohro commented 2 years ago
Thread 1 "pidgin" received signal SIGSEGV, Segmentation fault.
purple_http_keepalive_pool_ref (pool=0x21) at purple2compat/http.c:2149
2149        pool->ref_count++;
@(gdb) bt
#0  purple_http_keepalive_pool_ref (pool=0x21) at purple2compat/http.c:2149
#1  0x00007ffff007895e in purple_http_request_set_keepalive_pool (request=0x555558eda460, pool=0x21) at purple2compat/http.c:2574
#2  0x00007ffff006c79f in skypeweb_post_or_get (sa=0x555555ef87b0, 
    sa@entry=<error reading variable: value has been optimized out>, method=method@entry=(SKYPEWEB_METHOD_GET | SKYPEWEB_METHOD_SSL), host=host@entry=0x7ffff007e59c "contacts.skype.com", url=url@entry=0x7ffff007ea88 "/contacts/v2/users/SELF/invites", postdata=postdata@entry=0x0, callback_func=callback_func@entry=0x7ffff006e240 <skypeweb_got_authrequests>, user_data=0x0, keepalive=1) at skypeweb_connection.c:92
#3  0x00007ffff0070daa in skypeweb_check_authrequests (sa=<error reading variable: value has been optimized out>) at skypeweb_contacts.c:1542
#4  0x00007ffff72f6744 in g_timeout_dispatch (source=0x55555d955180, callback=<optimized out>, user_data=<optimized out>)
    at ../../../glib/gmain.c:4933
#5  0x00007ffff72f5be4 in g_main_dispatch (context=0x5555556c8b10) at ../../../glib/gmain.c:3381
#6  g_main_context_dispatch (context=0x5555556c8b10) at ../../../glib/gmain.c:4099
#7  0x00007ffff72f5f88 in g_main_context_iterate (context=0x5555556c8b10, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>)
    at ../../../glib/gmain.c:4175
#8  0x00007ffff72f6273 in g_main_loop_run (loop=0x5555561fbe60) at ../../../glib/gmain.c:4373
#9  0x00007ffff79d8b2a in gtk_main () at /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0
#10 0x0000555555591d70 in main (argc=<optimized out>, argv=<optimized out>) at ././pidgin/gtkmain.c:947
@(gdb) p pool
$6 = (PurpleHttpKeepalivePool *) 0x21
@(gdb) p *pool
Cannot access memory at address 0x21

There's a memory corruption, and even the NULL check before doesn't help ;DD

bodqhrohro commented 2 years ago
@(gdb) p sa
$7 = (SkypeWebAccount *) 0x555555ef87b0
@(gdb) p *sa
$8 = {username = 0x555559d0bfe0 "", primary_member_name = 0x55555db283d0 "@\315\307YUU", self_display_name = 0x55555bd6f580 "0j\317ZUU", 
  account = 0x206f746f7b9a5250, pc = 0x4220726f66206469, keepalive_pool = 0x21, conns = 0x555002aa0078, cookie_jar = 0x0, 
  messages_host = 0x20 <error: Cannot access memory at address 0x20>, 
  sent_messages_hash = <error reading variable: Cannot access memory at address 0x59>0x21, poll_timeout = 2003789133, watchdog_timeout = 3197149168, 
  authcheck_timeout = 0, last_authrequest = 499263627632, skype_token = 0x31 <error: Cannot access memory at address 0x31>, 
  registration_token = 0x0, vdms_token = 0x555558c67f60 "", endpoint = 0x55555bd624e0 "\001", registration_expiry = 593282358, vdms_expiry = 32767}
@bq:17:32:41:/tmp/dl$ echo '206f746f7b9a5250'|xxd -r -p
 oto{�RP@bq:17:47:27:/tmp/dl$ echo 4220726f66206469|xxd -r -p
B rof di@bq:17:47:49:/tmp/dl$ echo 555002aa0078|xxd -r -p
UP�x@bq:17:48:14:/tmp/dl$ 
bodqhrohro commented 2 years ago

Should maybe a pointer to a pointer be passed to callbacks in such cases? This way they may be nulled when freeing, which allows a robust null check in the callee.

EionRobb commented 2 years ago

Is there something unusual about your setup? You seem to get so many crashes across so many plugins so frequently. :)

bodqhrohro commented 2 years ago

The "unusual" thing is that I almost never reboot my laptop.

root@localhost:~# uptime
 19:55:56 up 182 days, 23:40, 14 users,  load average: 19,82, 13,69, 9,63

I freeze the greedy processes, suspend the system, then wake it and unfreeze them. I use only cellular data, via a USB modem till late 2019, now from a spadephone via a Wi-Fi hotspot. This involves a sudden (from Pidgin's point of view) and complete reconnect, especially if the cellular operator has renewed NAT mappings and the connections cannot be restored (was always the case with a modem, now sporadically).

bodqhrohro commented 2 years ago

Actually, I doubt that libpurple itself can free this structure before the prpl is closed and the following timer is destructed.

So the only sane explanation is that the memory corruption happens from some other place, and affects different structures here and there, causing weird bugs — possibly even from some other prpl!

I think of hardening the code by fencing all the changes of the structure (and some other structures as well) with mprotect calls. It should cause segmentation fault right where the corruption happens, rather than afterwards.

bodqhrohro commented 2 years ago

Hehe, I missed this part:

addr must be aligned to a page boundary

So it just doesn't work.

Possibly I can enforce the structure to be allocated in a separate memory page, but this would likely just prevent the corruption from happening. Still sounds like a solution for the crash xD

EionRobb commented 2 years ago

Could you try run it through valgrind and see if there is indeed something else messing with memory that it shouldnt?

The way you get so many crashes across so many prpls (including Skype) makes me thing some other trickery might be at work

bodqhrohro commented 2 years ago

poll_timeout = 2003789133 is 776f654d, which is "Meow". I'm pretty sure this "Meow" didn't come from Skype xD But I have a buddy with a name "Vladislaw Meow🐾" from Telegram-Purple. Soooo?..

bodqhrohro commented 2 years ago

Could you try run it through valgrind

AFAIR, it's way too heavy for me. It's not like I can free up enough resources for that and quickly reproduce, I'll have to run it possibly for days, with a sane performance and while doing other tasks, until I finally catch it.

EionRobb commented 2 years ago

Could you try switching off telegram-purple (eg to use tdlib-purple) and see if that helps? :)

bodqhrohro commented 2 years ago

Yeah, and the "oto id for B" should come from the debug string in tgp-info.c:

debug ("photo id for %s hasn't changed %lld", parent ? parent->print_name : P->print_name, id);
bodqhrohro commented 2 years ago

I prefer telegram-purple over tdlib-purple, and I don't really need both already because I have telegabber. The only remaining advantage of telegram-purple is that it tries to fetch all the history it missed, while telegabber can lose some messages while it was down, or if the XMPP server it was connected to dropped the offline messages until a client fetched them, and even its /history command is flaky (and also requires me to look in TelegramDesktop what chats I missed). If I circumvent the latter, prpls for Telegram would become useless for me completely.

Also, I believe that non-TDlib-based MTProto implementations are valuable because the development of TDlib is controlled by a developer affiliated by Telegram, and thus may involve an unwanted behaviour (it's already the case, though, that's why TDlight exists, at least). Both tdlib-purple and telegabber use TDlib, while telegram-purple doesn't.

Telegram-Purple has lots of more obvious bugs which I was going to patch too, but started messing with this one first, hehe.

bodqhrohro commented 2 years ago

I wonder how could the debug string leak here at all. The only place it could possibly be copied to heap at all is arg_s in purple_debug_vargs. So the allocator actually considers the structure freed, huh?

Maybe the wrong pointer is freed somewhere? It would be pretty hard to find then.

EionRobb commented 2 years ago

Running with valgrind would make it pretty easy to find ;)

bodqhrohro commented 2 years ago
(17:20:00) http: Performing new request 0x555559a26f00 to contacts.skype.com.
(17:20:00) http: pool is destroying
(17:20:00) http: Request 0x555559a26f00 performed without success.
(17:20:00) skypeweb: No data in response

(Pidgin:3105921): Json-CRITICAL **: 17:20:00.883: json_node_get_object: assertion 'JSON_NODE_IS_VALID (node)' failed

Thread 1 "pidgin" received signal SIGSEGV, Segmentation fault.
purple_http_connection_set_add (set=0x0, http_conn=0x555559a26f00) at purple2compat/http.c:2427
2427        if (set->is_destroying)
@(gdb) bt
#0  purple_http_connection_set_add (set=0x0, http_conn=0x555559a26f00) at purple2compat/http.c:2427
#1  0x00007ffff00687ba in skypeweb_post_or_get (sa=0x555557836170, 
    sa@entry=<error reading variable: value has been optimized out>, method=method@entry=(SKYPEWEB_METHOD_GET | SKYPEWEB_METHOD_SSL), host=host@entry=0x7ffff007a59c "contacts.skype.com", url=url@entry=0x7ffff007aa88 "/contacts/v2/users/SELF/invites", postdata=postdata@entry=0x0, callback_func=callback_func@entry=0x7ffff006a2a0 <skypeweb_got_authrequests>, user_data=0x0, keepalive=1) at skypeweb_connection.c:152
#2  0x00007ffff006ce7a in skypeweb_check_authrequests (sa=<error reading variable: value has been optimized out>) at skypeweb_contacts.c:1548
#3  0x00007ffff72f5744 in g_timeout_dispatch (source=0x55555965c560, callback=<optimized out>, user_data=<optimized out>)
    at ../../../glib/gmain.c:4933
#4  0x00007ffff72f4be4 in g_main_dispatch (context=0x5555556c8b10) at ../../../glib/gmain.c:3381
#5  g_main_context_dispatch (context=0x5555556c8b10) at ../../../glib/gmain.c:4099
#6  0x00007ffff72f4f88 in g_main_context_iterate (context=0x5555556c8b10, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>)
    at ../../../glib/gmain.c:4175
#7  0x00007ffff72f5273 in g_main_loop_run (loop=0x555555d36160) at ../../../glib/gmain.c:4373
#8  0x00007ffff79d7b2a in gtk_main () at /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0
#9  0x0000555555591d70 in main (argc=<optimized out>, argv=<optimized out>) at ././pidgin/gtkmain.c:947
@(gdb) p set
$7 = (PurpleHttpConnectionSet *) 0x0
@(gdb) p *set
Cannot access memory at address 0x0
@(gdb) up
#1  0x00007ffff00687ba in skypeweb_post_or_get (sa=0x555557836170, sa@entry=<error reading variable: value has been optimized out>, 
    method=method@entry=(SKYPEWEB_METHOD_GET | SKYPEWEB_METHOD_SSL), host=host@entry=0x7ffff007a59c "contacts.skype.com", 
    url=url@entry=0x7ffff007aa88 "/contacts/v2/users/SELF/invites", postdata=postdata@entry=0x0, 
    callback_func=callback_func@entry=0x7ffff006a2a0 <skypeweb_got_authrequests>, user_data=0x0, keepalive=1) at skypeweb_connection.c:152
152     purple_http_connection_set_add(sa->conns, conn->http_conn);
@(gdb) p *sa
$8 = {username = 0x555556e62330 "lerts", primary_member_name = 0x5555591ef660 "hat", 
  self_display_name = 0x2 <error: Cannot access memory at address 0x2>, account = 0x555559bda1c0, pc = 0x1, keepalive_pool = 0x555556e62340, 
  conns = 0x0, cookie_jar = 0x0, messages_host = 0x0, sent_messages_hash = 0x0, poll_timeout = 0, watchdog_timeout = 0, authcheck_timeout = 65, 
  last_authrequest = 0, skype_token = 0x555557c9dcd0 "", 
  registration_token = 0x3735633438393563 <error: Cannot access memory at address 0x3735633438393563>, 
  vdms_token = 0x3134353139333362 <error: Cannot access memory at address 0x3134353139333362>, 
  endpoint = 0x6561633162383930 <error: Cannot access memory at address 0x6561633162383930>, registration_expiry = 1735420462, 
  vdms_expiry = 1649427749}
@(gdb) p sa
$9 = (SkypeWebAccount *) 0x555557836170

Hey, I'm no "lerts"! :P

This "lerts" is surprisingly surrounded with null bytes.

The "hat" is a part of:

@(gdb) x 0x5555591ef650
0x5555591ef650: "Retro Stickers Chat"

That's mine, from Telegram!

0x555559bda1c0: "tab-label-event"
@(gdb) x/sb 0x555559bda1bf

So, everything was freed. So accurately that I barely believe some other prpl, expecting an other structure, could do this. Despite the skypeweb prpl had to remove this timeout too when freeing correctly.

Maybe a GLib bug, huh? I have 2.70.2-1.

bodqhrohro commented 2 years ago

Or can glib_eventloops->timeout_remove be possibly overriden (in a broken way) by some plugin?

bodqhrohro commented 2 years ago
Thread 1 "pidgin" received signal SIGSEGV, Segmentation fault.
purple_http_keepalive_pool_ref (pool=0x21) at purple2compat/http.c:2149
2149        pool->ref_count++;
@(gdb) dns[1481059]: nobody needs me... =(
dns[1481060]: nobody needs me... =(
dns[1481062]: nobody needs me... =(
dns[1481061]: nobody needs me... =(
bt
#0  purple_http_keepalive_pool_ref (pool=0x21) at purple2compat/http.c:2149
#1  0x00007ffff0471b7e in purple_http_request_set_keepalive_pool (request=0x5555570d3340, pool=0x21) at purple2compat/http.c:2574
#2  0x00007ffff04657ff in skypeweb_post_or_get
    (sa=sa@entry=0x555555ebd2d0, method=method@entry=(SKYPEWEB_METHOD_POST | SKYPEWEB_METHOD_SSL), host=0x555556b9bfb0 "\260\031\304VUU", url=url@entry=0x7ffff04795d8 "/v1/users/ME/endpoints/SELF/subscriptions/0/poll", postdata=postdata@entry=0x0, callback_func=callback_func@entry=0x7ffff046eac0 <skypeweb_poll_cb>, user_data=0x0, keepalive=1) at skypeweb_connection.c:92
#3  0x00007ffff046c267 in skypeweb_poll (sa=sa@entry=0x555555ebd2d0) at skypeweb_messages.c:794
#4  0x00007ffff046c279 in skypeweb_timeout (userdata=0x555555ebd2d0, userdata@entry=<error reading variable: value has been optimized out>)
    at skypeweb_messages.c:706
#5  0x00007ffff72f5744 in g_timeout_dispatch (source=0x55555814bd50, callback=<optimized out>, user_data=<optimized out>)
    at ../../../glib/gmain.c:4933
#6  0x00007ffff72f4be4 in g_main_dispatch (context=0x5555556c8b10) at ../../../glib/gmain.c:3381
#7  g_main_context_dispatch (context=0x5555556c8b10) at ../../../glib/gmain.c:4099
#8  0x00007ffff72f4f88 in g_main_context_iterate (context=0x5555556c8b10, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>)
    at ../../../glib/gmain.c:4175
#9  0x00007ffff72f5273 in g_main_loop_run (loop=0x555555ec4f80) at ../../../glib/gmain.c:4373
#10 0x00007ffff79d7b2a in gtk_main () at /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0
#11 0x0000555555591d70 in main (argc=<optimized out>, argv=<optimized out>) at ././pidgin/gtkmain.c:947
@(gdb) up
#1  0x00007ffff0471b7e in purple_http_request_set_keepalive_pool (request=0x5555570d3340, pool=0x21) at purple2compat/http.c:2574
2574            purple_http_keepalive_pool_ref(pool);
@(gdb) up
#2  0x00007ffff04657ff in skypeweb_post_or_get (sa=sa@entry=0x555555ebd2d0, method=method@entry=(SKYPEWEB_METHOD_POST | SKYPEWEB_METHOD_SSL), 
    host=0x555556b9bfb0 "\260\031\304VUU", url=url@entry=0x7ffff04795d8 "/v1/users/ME/endpoints/SELF/subscriptions/0/poll", 
    postdata=postdata@entry=0x0, callback_func=callback_func@entry=0x7ffff046eac0 <skypeweb_poll_cb>, user_data=0x0, keepalive=1)
    at skypeweb_connection.c:92
92          purple_http_request_set_keepalive_pool(request, sa->keepalive_pool);
@(gdb) p sa
$1 = (SkypeWebAccount *) 0x555555ebd2d0
@(gdb) p *sa
$2 = {username = 0x0, primary_member_name = 0x21 <error: Cannot access memory at address 0x21>, self_display_name = 0x555556080110 "0\300\363UUU", 
  account = 0x555556080110, pc = 0x555556cd5360, keepalive_pool = 0x21, conns = 0x100000010, cookie_jar = 0x7ffff6ea3c00 <main_arena+96>, 
  messages_host = 0x555556b9bfb0 "\260\031\304VUU", sent_messages_hash = <error reading variable: Cannot access memory at address 0x79>0x41, 
  poll_timeout = 1462248816, watchdog_timeout = 21845, authcheck_timeout = 0, last_authrequest = 0, skype_token = 0x0, registration_token = 0x0, 
  vdms_token = 0x0, endpoint = 0x0, registration_expiry = 65, vdms_expiry = 0}

So the fix in the libnotify+ plugin still didn't help.

It's pretty interesting though that the same byte gets overwritten with an exclamation mark often. It should come from some place.