andrew-bibb / cmst

QT GUI for Connman
174 stars 37 forks source link

Crashes when bringing the interface up. #240

Closed corvus1 closed 2 years ago

corvus1 commented 3 years ago

So, long story short. What I did is I ran cmst under gdb and brought eth0 down and then back up. Here's what I collected:

gdb cmst
GNU gdb (Gentoo 10.2 vanilla) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.gentoo.org/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from cmst...
(gdb) set logging file backtrace.log
(gdb) run
Starting program: /usr/bin/cmst 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff33d8640 (LWP 30482)]
[New Thread 0x7ffff1961640 (LWP 30486)]
[New Thread 0x7ffff1160640 (LWP 30487)]

(cmst:30478): Gtk-WARNING **: 21:09:32.120: Theme parsing error: gtk-widgets.css:1594:13: not a number

(cmst:30478): Gtk-WARNING **: 21:09:32.120: Theme parsing error: gtk-widgets.css:1594:13: Expected a string.

(cmst:30478): Gtk-WARNING **: 21:09:32.120: Theme parsing error: gtk-widgets.css:1976:11: not a number

(cmst:30478): Gtk-WARNING **: 21:09:32.120: Theme parsing error: gtk-widgets.css:1976:11: Expected a string.

(cmst:30478): Gtk-WARNING **: 21:09:32.121: Theme parsing error: gtk-widgets.css:2526:11: not a number

(cmst:30478): Gtk-WARNING **: 21:09:32.121: Theme parsing error: gtk-widgets.css:2526:11: Expected a string.

(cmst:30478): Gtk-WARNING **: 21:09:32.122: Theme parsing error: applications.css:21:0: Expected a valid selector
[New Thread 0x7ffff0957640 (LWP 30488)]
[Detaching after fork from child process 30489]

Thread 1 "cmst" received signal SIGSEGV, Segmentation fault.
0x000055555559546c in ControlBox::assembleTrayIcon (this=0x7fffffffd5b0) at code/control_box/controlbox.cpp:2264
2264    code/control_box/controlbox.cpp: Нет такого файла или каталога.
(gdb) set logging on
Copying output to backtrace.log.
Copying debug output to backtrace.log.
(gdb) bt
#0  0x000055555559546c in ControlBox::assembleTrayIcon() (this=0x7fffffffd5b0) at code/control_box/controlbox.cpp:2264
#1  0x0000555555597633 in ControlBox::updateDisplayWidgets() (this=0x7fffffffd5b0) at code/control_box/controlbox.cpp:584
#2  ControlBox::updateDisplayWidgets() (this=0x7fffffffd5b0) at code/control_box/controlbox.cpp:570
#3  0x000055555559771d in ControlBox::dbsPropertyChanged(QString, QDBusVariant) (this=this@entry=0x7fffffffd5b0, prop=..., dbvalue=...)
    at code/control_box/controlbox.cpp:911
#4  0x00005555555fd1da in ControlBox::qt_static_metacall(QObject*, QMetaObject::Call, int, void**)
    (_o=0x7fffffffd5b0, _c=<optimized out>, _id=<optimized out>, _a=0x7fffffffcf90) at moc_files/moc_controlbox.cpp:401
#5  0x00005555555fdcc3 in ControlBox::qt_metacall(QMetaObject::Call, int, void**)
    (this=0x7fffffffd5b0, _c=QMetaObject::InvokeMetaMethod, _id=14, _a=0x7fffffffcf90) at moc_files/moc_controlbox.cpp:595
#6  0x00007ffff72f1b3a in  () at /usr/lib64/libQt5DBus.so.5
#7  0x00007ffff6e81903 in QObject::event(QEvent*) () at /usr/lib64/libQt5Core.so.5
#8  0x00007ffff7a5b74f in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /usr/lib64/libQt5Widgets.so.5
#9  0x00007ffff6e56d68 in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /usr/lib64/libQt5Core.so.5
#10 0x00007ffff6e5a247 in QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) () at /usr/lib64/libQt5Core.so.5
#11 0x00007ffff6ea7eb3 in  () at /usr/lib64/libQt5Core.so.5
#12 0x00007ffff5ae853b in g_main_context_dispatch () at /usr/lib64/libglib-2.0.so.0
#13 0x00007ffff5ae87e8 in  () at /usr/lib64/libglib-2.0.so.0
#14 0x00007ffff5ae889f in g_main_context_iteration () at /usr/lib64/libglib-2.0.so.0
#15 0x00007ffff6ea798b in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib64/libQt5Core.so.5
#16 0x00007ffff6e5583b in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib64/libQt5Core.so.5
#17 0x00007ffff6e5dad0 in QCoreApplication::exec() () at /usr/lib64/libQt5Core.so.5
#18 0x00005555555786d3 in main(int, char**) (argc=<optimized out>, argv=<optimized out>) at code/main.cpp:196
(gdb) set logging off
Done logging to backtrace.log.

2264 code/control_box/controlbox.cpp: Нет такого файла или каталога.

I apologize for that. It means "no such file or directory".

benkohler commented 2 years ago

I can reproduce this crash as well. Gdb is blaming https://github.com/andrew-bibb/cmst/blob/master/apps/cmstapp/code/control_box/controlbox.cpp#L2264

andrew-bibb commented 2 years ago

I'll try to look into this over the weekend. Something has changed, likely exposing a programming error that was there all along.

andrew-bibb commented 2 years ago

Can either of you tell me how your system is configured? I'm not able to duplicate the crash, either by powering off and then on ethernet nor by unplugging the cable and then plugging it back in. The "no such file or directory" message is also confusing because at this point in the program it is not reading any file or directory. It is processing a QMap.

One other question, is this only occurring when running under gdb or did you run gdb because you were getting a crash? In the mean time I'll keep poking around looking for something that could be causing this.

corvus1 commented 2 years ago

Can either of you tell me how your system is configured?

On my system I can very reliably crash cmst by either pulling the ethernet cable and then plugging it back in or by bringing the inferface down and back up. ifconfig eth0 down/up It crashes cmst when I bring it up every time. @benkohler told me that for him it's less reliable, and sometimes takes a couple of cycles up/down.

The "no such file or directory" message is also confusing because at this point in the program it is not reading any file or directory. It is processing a QMap.

You can simply ignore that message, I've been told that it's just GDB trying to open the source file, to show the fragment of code that it blames.

One other question, is this only occurring when running under gdb or did you run gdb because you were getting a crash? In the mean time I'll keep poking around looking for something that could be causing this.

The crash occures without gdb. GDB was introduced because of the crashes. In fact I only discovered it because my interface has a habit of momentarily going down and back up, probably because of some quirk of the hardware or the driver.

Any additional info about my setup you might need, I'll do my best to provide.

andrew-bibb commented 2 years ago

Thank you for the note about file or directory, that helps a great deal.

I just brought the interface up and down using ip and still can't reproduce the issue. Now that I know there are no external files involved it narrows down my search. Because of a segfault it sounds like the QMap is somehow being used uninitialized. That is where I'm going to look next.

corvus1 commented 2 years ago

The only other thing I would mention is that the issue seems to only affect ethernet interfaces and not wireless.

andrew-bibb commented 2 years ago

This is hard without being able to reproduce the error, but I may have located it. My hypothesis would explain why sometimes you get it and sometimes you don't. According to the gdb trace this is triggered after the dbsPropertyChanged() function is called. This function is only called when Connman manager issues a property changed (basically online, idle, ready or offline) signal. The segfault is triggered (line 2264) when we try to read the services_list. I think this may be a race where the services list has not been updated before the properties changed signal is sent. A change in properties requires some change in the services list and it appears that is not happening when the crash occurs. That is where I am going to start looking anyway.

corvus1 commented 2 years ago

Well, I can only promise you that when you need something tested I will be all over it. :+1:

corvus1 commented 2 years ago

Interestingly, it doesn't seem to crash when wifi goes up/down, even though a bit further in the same function (line 2272) it does the same for wifi interfaces.

andrew-bibb commented 2 years ago

If I am right on this the fix was very simple. I just uploaded my attempt at a fix. Unfortunately I seem to have added a bunch of extraneous files so need to find a way to clean them up here. GIt is not my strong suit.

Basically changed so I don't make a call to updateDisplayWidgets() after a property change. As I mention above, a property change must trigger a change in the services list so I just do the updateDisplayWidgets() after the services list changes. That way the services list should be up to date and hopefully won't crash.

I've tested here with wifi and ethernet (completely removing my VPN configurations to keep it simple) and it seems to work, but on the other hand it was working here before. These sometimes they crash (ethernet) and sometimes they don't (wifi) problems are miserable to track down.

corvus1 commented 2 years ago

I just rebuilt cmst from git, and I am happy to report that it seems to have fixed it. It'd be cool if @benkohler also tested it to be extra sure, but really it was crashing every time, and now it doesn't. So great job, thanks!

andrew-bibb commented 2 years ago

Great news. Wanted to mention that I never would have found the issue had you not posted the gdb printout in the first post. That was absolutely critical information to have for locating the problem.

I'll keep this issue open for a couple of weeks but think I'll do a formal release maybe this weekend. This is fixed (or seems to be) plus a lot of translations have come in. People working on those might like to see them exist someplace other than Github.

corvus1 commented 2 years ago

Great news. Wanted to mention that I never would have found the issue had you not posted the gdb printout in the first post. That was absolutely critical information to have for locating the problem.

Yeah, I figured it's hard enough as it is, and without backtrace there's just not much that can be done, aside from divination.