Martchus / syncthingtray

Tray application and Dolphin/Plasma integration for Syncthing
https://martchus.github.io/syncthingtray/
Other
1.68k stars 44 forks source link

Settings->Extras->Systemd Does not successfully query --user unit status #144

Closed sten0 closed 1 year ago

sten0 commented 2 years ago

Relevant components

I started doing QA for the package, to identify issues that new users are likely to encounter and be disappointed (and give up) or complain about. This short video of 1.1.20 shows how the plasmoid fails to successfully query --user unit status during the "initial setup" experience. Yes, logging out and then logging back in again will light up those status indicators in green, but that's a workaround for the bug.

https://drive.google.com/file/d/1Ql_cMvQJnDz9bKxvTSaUR_y0BdrbZIpu/view?usp=sharing

I believe that that 1.2.0 is likely to be affected because I found the following during fuzzing (intended to fail, and I was checking how it was failing):

1: Executing test cases ...
1: .
1:  - Setup configuration for Syncthing tests ...
1:  - No timeout factor set, defaulting to 1
1:    (set environment variable SYNCTHING_TEST_TIMEOUT_FACTOR to specify a factor to run tests on a slow machine)
1: 
1:  - Launching Syncthing: syncthing -gui-address=http://127.0.0.1:4001 -gui-apikey=syncthingtestinstance -home=/<<PKGBUILDDIR>>/debian/build/connector/testworkingdir/testconfig -no-browser -verbose
1: Error: Unable to launch process: execve failed: No such file or directory
1: 
1:  - Preparing connection ...
1: 
1:  - Error handling in case of insufficient configuration ...
1:  - Connection error: Connection configuration is insufficient.
1: 
1:  - Error handling in case of inavailability ...
1:  - Connection error: Unable to request Syncthing config: Connection refused
1:  - Connection error: Unable to request Syncthing status: Connection refused
1:  - Connection error: Unable to request Syncthing config: Connection refused
[repeats many times]
1:  - Connection error: Unable to request Syncthing status: Connection refused
1:  - Connection error: Unable to request Syncthing config: Connection refused
1:  - Connection error: Unable to request Syncthing status: Connection refused
1: Qt has caught an exception thrown from an event handler. Throwing
1: exceptions from an event handler is not supported in Qt.
1: You must not let any exception whatsoever propagate through Qt code.
1: If that is not possible, in Qt 5 you must at least reimplement
1: QCoreApplication::notify() and catch all exceptions there.

My hypothesis is that the nature of the bug is revealed in the last five lines of this log. 'hope this is enough to identify it, and that it's not too much of a PITA to fix. I believe that the error handling event is sufficiently similar between 1.1.20 and 1.2.0, and that the exception is most likely the true cause of what would superficially appears to be a "UI fails to refresh" bug. Please let me know if further info is required.

Martchus commented 2 years ago

The unsuccessful query for the systemd status and the failing tests are unrelated but both would make sense to look into.

Unfortunately the error message about the exception doesn't tell one the exception's error message or where it was thrown at. So I don't think I can investigate this issue further. (I definitely cannot reproduce it in my build envs but it is likely a real bug happening under some circumstances.)

About the systemd problem: I couldn't reproduce it with the Qt Widgets based version. Maybe some initialization is missing when the Plasmoid is just added to the shell. I'll try to reproduce it this evening. I also requested access to your Google drive, maybe that makes is more clear.

Martchus commented 2 years ago

I cannot reproduce this on my Arch Linux system. I've deleted the config and got rid of the Plasmoid (as standalone Plasmoid and within the system tray), stopped the shell, started it again and added the Plasmoid again. As soon as a typed the final "e" of "syncthing.service" the status went green. As soon as I applied the setting the start/stop button was shown.

The only issue I've noticed is that the enable/disable button's effect is not immediately visible. I'll fix that by reloading the manager.

Martchus commented 2 years ago

Could it be that you haven't reloaded the systemd manager since you've installed Syncthing (and its systemd unit file) itself?

sten0 commented 2 years ago

Martchus @.***> writes:

The unsuccessful query for the systemd status and the failing tests are unrelated but both would make sense to look into.

Unfortunately the error message about the exception doesn't tell one the exception's error message or where it was thrown at. So I don't think I can investigate this issue further. (I definitely cannot reproduce it in my build envs but it is likely a real bug happening under some circumstances.)

The trigger condition for both cases appears to be first-run (no config for either Syncthing or Syncthing Tray), and if syncthing is not running. The log output in both the test and the new-user experience with the plasmoid refers to:

connector/syncthingconnection_requests.cpp 746: emitError(tr("Unable to request Syncthing status: "), SyncthingErrorCategory::OverallConnection, reply);

That's why I hypothesise that both cases have the same cause (ie: no syncthing process is available to connect to).

About the systemd problem: I couldn't reproduce it with the Qt Widgets based version.Maybe some initialization is missing when the Plasmoid is just added to the shell. I'll try to reproduce it this evening. I also requested access to your Google drive, maybe that makes is more clear.

Maybe :) I also tested logging out, then logging in again after the plasmoid had been activated, and the reported (via video) behaviour was identical.

sten0 commented 2 years ago

Martchus @.***> writes:

Could it be that you haven't reloaded the systemd manager since you've installed Syncthing (and its systemd unit file) itself?

I checked for this, and it's not that. Do Arch and Tumbleweed automatically enable (and possibly start) the service (either the system or the --user unit) when Syncthing is installed?

Martchus commented 2 years ago

That's why I hypothesise that both cases have the same cause

In both cases the connection cannot be established. However, the connection to Syncthing's API is completely independent from the systemd code which doesn't even run within the test.

Do Arch and Tumbleweed automatically enable (and possibly start) the service (either the system or the --user unit) when Syncthing is installed?

No. Note that I'm talking about reloading the unit files on disk, not necessarily enabling the unit. Reloading might also not be done automatically on other distros. If it isn't done then it is the user's responsibility to do that. I've been adding a button in Syncthing Tray to reload all unit files.

I also tested logging out, then logging in again after the plasmoid had been activated, and the reported (via video) behaviour was identical.

Ok, then it is not about missing initialization (without restarting the shell).

Martchus commented 2 years ago

I can reproduce the problem when I just create a new unit via cp /usr/lib/systemd/user/syncthing.service /usr/lib/systemd/user/syncthing-test.service and try to configure that one in Syncthing Tray's settings. If the unit file isn't known to systemd yet (as it is new and unit files haven't been reloaded) one cannot fully interact with it. It is actually possible to start/stop and to enable/disable the unit but the effect is not immediately visible. Using the newly added button for reloading unit files helps to get the status.

sten0 commented 2 years ago

Martchus @.***> writes:

I can reproduce the problem when I just create a new unit via cp /usr/lib/systemd/user/syncthing.service /usr/lib/systemd/user/syncthing-test.service and try to configure that one in Syncthing Tray's settings. If the unit file isn't known to systemd yet (as it is new and unit files haven't been reloaded) one cannot fully interact with it. It is actually possible to start/stop and to enable/disable the unit but the effect is not immediately visible. Using the newly added button for reloading unit files helps to get the status.

This example tests the "Syncthing has not been installed properly" case, which includes not having correctly reloading available unit files. To reproduce the case I'm reporting, try the following:

After Syncthing and Syncthing Tray have have been correctly installed, and both the system and user instances have been daemon-reloaded:

  1. systemctl disable syncthing.service
  2. systemctl --user disable syncthing.service
  3. Insure no other syncthing instances are active for other users
  4. Use a freshly created new user account
  5. Log in
  6. Plasmoid should automatically appear in the tray
  7. The error behaviour demonstrated in my video should occur now
sten0 commented 2 years ago

Martchus @.***> writes:

That's why I hypothesise that both cases have the same cause

In both cases the connection cannot be established. However, the connection to Syncthing's API is completely independent from the systemd code which doesn't even run within the test.

So the failed connection to Syncthing, which throws an unhandled exception, is not mucking with the plasmoid's display?

1: Qt has caught an exception thrown from an event handler. Throwing
1: exceptions from an event handler is not supported in Qt.
1: You must not let any exception whatsoever propagate through Qt code.

Ie: The systemd code isn't buggy, and a correctly installed disabled-state service is just a means to trigger the unhandled exception. Likewise, omitting the installation of Syncthing in the test environment can be used to trigger the unhandled exception.

Martchus commented 2 years ago

I could reproduce the systemd integration problem with a disabled and reset unit (so the unit really has no status anymore). I pushed some changes which should help. It'll still show "unknown" as unit status but the file state should be "disabled". Starting/stopping and enabling/disabling such a unit should now work and also be visible immediately.

I can try omitting the installation of Syncthing in the test environment to reproduce the exception.

EDIT: Unfortunately I only get a clean failure:

 - Connection error: Unable to request Syncthing config: Connection refused
 - Connection error: Unable to request Syncthing status: Connection refused
 - Connection error: Unable to request Syncthing config: Connection refused
 - Connection error: Unable to request Syncthing status: Connection refused
 - Connection error: Unable to request Syncthing config: Connection refused
 - Connection error: Unable to request Syncthing status: Connection refused
F......

!!!FAILURES!!!
Test Results:
Run:  7   Failures: 1   Errors: 0

1) test: ConnectionTests::testConnection (F) line: 348 /run/media/martchus/files/programming/projects/c++/cmake/syncthingtray/connector/tests/connectiontests.cpp
forced failure
- unable to connect to Syncthing within 1 min 15 s

Tests failed

Also not with a release build and with QProcess (-DUSE_BOOST_PROCESS:BOOL=OFF).

sten0 commented 2 years ago

I could reproduce the systemd integration problem with a disabled and reset unit (so the unit really has no status anymore). I pushed some changes which should help. It'll still show "unknown" as unit status but the file state should be "disabled". Starting/stopping and enabling/disabling such a unit should now work and also be visible immediately.

Thank you, I'll test in my next chunk of free time and will hopefully report success :)

I can try omitting the installation of Syncthing in the test environment to reproduce the exception.

EDIT: Unfortunately I only get a clean failure:

Sorry for the delay replying, it appears edits to not emit notifications.

 - Connection error: Unable to request Syncthing config: Connection refused
 - Connection error: Unable to request Syncthing status: Connection refused
 - Connection error: Unable to request Syncthing config: Connection refused
 - Connection error: Unable to request Syncthing status: Connection refused
 - Connection error: Unable to request Syncthing config: Connection refused
 - Connection error: Unable to request Syncthing status: Connection refused
F......

!!!FAILURES!!!
Test Results:
Run:  7   Failures: 1   Errors: 0

1) test: ConnectionTests::testConnection (F) line: 348 /run/media/martchus/files/programming/projects/c++/cmake/syncthingtray/connector/tests/connectiontests.cpp
forced failure
- unable to connect to Syncthing within 1 min 15 s

Tests failed

Also not with a release build and with QProcess (-DUSE_BOOST_PROCESS:BOOL=OFF).

Hm, I wonder why you're not able to trigger the unhandled exception... Here's my log: syncthingtray_1.2.1-1_amd64.with-QProcess_no_Boost.build.txt

Martchus commented 2 years ago

Thanks for the log. I suppose you've just disabled use of Boost for testing purposes (as Boost is still installed in the build env). That kind of test is also useful as it already outrules all Boost-using code paths (as the problem still occurs with -DUSE_BOOST_PROCESS:BOOL=OFF). Note that I'd generally recommend to build with Boost.

From the log it looks like the exception is not left unhandled (as the test is even able to fail with the error message "unable to connect to Syncthing within 1 min 15 s"). Therefore there's likely no coredump/stacktrace available (you could still check, just in case). That means it is hard to tell what's wrong.

By the way, if you install Syncthing in your build env (so the tests can launch it), do tests then pass and do you still see the log message about the exception?

sten0 commented 2 years ago

Martchus @.***> writes:

Thanks for the log. I suppose you've just disabled use of Boost for testing purposes (as Boost is still installed in the build env).

You're welcome, and yes, this is what I understood you had requested.

That kind of test is also useful as it already outrules all Boost-using code paths (as the problem still occurs with -DUSE_BOOST_PROCESS:BOOL=OFF).

Agreed, and it's best to test one variable at a time ;) Today I also confirmed the expected identical failure mode when Boost is not installed. For the purposes of debugging, would you like to continue with or without Boost?

Note that I'd generally recommend to build with Boost.

Thank you for confirming this! This was my hunch, and also what was evident in Hannah Rittich's work, so I had supposed the QProcess path was for testing/debugging purposes.

From the log it looks like the exception is not left unhandled (as the test is even able to fail with the error message "unable to connect to Syncthing within 1 min 15 s"). Therefore there's likely no coredump/stacktrace available (you could still check, just in case). That means it is hard to tell what's wrong.

Agreed that this isn't fast and easy to debug. What do you make of lines 4324 to 4334 of syncthingtray_1.2.1-1_amd64.with-QProcess_no_Boost.build.txt? It's quite a bit of work to add debugging to the clean-room build environment, so my questions are: 1. Are the codepaths related to failing to connect to Syncthing sufficiently well instrumented that this effort will be worthwhile? 2. What format coredump/stacktrace/backtrace would you like?

By the way, if you install Syncthing in your build env (so the tests can launch it), do tests then pass and do you still see the log message about the exception?

Yes, tests then pass, and the exception does not occur. This appears to confirm the hypothesis that the exception is (logically) triggered either when the syncthing exec is not found, or when the REST connection fails. And yes, Debian developers usually test these things. Were I not reporting the issue a colleague certainly would ;) After all, the Debian goal is to get everything into a state people can rely on for two years without major bugs or deal-breaker annoyances.

I'll share a fresh log of the success case momentarily.

sten0 commented 2 years ago

syncthingtray_1.2.1-1_amd64.QProcess_no_Boost.with_Syncthing.build.txt

Martchus commented 2 years ago

For the purposes of debugging, would you like to continue with or without Boost?

Keep Boost enabled.

so I had supposed the QProcess path was for testing/debugging purposes.

The code paths for QProcess were simply what I have started with. I only added Boost later to improve handling of subprocess spawned by Syncthing but kept it as optional feature.

I'll look into your other questions later.

sten0 commented 2 years ago

so I had supposed the QProcess path was for testing/debugging purposes.

The code paths for QProcess were simply what I have started with. I only added Boost later to improve handling of subprocess spawned by Syncthing but kept it as optional feature.

I'll look into your other questions later.

Any further insight? My next chunk of free time for serious debugging will be in late September to early October, so there's no rush at all.

Martchus commented 2 years ago

No, I was focusing on the wizard and a few tweaks for the plasmoid after all.

Martchus commented 1 year ago

The test behavior when Syncthing cannot be started should be improved by https://github.com/Martchus/syncthingtray/commit/037c5a309b8d6637416e0a93d01740cf0f3b3c3e.

sten0 commented 1 year ago

Thanks again for your work on this! I'll begin manual testing once the test suite is in a good state--also pending confirmation user config isn't overwritten.

sten0 commented 1 year ago

And here's an update on the tests with a fully updated stack (currently Syncthing Tray 1.3.0 and newest available deps). Trigger condition is when Syncthing is unavailable (in this case when it's not installed in a clean test environment), which is obvious from "Error: Unable to launch process: execve failed: No such file or directory".

test 1
    Start 1: syncthingconnector_run_tests

1: Test command: /<<PKGBUILDDIR>>/debian/build/connector/syncthingconnector_tests "-p" "/<<PKGBUILDDIR>>/connector/testfiles" "-w" "/<<PKGBUILDDIR>>/debian/build/connector/testworkingdir"
1: Working Directory: /<<PKGBUILDDIR>>/debian/build/connector
1: Test timeout computed to be: 10000000
1: /<<PKGBUILDDIR>>/connector/testfiles/
1: /<<PKGBUILDDIR>>/connector/testfiles/
1: ./testfiles/
1: Directory used to store working copies:
1: /<<PKGBUILDDIR>>/debian/build/connector/testworkingdir/
1: Executing test cases ...
1: .
1:  - Setup configuration for Syncthing tests ...
1:  - No timeout factor set, defaulting to 1
1:    (set environment variable SYNCTHING_TEST_TIMEOUT_FACTOR to specify a factor to run tests on a slow machine)
1: 
1:  - Launching Syncthing: syncthing -gui-address=http://127.0.0.1:4001 -gui-apikey=syncthingtestinstance -home=/<<PKGBUILDDIR>>/debian/build/connector/testworkingdir/testconfig -no-browser -verbose
1: Error: Unable to launch process: execve failed: No such file or directory
1: 
1:  - Preparing connection ...
1: 
1:  - Error handling in case of insufficient configuration ...
1:  - Connection error: Connection configuration is insufficient.
1:  - Connection status changed to: disconnected
1: 
1:  - Error handling in case of inavailability ...
1: Qt has caught an exception thrown from an event handler. Throwing
1: exceptions from an event handler is not supported in Qt.
1: You must not let any exception whatsoever propagate through Qt code.
1: If that is not possible, in Qt 5 you must at least reimplement
1: QCoreApplication::notify() and catch all exceptions there.
1: 
1: E.QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-sten'
1: .....
1: 
1: 
1: !!!FAILURES!!!
1: Test Results:
1: Run:  7   Failures: 0   Errors: 1
1: 
1: 
1: 1) test: ConnectionTests::testConnection (E) 
1: uncaught exception of type std::exception (or derived).
1: - Unable to start Syncthing (execve failed: No such file or directory)
1: 
1: 
1: Tests failed
1/3 Test #1: syncthingconnector_run_tests ........***Failed    0.05 sec
test 2
    Start 2: syncthingctl_run_tests

2: Test command: /<<PKGBUILDDIR>>/debian/build/cli/syncthingctl_tests "-p" "/<<PKGBUILDDIR>>/cli/testfiles" "-w" "/<<PKGBUILDDIR>>/debian/build/cli/testworkingdir" "-a" "/<<PKGBUILDDIR>>/debian/build/cli/syncthingctl"
2: Working Directory: /<<PKGBUILDDIR>>/debian/build/cli
2: Test timeout computed to be: 10000000
2: /<<PKGBUILDDIR>>/cli/testfiles/
2: /<<PKGBUILDDIR>>/cli/testfiles/
2: ./testfiles/
2: Directory used to store working copies:
2: /<<PKGBUILDDIR>>/debian/build/cli/testworkingdir/
2: Executing test cases ...
2: .
2:  - Setup configuration for Syncthing tests ...
2:  - No timeout factor set, defaulting to 1
2:    (set environment variable SYNCTHING_TEST_TIMEOUT_FACTOR to specify a factor to run tests on a slow machine)
2: 
2:  - Launching Syncthing: syncthing -gui-address=http://127.0.0.1:4001 -gui-apikey=syncthingtestinstance -home=/<<PKGBUILDDIR>>/debian/build/cli/testworkingdir/testconfig -no-browser -verbose
2: Error: Unable to launch process: execve failed: No such file or directory
2: 
2: Waiting till Syncthing GUI becomes available ...
2: E
2: 
2: 
2: !!!FAILURES!!!
2: Test Results:
2: Run:  1   Failures: 0   Errors: 1
2: 
2: 
2: 1) test: ApplicationTests::test (E) 
2: uncaught exception of type std::exception (or derived).
2: - Unable to start Syncthing (execve failed: No such file or directory)
2: 
2: 
2: Tests failed
2/3 Test #2: syncthingctl_run_tests ..............***Failed    0.02 sec
Martchus commented 1 year ago

Now there's a clear error message so I guess this is as good as it gets. Note that the Qt event handler warning is most likely just due to the unclean shutdown due to the test failure. In the normal setup an exception like this isn't supposed to happen (or would at least be handled before the control flow returns to the event loop) so this isn't a real problem.

Martchus commented 1 year ago

The issue about the systemd integration should be fixed (see https://github.com/Martchus/syncthingtray/issues/144#issuecomment-1174222162).

The issue about test failures when Syncthing isn't installed should be improved as good as it gets (see https://github.com/Martchus/syncthingtray/issues/144#issuecomment-1316089044).

So I guess this issue can be closed.