Martchus / syncthingtray

Tray application and Dolphin/Plasma integration for Syncthing
https://martchus.github.io/syncthingtray/

Syncthing Tray stops updating (stalls) after resuming Windows from hibernation #209

Closed tomasz1986 closed 6 months ago

tomasz1986 commented 9 months ago

Relevant components

Environment and versions

Bug description

It seems that after hibernating the OS and then resuming from hibernation later, Syncthing Tray stalls and stops updating the Syncthing state. In other words, it appears stuck in the last state from before the OS was hibernated.

Steps to reproduce

  1. Run Syncthing Tray connected to a Syncthing instance.
  2. Hibernate Windows.
  3. After a while, resume the OS from hibernation.

Expected behavior

Syncthing Tray should refresh the Syncthing state after resuming the OS from hibernation following the poll intervals set in the configuration.

Screenshots

image

Additional context

I have experienced the problem on two unrelated Windows devices.

Martchus commented 9 months ago

Ok, so basically the automatic reconnect that is set to 30 seconds doesn't work. Does it connect if you do it manually or does it appear as if no network traffic is possible anymore at all? Can Syncthing's web UI be generally accessed?

Martchus commented 9 months ago

Could it be related to this Qt bug that's already mentioned in the README?

The tray disconnects from the local instance when the network connection goes down. The network connection must be restored or the tray restarted to be able to connect to local Syncthing again. This is caused by Qt bug https://bugreports.qt.io/browse/QTBUG-60949.

tomasz1986 commented 9 months ago

Does it connect if you do it manually or does it appear as if no network traffic is possible anymore at all?

It does reconnect and resume working if I right click the Syncthing Tray icon, click "Connection" and then the specific device.

Can Syncthing's web UI be generally accessed?

No problem accessing the Web GUI after resuming from hibernation.

Could it be related to this Qt bug that's already mentioned in the README?

Not sure. The network connection itself is restored by the OS after resuming from hibernation and there are no problems connecting to the Internet in general.

Martchus commented 9 months ago

It does reconnect and resume working if I right click the Syncthing Tray icon, click "Connection" and then the specific device.

Ok, that means it is not the mentioned Qt bug.

No problem accessing the Web GUI after resuming from hibernation.

Good, this must be a bug in the retry logic then.


How do you start Syncthing? What version of Syncthing do you use? Do you have the feature to use the status of the internal launcher enabled?

tomasz1986 commented 9 months ago

How do you start Syncthing? What version of Syncthing do you use? Do you have the feature to use the status of the internal launcher enabled?

It's the current v1.25.0 (custom built with a few tweaks). I start the syncthing.exe executable with the Task Scheduler (more or less using the method from https://docs.syncthing.net/users/autostart.html#autostart-windows-taskschd).

Do you have the feature to use the status of the internal launcher enabled?

I'm not sure I understand the question, but I don't use the internal launcher regardless.

Martchus commented 9 months ago

Ok, I've just tried the latest release of Syncthing Tray on Windows 10 with Syncthing v1.25.0 myself. I couldn't reproduce the problem. Is it always reproducible in your case?

The question is whether you have the 3rd checkbox in the launcher settings enabled. When not using the built-in launcher it should not make a difference, though. (But there might be a bug with it causing it to make a difference when it shouldn't so you might try to turn it off.)

I could actually not even reproduce a connection loss (so no re-connect was required). Considering that in your case it also doesn't complain about a connection loss (right?) this might not be the re-connect being broken (because without connection loss the re-connect will not even trigger). So maybe the real question is why it just gets stuck in your case instead of either receiving further data or losing the connection (and then re-establishing it).

Maybe there should be a timeout. There actually is a configurable timeout for normal requests, but this is about the long-polling events where this setting does not apply (because it wasn't very practical, as one could easily break things by setting it to less than the long-polling interval). I guess it would make sense to set the timeout for long-polling events to always be a little bit higher than the interval enforced by Syncthing (but then one would presumably need to use the timeout parameter on every request to control the interval enforced by Syncthing).
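The relationship between the two timeouts can be sketched as follows. This is a minimal illustration in Python, not Syncthing Tray's actual code (which is C++/Qt). It assumes Syncthing's `/rest/events` endpoint with its `since` and `timeout` query parameters; the function names and the 5-second margin are hypothetical.

```python
from urllib.parse import urlencode


def events_url(base_url: str, since: int, poll_timeout_s: int) -> str:
    """Build a long-polling URL that tells Syncthing how long to hold the request."""
    query = urlencode({"since": since, "timeout": poll_timeout_s})
    return f"{base_url.rstrip('/')}/rest/events?{query}"


def client_timeout_s(poll_timeout_s: int, margin_s: int = 5) -> int:
    """Client-side timeout: always a little longer than the server-side interval.

    The margin (an assumed value) leaves room for the server's empty reply to
    arrive before the client gives up on the request.
    """
    if poll_timeout_s <= 0:
        raise ValueError("poll timeout must be positive")
    return poll_timeout_s + margin_s
```

A caller would pass both values to the same request, for example `urllib.request.urlopen(events_url(base, last_id, 60), timeout=client_timeout_s(60))`, so that a request that never returns is eventually aborted on the client side.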

tomasz1986 commented 9 months ago

Ok, I've just tried the latest release of Syncthing Tray on Windows 10 with Syncthing v1.25.0 myself. I couldn't reproduce the problem. Is it always reproducible in your case?

I will need to do more testing to verify whether the issue happens 100% of the time but at the moment it seems to always happen after resuming from hibernation. This is true for two separate devices.

The question is whether you have the 3rd checkbox in the launcher settings enabled. When not using the built-in launcher it should not make a difference, though. (But there might be a bug with it causing it to make a difference when it shouldn't so you might try to turn it off.)

These are the launcher-related settings.

image

I could actually not even reproduce a connection loss (so no re-connect was required). Considering that in your case it also doesn't complain about a connection loss (right?) this might not be the re-connect being broken (because without connection loss the re-connect will not even trigger). So maybe the real question is why it just gets stuck in your case instead of either receiving further data or losing the connection (and then re-establishing it).

Yeah, it just keeps showing Syncthing as connected with the last state from before the hibernation. There are no error messages or any other notifications whatsoever.
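The symptom described here (still shown as connected, no error, no new data) is the case a reconnect-on-error loop never sees. One way to catch it, sketched below in Python under stated assumptions, is a watchdog that records the time of the last response and reports a stall once the silence exceeds the expected window. The class and method names are hypothetical, and this is not how Syncthing Tray is known to handle it.

```python
import time
from typing import Callable


class StallWatchdog:
    """Flags a connection as stalled when no response arrived within the allowed window."""

    def __init__(self, max_silence_s: float, clock: Callable[[], float] = time.time):
        # Wall-clock time is the default because it keeps advancing while the
        # machine is hibernated; whether a monotonic clock does is platform-dependent.
        self._max_silence_s = max_silence_s
        self._clock = clock
        self._last_activity = clock()

    def note_activity(self) -> None:
        """Call whenever a response (even an empty long-poll reply) is received."""
        self._last_activity = self._clock()

    def is_stalled(self) -> bool:
        """True if the connection has been silent for longer than allowed."""
        return self._clock() - self._last_activity > self._max_silence_s
```

With a long-polling interval of 60 seconds, a `max_silence_s` slightly above 60 would mark the connection as stalled shortly after resuming, at which point the client could abort the pending request and reconnect.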

Martchus commented 8 months ago

I implemented a way to configure the Long-Polling-Timeout. The change is on the master branch but there is no timeout enforced by default yet¹. I'm wondering whether configuring a request timeout and a long-polling interval/timeout will help in your case, so you might want to give those new options a try. Can you build Syncthing Tray on your own to test that?

Note that the list of options has grown a little too long. So the timeout settings I'm talking about are now hidden by default and only shown after ticking "Show advanced configuration".


¹Maybe it makes sense to enable sensible timeouts by default. At least the long-polling timeout would make sense. For API requests I would likely need to distinguish between the particular requests, e.g. requesting a rescan only returns once the scan is complete, which might take a while (so maybe this request should be exempt from the timeout setting, and maybe there are more requests of that kind).
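The idea of exempting long-running requests from a default timeout could look roughly like this. It is a sketch in Python, not the project's implementation; the exemption set and the function name are hypothetical, and `/rest/db/scan` is used as the example of a request that may block until a scan completes.

```python
from typing import Optional

# Hypothetical set of endpoints that may legitimately block for a long time.
LONG_RUNNING_PATHS = frozenset({"/rest/db/scan"})


def timeout_for(path: str, default_timeout_s: Optional[float]) -> Optional[float]:
    """Return the timeout to apply to a request path; None means "no timeout"."""
    if path in LONG_RUNNING_PATHS:
        return None
    return default_timeout_s
```

Keeping the exemptions in one set means further long-running requests can be added later without touching the call sites.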

Martchus commented 8 months ago

The latest release contains the mentioned config options (still disabled by default). So you can test it without making your own build.

tomasz1986 commented 8 months ago

I've only now found time to come back to the issue. There is a problem though. I have just updated Syncthing Tray to the newest version. Right before the upgrade, the previous version (i.e. v1.4.7) was running fine. However, after the upgrade, the current version (i.e. v1.4.8) fails to connect to Syncthing at all. If I try to force a connection, it just briefly changes to the "reconnecting" state, and then goes back to "disconnected".

Martchus commented 8 months ago

Have you by any chance also updated Syncthing itself? There were some changes regarding authentication. What is the error you're getting? Without any further details I'm afraid I can't help. (Right-click on the icon should give you an option to view errors. Otherwise, try to check stdout/stderr via the GUI wrapper using debugging environment variables mentioned in the README.)

tomasz1986 commented 8 months ago

Syncthing actually wasn't upgraded on that device, meaning that it was still running v1.25.0. That configuration worked with Syncthing Tray v1.4.7 but doesn't with Syncthing Tray v1.4.8. I will have access to the machine again tomorrow, so I will try to do more testing then (and hopefully provide the logs).

On the other hand, on other devices, Syncthing is the newest v1.26.0 and I can connect to it using both Syncthing Tray v1.4.7 and v1.4.8.

Martchus commented 8 months ago

And definitely make sure to have the API key in the configuration. As of v1.26.0 Syncthing no longer supports basic-auth. (I was just reminded after reading https://github.com/syncthing/syncthing/issues/9208.)
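For reference, authenticating a REST request with the API key means sending it in the `X-API-Key` header. The sketch below, in Python with hypothetical helper names, only builds the request object and does not send anything; the `/rest/system/status` path is the one that appears in the log later in this thread.

```python
from urllib.request import Request


def api_headers(api_key: str) -> dict:
    """Headers that authenticate a request via Syncthing's API key."""
    if not api_key:
        raise ValueError("an API key is required")
    return {"X-API-Key": api_key}


def status_request(base_url: str, api_key: str) -> Request:
    """Build (but do not send) a request for the system status."""
    url = f"{base_url.rstrip('/')}/rest/system/status"
    return Request(url, headers=api_headers(api_key))
```

The resulting object can be passed to `urllib.request.urlopen`; with HTTPS and a self-signed certificate, an appropriate SSL context would also be needed, which is left out here.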

tomasz1986 commented 8 months ago

The log is flooded with these error messages:

Request URL: https://localhost:8384/rest/system/status
[2023-11-17T16:20:21] Unable to request Syncthing status: The credentials were not recognized / Invalid argument
Request URL: https://localhost:8384/rest/system/status

I've updated Syncthing to v1.26.0 now but there is no difference. Neither v1.25.0 nor v1.26.0 works.

image

Not really sure why this is happening because the exact same configuration used to work perfectly until the very recent past.

Martchus commented 8 months ago

Looks like the error comes from Qt, in particular from qtbase/src/plugins/tls/schannel/qtls_schannel.cpp when handling the SEC_E_UNKNOWN_CREDENTIALS case. This is the TLS backend of Qt where it makes use of Schannel (https://learn.microsoft.com/en-US/windows-server/security/tls/tls-ssl-schannel-ssp-overview). So I guess the error would go away if you used http instead of https (and of course also disabled https in Syncthing's own settings).

I also found https://forum.qt.io/topic/151961/qt6-6-network-get-the-credentials-were-not-recognized which makes it look like this is due to the lack of bundling OpenSSL. However, that would only be a workaround. My Qt 6 based build is supposed to use Schannel instead of OpenSSL for the sake of bundling less security-relevant code.

Note that I cannot reproduce the problem. I've just enabled https in my local Syncthing instance here and the latest Windows build can still access it using a configuration very similar to yours. The only difference is that I used an absolute path to specify the certificate and that I've only tested under GNU/Linux with WINE.

To me this looks like a bug in Qt's Schannel code that was introduced somewhere between Qt 6.5.3 and 6.6.0¹ and does not trigger in all environments/cases. Without a clearly defined reproducer it is likely not very useful to file a Qt bug.


¹The Syncthing Tray v1.4.7 build uses Qt 6.5.3 and the v1.4.8 build Qt 6.6.0. The Qt version in the ticket description is misleading as we're now actually discussing a different issue.

tomasz1986 commented 8 months ago

It works after disabling HTTPS in Syncthing and changing the URL to HTTP! I think the problem may be related to the fact that the PC runs Windows 10 Enterprise 2016 LTSC which is quite old now, although fully updated. The PC is scheduled for an upgrade to 2021 LTSC soon, so if this is indeed the culprit, I will be able to verify it after the upgrade.

¹The Syncthing Tray v1.4.7 build uses Qt 6.5.3 and the v1.4.8 build Qt 6.6.0. The Qt version in the ticket description is misleading as we're now actually discussing a different issue.

Yeah, the HTTPS thing is more like a side issue, not really related to the main problem which is about hibernation and such. Actually, coming back to the main issue, what values specifically would you suggest to put into the "Transfer timeout" and "Long polling int." fields?

Martchus commented 8 months ago

So probably one of the commits in git log v6.5.3..v6.6.0 -- src/plugins/tls/schannel/qtls_schannel.cpp broke it.

Maybe it is https://github.com/qt/qtbase/commit/ada2c573c1a25f8d96577734968fe317ddfa292a. However, if that were the case it would mean that it only worked before when it should not have, and maybe just the certificate path is wrong. (Not sure what's used as a base if you specify a relative path. Try with an absolute path to cross-check.)

Or it is https://github.com/qt/qtbase/commit/a7d92f809f3d05a22c38ec6f77f9c62190d2deb0. Then an upgrade of Windows 10 might help, indeed. I guess I should clarify the minimum Windows 10 version in the README. This possibly also fixes other bugs because Qt devs really like to remove code paths for older Windows versions.

Martchus commented 8 months ago

Actually, coming back to the main issue, what values specifically would you suggest to put into the "Transfer timeout" and "Long polling int." fields?

For the long-polling interval I suggest putting in 60 seconds. That's actually Syncthing's default, but if you put it in explicitly then Syncthing Tray will also enforce a client-side timeout. Maybe that already helps. If not, try the transfer timeout. You can also go for a minute there for testing purposes. However, in practice a longer timeout is likely required because otherwise you'd probably run into the timeout when triggering scans. (Treating scans and possibly other long requests differently hasn't been implemented yet.)

tomasz1986 commented 8 months ago

Changing the certificate path doesn't help (and it used to work with the exact same config before, so…). The second commit looks like the culprit for sure though.

I've just set the long-polling interval to 60 seconds. I'm going to put the machine into hibernation later today and then come back to it on Monday, so then I should be able to verify whether it has helped or not 🙂.

Martchus commented 8 months ago

And, did it work?

I've just released a new version which checks the Windows version and shows a warning if it is older than the oldest Windows version supported by Qt 6. You may want to test whether it works if you haven't updated your machine yet. (I don't have such an old system and only tested it via WINE.)
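A version check of this kind can be sketched as a simple build-number comparison. The example is in Python and is not the code from the release. It assumes that Qt 6 requires Windows 10 version 1809 (build 17763) or later, and the names are hypothetical. For orientation, Windows 10 Enterprise 2016 LTSC is version 1607 (build 14393) and 2021 LTSC is version 21H2 (build 19044).

```python
from typing import Optional

# Assumption: Qt 6 requires Windows 10 version 1809 (build 17763) or later.
MIN_SUPPORTED_BUILD = 17763


def is_windows_build_supported(build: int, minimum: int = MIN_SUPPORTED_BUILD) -> bool:
    """True if the given Windows build number meets the assumed minimum."""
    return build >= minimum


def version_warning(build: int) -> Optional[str]:
    """Return a warning text for unsupported builds, or None if the build is fine."""
    if is_windows_build_supported(build):
        return None
    return (
        f"Windows build {build} is older than the assumed minimum "
        f"({MIN_SUPPORTED_BUILD}); TLS and other features may not work."
    )
```

On an actual Windows machine the build number could be read with `sys.getwindowsversion().build`, which is only available on that platform.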

tomasz1986 commented 8 months ago

I still haven't been able to check that device, unfortunately. However, I can confirm that with the two new values set to 60,000 ms, the hibernation stall is gone on yet another device which was experiencing the very same issue 🙂.

stale[bot] commented 6 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.