Martchus / syncthingtray

Tray application and Dolphin/Plasma integration for Syncthing
https://martchus.github.io/syncthingtray/

Syncthing Tray sometimes loses connection to Syncthing and fails to connect again on its own #217

Open tomasz1986 opened 7 months ago

tomasz1986 commented 7 months ago

Relevant components

Environment and versions

Bug description

Recently, Syncthing Tray has started to lose its connection to Syncthing after working without issues for a few days, and it does not reconnect on its own. The problem persists until I manually press the "Apply connection settings and try to reconnect with the currently selected config" button. After pressing the button, Syncthing Tray reconnects to Syncthing properly and works fine for a while again.

These are the error logs. "Połączenie odrzucone" means "Connection refused", "Połączenie zakończone" means "Connection ended", and "Nie można zapisać" means "Cannot save". As the logs show, the problem first occurred two days ago and then again today.

[2023-12-08T16:59:03] Unable to request Syncthing status: Połączenie odrzucone
Request URL: https://syncthing:redacted@localhost:8384/rest/system/status
[2023-12-08T16:59:42] Unable to request Syncthing status: Połączenie odrzucone
Request URL: https://syncthing:redacted@localhost:8384/rest/system/status
[2023-12-10T11:34:04] Unable to request Syncthing events: Połączenie zakończone
Request URL: https://syncthing:redacted@localhost:8384/rest/events?since=180267&timeout=60
[2023-12-10T13:46:32] Unable to request Syncthing events: Nie można zapisać
Request URL: https://syncthing:redacted@localhost:8384/rest/events?since=189703&timeout=60
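
For readers unfamiliar with the failing requests: the `/rest/events` URLs in the log are long-polling requests, where `since` is the ID of the last event already seen and `timeout` is how many seconds Syncthing holds the request open before answering with an empty list. The following is a minimal sketch of how such a URL is put together and read back. The helper names are hypothetical and are not taken from Syncthing Tray's code.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def events_url(base_url: str, since: int, timeout_s: int = 60) -> str:
    """Build a long-polling events URL like the ones in the log (hypothetical helper)."""
    query = urlencode({"since": since, "timeout": timeout_s})
    return f"{base_url.rstrip('/')}/rest/events?{query}"


def last_seen_event_id(url: str) -> int:
    """Read the `since` value back out of a logged request URL."""
    return int(parse_qs(urlsplit(url).query)["since"][0])


url = events_url("https://localhost:8384", since=189703)
print(url)
print(last_seen_event_id(url))
```

The difference between the two `since` values in the log (180267 and 189703) only tells how many events were emitted between the two failures; it says nothing about why the requests failed.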

Steps to reproduce

  1. Leave Syncthing Tray running for a few days.

Expected behavior

Syncthing Tray should work and stay connected to Syncthing as long as Syncthing itself is running.

Screenshots

[Screenshot]

Additional context

Syncthing (v1.27.0 as of today) is started on user logon, separately from Syncthing Tray. I have also just noticed that the "supply credentials for HTTP authentication" checkbox was ticked, and I have now unticked it. However, https://docs.syncthing.net/users/config.html#config-option-gui.sendbasicauthprompt is also enabled, so I think Syncthing Tray should work fine both ways (and it does connect using just the username and password initially).

Martchus commented 7 months ago

Judging by the log, the retry logic is definitely working, as it tries to reconnect. And since clicking the "Apply connection settings and try to reconnect with the currently selected config" button helps, it cannot be Qt's network module either.

So it must be the state of Syncthing Tray's connection handling. On the other hand, this would also be strange because it says "Connection refused", indicating that Syncthing is not reachable at all.

So I cannot really make sense of the problem right now. That it is only reproducible after a few days of course makes it even harder to debug.

Is that a new problem? I actually haven't changed much recently when it comes to the handling of API requests and events.

Note that you only need the user name and password if Syncthing is behind a reverse proxy that requires basic HTTP auth. For Syncthing itself the API key should be sufficient.
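
To illustrate the two authentication mechanisms mentioned here: Syncthing's REST API accepts the API key in the `X-API-Key` request header, whereas HTTP basic auth travels in the standard `Authorization` header. The following is a minimal sketch of the headers a client would send in each case. The function name and its parameters are hypothetical and do not reflect how Syncthing Tray builds its requests.

```python
import base64
from typing import Dict, Optional


def auth_headers(api_key: Optional[str] = None,
                 user: Optional[str] = None,
                 password: Optional[str] = None) -> Dict[str, str]:
    """Return the request headers for the configured credentials (hypothetical helper)."""
    headers: Dict[str, str] = {}
    if api_key:
        # Sufficient when talking to Syncthing directly.
        headers["X-API-Key"] = api_key
    if user is not None and password is not None:
        # Only needed if something in front of Syncthing asks for basic auth.
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        headers["Authorization"] = f"Basic {token}"
    return headers


print(sorted(auth_headers(api_key="abc123")))
print(sorted(auth_headers(api_key="abc123", user="syncthing", password="secret")))
```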

tomasz1986 commented 7 months ago

> Is that a new problem? I actually haven't changed much recently when it comes to the handling of API requests and events.

This actually does look new to me. I don't remember seeing Syncthing Tray disconnecting and disabling itself before. I've only noticed it in the last few days because the icon turned grey. Just for the record, I've noticed the problem on two separate Windows devices so far.

> Note that you only need the user name and password if Syncthing is behind a reverse proxy that requires basic HTTP auth. For Syncthing itself the API key should be sufficient.

Yeah, honestly I don't remember why I used them both. This is an old config; it has been set like that for a very long time.

tomasz1986 commented 7 months ago

The problem happened again, this time on another device.

[2023-12-03T15:22:50] Unable to request Syncthing events: Temporary network failure.
Request URL: https://xxx.local:8384/rest/events?since=240802&timeout=60
[2023-12-03T17:35:49] Unable to request connections: Połączenie odrzucone
Request URL: https://xxx.local:8384/rest/system/connections
[2023-12-03T17:35:56] Unable to request errors: Połączenie odrzucone
Request URL: https://xxx.local:8384/rest/system/error
[2023-12-03T17:36:26] Unable to request device statistics: Połączenie odrzucone
Request URL: https://xxx.local:8384/rest/stats/device

Not sure if relevant, but I can add that I use hostname URLs to connect Syncthing Tray to the devices. In addition, today the issue happened right when the device lost its LAN connection. As soon as the LAN connection was lost, Syncthing Tray also lost its connection to Syncthing and changed the icon colour to grey. However, it stayed like that even long after the device itself re-connected to the LAN. Like before, manual intervention was required to bring it back to life.

Martchus commented 7 months ago

> In addition, today the issue happened right when the device lost its LAN connection. As soon as the LAN connection was lost, Syncthing Tray also lost its connection to Syncthing and changed the icon colour to grey.

This points to a bug in Qt's network stack. Maybe a regression in Qt 6.6.1?

> Like before, a manual intervention was required to bring it back to life.

But this, on the other hand, suggests that something has gone stale in Syncthing Tray's internal code. I haven't changed anything except for introducing timeouts (which is just an additional function call before setting up the connection). But maybe configuring a transfer timeout (or configuring both timeouts) nevertheless makes things worse in that regard. I personally have mainly used the long-polling timeout but not the transfer timeout (except in my initial testing, which only happened under GNU/Linux).

Note that I haven't been able to reproduce the issue under Windows yet, but I guess my longest session was only around 6 hours. I actually disconnected and re-connected the Wifi a lot today (which should be similar to losing the LAN connection) and couldn't reproduce the issue that way either. On GNU/Linux I had longer sessions but couldn't reproduce the issue there either.

tomasz1986 commented 6 months ago

> This points to a bug in Qt's network stack. Maybe a regression in Qt 6.6.1?

I have just noticed that the other device where the issue occurred still ran Syncthing Tray 1.4.9 with Qt 6.6.0, so the culprit must be something else.

tomasz1986 commented 6 months ago

I'm now seeing the same issue with yet another device, which is an Android phone running Syncthing, with Syncthing Tray connected to it from a Windows device. This time, the device just keeps disconnecting after 1-2 minutes for no apparent reason. Each time this happens, I can reconnect manually with no problems.

Both the transfer timeout and long polling interval are set to 60000 ms. At first it looked as if resetting the first one to its default value of "no timeout" stopped the disconnects, but that was a premature conclusion: it still loses connection even with "no timeout". I'm now testing with the long polling interval also reset to its default value of "Syncthing's default with no timeout".

Edit: It seems to stay connected with the long polling interval set to "Syncthing's default with no timeout"!

Martchus commented 6 months ago

Ok, so the timeout for the long polling interval prevents the connection from becoming stuck (issue #209) but leads to this issue. Maybe it wasn't the best idea to make it the default now, although I couldn't reproduce the issue myself so far. I'm nevertheless still wondering why this happens. If the connection ran into a timeout, it should not say "Connection refused".
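
The distinction drawn here, between a request that times out and one that is refused, can be sketched as a long-polling loop that treats the two cases differently. This is only an illustration under stated assumptions: `poll_events`, its parameters and the injected `fetch` callable are hypothetical, and Syncthing Tray's real implementation is C++/Qt and may be structured quite differently.

```python
import time
from typing import Callable, List


def poll_events(fetch: Callable[[int], List[dict]],
                reconnect_interval_s: float,
                max_rounds: int,
                sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Issue up to max_rounds long-polling requests and log each outcome.

    fetch(since) is assumed to return a list of events, each with an "id",
    to raise TimeoutError when the client-side timeout fires, and to raise
    ConnectionError (e.g. ConnectionRefusedError) when Syncthing is unreachable.
    """
    log: List[str] = []
    since = 0
    for _ in range(max_rounds):
        try:
            events = fetch(since)
        except TimeoutError:
            # The request hung: drop it and immediately ask again.
            log.append("timeout")
            continue
        except ConnectionError:
            # Syncthing is not reachable: wait for the reconnect interval, then retry.
            log.append("disconnected")
            sleep(reconnect_interval_s)
            continue
        log.append("ok")
        if events:
            since = events[-1]["id"]  # resume after the last event seen
    return log
```

In a loop like this, a timeout alone never produces a "disconnected" entry, which matches the puzzlement above: a "Connection refused" message suggests that a new request could not be established at all, not merely that an existing one was cut off.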

stale[bot] commented 4 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

tomasz1986 commented 4 months ago

I'm still experiencing the issue on my devices, and I'd like to get rid of it now, but I'm a bit confused about the settings.

[Screenshot of the timeout settings]

This is what I currently have on most devices. Do you recommend that I set just the Long polling int. to "Syncthing's default with no timeout" and leave the Transfer timeout at 60000 ms?

Martchus commented 4 months ago

I'm not sure what to recommend. I cannot reproduce the issue myself so it is not easy to improve anything on my side. I cannot even bisect what change caused it.

I thought that setting timeouts would generally be a good idea to prevent the connection from getting stuck. However, that might have caused a regression, so I have reverted enabling timeouts by default (see https://github.com/Martchus/syncthingtray/commit/699dcbdcacf5921091a5aa99d5b4e09e7a126e1b). Disabling the long polling interval/timeout is maybe the safest option. If you do that then you might see https://github.com/Martchus/syncthingtray/issues/209 again, though.

xgdgsc commented 4 months ago

I usually see this on a low-performance device (ARM in my case). After resuming from suspend, there is a short period when all the busy processes push CPU usage to 100%, and Syncthing Tray then shows the connection lost error.

Martchus commented 4 months ago

> And it would show the connection lost error.

That may be expected - unless you have configured a long enough grace period for that alert or unless you have disabled the alert completely. (The grace period is the number of seconds you can configure in the notifications settings.)

This ticket is about re-connects not happening (independently of the alert I assume) despite a re-connect interval being configured for the relevant connection in the connection settings.
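
The two settings distinguished here can be sketched as two independent checks: one decides whether the "connection lost" alert is shown, the other whether a new connection attempt is due. The function names are hypothetical, and treating a reconnect interval of zero or less as "no automatic reconnect" is an assumption made for this sketch only.

```python
def should_show_alert(disconnected_for_s: float,
                      grace_period_s: float,
                      alerts_enabled: bool = True) -> bool:
    """Show the alert only once the disconnect has outlasted the grace period."""
    return alerts_enabled and disconnected_for_s >= grace_period_s


def should_try_reconnect(since_last_attempt_s: float,
                         reconnect_interval_s: float) -> bool:
    """Decide whether a reconnect attempt is due.

    Assumption: an interval of zero or less means automatic reconnects are off.
    """
    return reconnect_interval_s > 0 and since_last_attempt_s >= reconnect_interval_s


# A slow device that needs 20 s to recover after resume, with a 30 s grace period:
print(should_show_alert(disconnected_for_s=20, grace_period_s=30))
# Independently of the alert, a reconnect is due after the configured interval:
print(should_try_reconnect(since_last_attempt_s=35, reconnect_interval_s=30))
```

Neither check feeds into the other, which is why silencing or delaying the alert would not be expected to change whether re-connects happen.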

tomasz1986 commented 4 months ago

> Disabling the long polling interval/timeout is maybe the safest option. If you do that then you might see #209 again, though.

Yeah, I see the dilemma here. Not sure what to do then. Maybe enable the long polling interval only on the devices that use hibernation?

What about the transfer timeout though? Is it better to disable or keep it at 60000 ms?

Martchus commented 4 months ago

> Yeah, I see the dilemma here. Not sure what to do then. Maybe enable the long polling interval only on the devices that use hibernation?

@xgdgsc Now that @tomasz1986 used the word "hibernation" I get the problem you are having. I guess it is true that on hibernation or on standby (or whatever causes all network connections to break) one sees the "Connection lost" error because, well, the connection was in fact lost. For GNU/Linux I actually implemented suppressing those alerts as part of the systemd integration, but it hasn't been done yet for Windows. So if you use hibernation/standby very often and are annoyed by the alerts, you'll have to disable them completely. Note that this still has nothing to do with your issue where the re-connect doesn't work for some reason (despite being configured). Also note that it could still be that you generally need a higher grace period for this alert on slower devices.

> What about the transfer timeout though? Is it better to disable or keep it at 60000 ms?

It is best to keep the default, which means disabling it, because some HTTP requests can take very long and there is so far no exception for them. (For example, the request for rescanning a folder only completes after rescanning is complete. This is simply how Syncthing's REST API behaves, and it also kind of makes sense.) This setting also shouldn't affect whether Syncthing Tray considers itself connected or not (because that is done via the long polling connection). By the way, if you are in the state where the connection is lost but not recovered, can you make other requests like requesting a rescan?
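
The rescan example can be put into numbers with a toy model. It assumes that a transfer timeout aborts a request once no response data has arrived for the configured time, and it uses `None` to stand for the default of "no timeout". Both are modelling assumptions for this sketch, not a description of Syncthing Tray's code.

```python
from typing import Optional


def is_aborted(silent_for_s: float, transfer_timeout_ms: Optional[int]) -> bool:
    """Return True if a request that has been silent for silent_for_s seconds
    would be cut off by the given transfer timeout (None means no timeout)."""
    if transfer_timeout_ms is None:
        return False
    return silent_for_s * 1000 > transfer_timeout_ms


# A rescan that only answers after 90 s, with the 60000 ms setting from this thread:
print(is_aborted(90, 60000))
# The same rescan with the default of no transfer timeout:
print(is_aborted(90, None))
```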

ProactiveServices commented 3 months ago

I've just had this problem recur and noticed that Syncthing wasn't running this time. It seems that it auto-upgraded, and because Syncthing Tray starts Syncthing with the --no-restart argument (both syncthing.exe instances), Syncthing is prevented from automatically restarting itself after an upgrade. I checked my configuration and confirmed that the environment variable STNOUPGRADE is unset. This behaviour could be one cause of this bug; it happens much less frequently these days.

https://docs.syncthing.net/users/syncthing.html#cmdoption-no-restart
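
The combination described above can be written down as a small check: Syncthing stays down after an automatic upgrade when it was launched with `--no-restart` while upgrades are not disabled through `STNOUPGRADE`. This is a minimal sketch with a hypothetical function name. It assumes that any non-empty `STNOUPGRADE` value disables upgrades, and it ignores other ways of turning automatic upgrades off, such as the configuration file or custom builds.

```python
from typing import List, Mapping


def stays_down_after_upgrade(args: List[str], env: Mapping[str, str]) -> bool:
    """Return True if an automatic upgrade would leave Syncthing stopped.

    Assumption: a non-empty STNOUPGRADE disables automatic upgrades.
    """
    no_restart = "--no-restart" in args
    upgrades_disabled = bool(env.get("STNOUPGRADE"))
    return no_restart and not upgrades_disabled


# The situation reported above: --no-restart passed, STNOUPGRADE unset.
print(stays_down_after_upgrade(["--no-restart"], {}))
# With upgrades disabled, the flag cannot bite after an upgrade.
print(stays_down_after_upgrade(["--no-restart"], {"STNOUPGRADE": "1"}))
```

To check a real environment, `os.environ` can be passed as `env` together with the argument list Syncthing is actually launched with.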

Martchus commented 3 months ago

That would be one way to run into this issue. I don't remember why I added --no-restart to the default arguments. Maybe it makes sense to remove it from the defaults.

However, considering the initial ticket description, especially the part

> unless I manually press the "Apply connection settings and try to reconnect with the currently selected config" button. After pressing the button, Syncthing Tray does reconnect to Syncthing properly and works fine for a while again.

I don't think this is what caused the issue in @tomasz1986's case. If Syncthing doesn't run anymore, then clicking that button wouldn't make a difference either.

tomasz1986 commented 3 months ago

Yeah, in my case, Syncthing was still running in the background. I also use my own builds with upgrades disabled, so there is no automatic upgrade and restart business going on either.

stale[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.