Martchus / syncthingtray

Tray application and Dolphin/Plasma integration for Syncthing
https://martchus.github.io/syncthingtray/
Other
1.68k stars 44 forks source link

syncthingtray folder/device status out of sync with web #93

Closed buck2202 closed 3 years ago

buck2202 commented 3 years ago

Relevant components

Environment and versions

Bug description Machine/share statuses are frequently out of sync with those shown in the web interface.

Steps to reproduce The two most consistent reproducers for me are

  1. Trigger a rescan on a "large" share that will take some time to complete
  2. Start/restart syncthing tray
  3. The share status will show unknown until the scan completes

or

  1. Start syncthing tray when everything is up to date and idle
  2. Share statuses show unknown until an event occurs (rescan, syncing, etc)

From those startup-related observations, it seemed like syncthing tray only captures folder/device state when it changes and is correct after that, but that doesn't seem to be the case. The tray state has been diverging from the web interface for me on multiple machines with some regularity (two following the "Building this straight" instructions, and one from the appimage)

Expected behavior Directory/device statuses in tray would match folder/device statuses in the web interface

Screenshots Share state mismatch (scanning vs unknown): Screenshot from 2021-05-17 11-55-01

Share state mismatch (scanning vs idle): Screenshot from 2021-05-17 11-56-50

Multiple machine mismatches (one disconnected instead of up to date, one disconnected instead of syncing, one disconnected with full device ID instead of up to date with "friendly" name) Screenshot from 2021-05-17 11-57-21

Additional context It's unclear to me if this is an issue specific to my environment or a limitation of syncthing's provided APIs that my large/slow-to-scan folders are exposing. I had similar problems with syncthing-gtk (which is why I decided to try syncthing tray), but saw that its development has mostly stopped, and hoped that an active project with recent development might have found a way to avoid this problem

Martchus commented 3 years ago

It cannot be a limitation of the API because the official GUI just uses that as well. So there's indeed room for improvement.

However, I must say that I haven't noticed that problem so far. Maybe it really only becomes apparent with huge/slow folders or a recent change within Syncthing causes this.

Note that Syncthing's official GUI also needs a while to display the status in my experience and so far I was actually under the impression that Syncthing Tray is faster.

buck2202 commented 3 years ago

hm, is there any info I can provide that might help track this down? on my machines it's quite repeatable, and the mismatches can last for a long time (my second set of screenshots came from a machine that's had syncthing and syncthing tray running uninterrupted for ~3 days. I think tray/web started mismatching sometime yesterday afternoon, and it hasn't corrected itself yet).

edit to add: I do have a machine 3 or 4 days into its (ongoing) initial sync of my largest folder, so if there's some issue with slow operations, it's definitely possible that I'm seeing it exaggerated right now (though I did observe syncthing-gtk being out of sync with the web interface over years of use before I decided to try your tool)

Martchus commented 3 years ago

I've checked the code again and Syncthing Tray is actually querying the directory state actively so it should not be required to observe a change until the state shows up. This also work in my setup.

I could find a bug where the state isn't assigned correctly because timezones (of the event timestamps) were used inconsistently. It also wouldn't assign the state if the related timestamp cannot be parsed at all. These problems should be fixed now.

I also improved the debug logging. You could now set the environment variable LIB_SYNCTHING_CONNECTOR_LOG_ALL=1 to log everything which is going on. This log would be useful but take care to redact any secrets before uploading logs here.

I'm also wondering whether syncthingctl status shows the status correctly in your case.

buck2202 commented 3 years ago

My long-running initial sync has completed, and my folders are all currently idle. I don't have a folder-state mismatch to examine at the moment. I do have a machine state mismatch, though, so just wanted to check in with syncthingctl status output before I rebuild and upgrade from release 1.1.6.

This is the machine from yesterday that showed 1x folder mismatch and 3x machine mismatches. Syncthing has "caught up" to the folder state that syncthing tray was already displaying ("idle")--not sure if tray is truly correct here, or if it's still locked in the state it had yesterday and is just coincidentally correct now. But, the machine mismatches remain-- Screenshot from 2021-05-18 13-22-24

syncthingctl shows the device status correctly (stripped out IDs and IP addresses as some are public)

$ syncthingctl status --all-devs | grep -v ID | grep -v 'Connection address'
Overall statistics
   Status                        idle
      Global                        2464472 file(s), 584073 dir(s), 718.05 GiB
      Local                         2464472 file(s), 584073 dir(s), 718.05 GiB
   Incoming traffic              241.81 GiB
   Outgoing traffic              318.33 GiB
   Connected to                  4 devices:
                                    buckma5
                                    EGVSRV35
                                    EGHPDT20
                                    buckma
   Uptime                        0 s
   Version                       syncthing v1.15.1 "Fermium Flea" (go1.16.3 linux-amd64) deb@build.syncthing.net 2021-04-06 08:42:29 UTC [noupgrade]

Devices
 - buckma4
   Status                        own device
   Addresses                     dynamic
   Compression                   metadata

 - EGHPDT20
   Status                        idle
   Addresses                     dynamic
   Compression                   metadata
   Connection type               tcp-server
   Client version                v1.16.1
   Incoming traffic              10.92 KiB
   Outgoing traffic              10.42 KiB

 - EGVSRV35
   Status                        idle
   Addresses                     dynamic
   Compression                   metadata
   Connection type               tcp-server
   Client version                v1.16.1
   Incoming traffic              66.21 MiB
   Outgoing traffic              101.04 MiB

 - buckma
   Status                        idle
   Addresses                     dynamic
   Compression                   metadata
   Connection type               tcp-server
   Client version                v1.15.1
   Incoming traffic              13.39 GiB
   Outgoing traffic              5.35 GiB

 - buckma5
   Status                        idle
   Addresses                     dynamic
   Compression                   metadata
   Connection type               tcp-server
   Client version                v1.16.1
   Incoming traffic              1.76 GiB
   Outgoing traffic              72.68 GiB

I'll check back in after I've had a chance to rebuild and watch for errors

edit: actually, I just noticed tray showing a "connect" icon at the top. Is that an indication that syncthing tray has lost its connection to syncthing, or is it offering to "unpause" devices in syncthing? I clicked that, and the machine states corrected themselves in tray.

In any event, I'm sure there's still more going on here--for the first folder-mismatch screenshot yesterday, syncthing tray showed a "pause" button at top, not "connect"

Martchus commented 3 years ago

Hard to tell what's going on but I can answer your question. If the button offers to connect that indicates Syncthing Tray lost its connection to Syncthing (or could never really connect). Clicking the button in this states makes it try to connect again. If the button offers to pause/resume it means the connection to Syncthing is fine and it is possible to pause/resume all devices via the button.

It looks like syncthingctl shows the device states correctly so I guess the backend library kind of works and can at least determine the state initially. It is weird that it cannot show the uptime - as if timestamp could not be parsed. With the latest version this should actually result in warnings on strerr.

buck2202 commented 3 years ago

Sorry for the delay--I had to stop syncthing for a few days because its disk access was bogging down my work. I've updated, and get a lot of warnings like this

Warning: Unable to parse last scan timestamp "2021-05-20T23:02:45-04:00": unexpected "-" after day
Warning: Unable to parse last seen timestamp "2021-05-21T00:30:53-04:00": unexpected "-" after day
Warning: Unable to parse folder state timestamp "2021-05-21T00:26:29-04:00": unexpected "-" after day

in gnu date format, my system is returning %Y-%m-%dT%H:%M:%S%:z, e.g.

$ date +%Y-%m-%dT%H:%M:%S%:z
2021-05-21T00:35:22-04:00

The first "-" after the day is part of the time zone (I'm currently UTC-4)...is that what it's complaining about? I'm not sure where the date format is coming from (system or syncthing), or if mine is non-standard.

It was previously really easy for me to reproduce the folder states being out of sync, but my simple reproducer from the initial post isn't immediately causing the issue anymore. I'll keep an eye on it, but wanted to at least respond about the timestamps.

edit: aside from the update/rebuild of syncthingtray, the multi-day initial sync to a new machine that was active during my initial report has finished. I'm not sure if that specific share/machine might somehow have been the cause of my larger issue--e.g. is there a never-complete flag somewhere or a 'last completed' timestamp? Something that could be throwing a null or unexpected value if a folder+device have never been up to date? Just speculating. I have syncthingtray logging and I'll keep watching.

Martchus commented 3 years ago

Oh, I guess my timestamp parser is broken for negative timezone offsets. I'm going to fix this of course. Then it is clear why it couldn't get the initial state. However, the new version might show the problem only in fewer cases then.

About the resource usage, maybe the following is useful: https://man.archlinux.org/man/systemd.resource-control.5.en#UNIFIED_AND_LEGACY_CONTROL_GROUP_HIERARCHIES

Martchus commented 3 years ago

Parsing the timestamp should be fixed now. Note that the relevant change is within the underlying library c++utilities (see https://github.com/Martchus/cpp-utilities/commit/546b1fecb70fa34b6f7cb9352a9f8a5da4964312).

Apparently it was always broken if a timestamp had a negative timezone offset and the seconds had no fractional part. I've never been testing this combination.

buck2202 commented 3 years ago

I've rebuilt and am logging on two machines. I'm reprocessing a 120GB set of ~1.2 million files right now, so I'm in a perpetual state of "scanning", but can at least confirm I'm no longer seeing the timestamp warnings.

I'll check back in a few days to let you know if I've noticed any issues. Thanks for the attention!

buck2202 commented 3 years ago

hi there, I've seen another out-of-sync issue, but had trouble documenting syncthingtray's state. I wasn't seeing this before, but, when expanded, tray seems to block my desktop manager (cinnamon) from noticing any key/mouse events (e.g. print screen, mouse scroll in a visible but inactive window, hover in taskbar, etc). I've had to manually kill a few cinnamon processes a few times recently because of runaway memory use, so I might just need a clean boot.

But in my confusion about not being able to grab a screenshot, I restarted tray (which overwrote the log because I only >'d it), and when it reloaded it had corrected itself. Just wanted to ping to let you know this doesn't seem quite ready to close yet.

Martchus commented 3 years ago

Under X11 one needs to grab mouse events for showing a context menu so Syncthing Tray (or better Qt) does that. Of course if it gets suck you'll end up in the situation you described. Note that this problem is fixed under Wayland (but there are unfortunately other issues under Wayland instead).

Let's see whether you can still reproduce the problem. Please be sure to use the latest version (also of c++utilities) as I've been adding a few tweaks, e.g. timestamp parsing errors should now show up within the UI like other parsing errors.

buck2202 commented 3 years ago

Ah, now that you mention it, I think my prior screenshots were taken of a remote-desktop display of the machine having issues. This time I was trying to do it directly on the relevant machine. Likely not a new issue after all, then.

I was fully up to date as of the 22nd, but I'll be sure to update everything again.

Martchus commented 3 years ago

Does it work now with the fixed timestamp parsing?

buck2202 commented 3 years ago

I haven't noticed any persistent issues since your last fix.

I haven't done anything as demanding as when I first saw it (a huge, slow, initial sync), and don't know what circumstance produced it on May 28. I'd say it's ok to close this...if I ever see it again (and can actually document it), I'll check back in.

Thanks!

Martchus commented 3 years ago

Ok, then I close it for now.