2e3s / awatcher

Activity and idle watchers
Mozilla Public License 2.0
126 stars 4 forks source link

Stops working randomly #12

Open peyloride opened 7 months ago

peyloride commented 7 months ago

Running Archlinux + Hyprland.

I'm using bundled awatcher and starting it like:

exec-once = telegram-desktop & awatcher

It starts well and runs at first, then randomly it crashes. I don't have a log or something unfortunately, if you can guide me how to get it I can provide that.

Example: image

as you can see I was not AFK and was on meet but all of a sudden awatcher stops recording. Weird thing is the server works all the time, I can access localhost:5600 all the time.

This is an example of random day, it happens almost every day.

2e3s commented 7 months ago

Hello! Maybe try to run this manually with logging awatcher -vvv and see if it eventually fails as well? Hunting for logs from exec-once may be unnecessarily challenging.

If it works with a manual run and the problem exists only within exec-once, show the lines from wherever exec-once gets it posted:

exec-once = awatcher -vvv

Your system is probably using systemd, so this might work: journalctl --user --no-pager --since "1 day ago" | grep awatcher or this sudo grep -i awatcher /var/log/syslog If it doesn't post logs in your setup, you can make it a service like described in https://github.com/2e3s/awatcher/issues/11

as you can see I was not AFK and was on meet but all of a sudden awatcher stops recording

It posted "afk" which means the program didn't crash. So there was a problem getting the window data, maybe Hyprland didn't return it in some conditions. -vvv has enough level of debug output to understand what's going on, I guess.

jopereira commented 6 months ago

A also see this happening, with both standalone and bundled versions. I noticed the following:

[2024-02-12 08:28:02.469725 DEBUG watchers::watchers::gnome_window] Changed window app_id=...

[2024-02-12 09:15:17.336292 WARN aw_transform::flood] Gap was of negative duration and could NOT be safely merged (-PT183.376476271Ss). This warning will only show once per batch. [2024-02-12 09:15:17.336310 WARN aw_transform::flood] Gap was of negative duration (-PT388.971761192Ss), but could be safely merged. This error will only show once per batch.

luisgerhorst commented 5 months ago

I can confirm what @jopereira reports:

$ journalctl --user -u aw-awatcher
<snip>
Mar 13 08:58:30 luis-tp bash[4290]: [2024-03-13 07:58:30.344889 DEBUG watchers::watchers::gnome_window] Changed window app_id="Emacs", title="..."
Mar 13 08:58:30 luis-tp bash[4290]: [2024-03-13 07:58:30.344931 TRACE watchers::report_client] Reporting app_id: Emacs, title: ...
Mar 13 08:58:30 luis-tp bash[4290]: [2024-03-13 07:58:30.677704 TRACE watchers::watchers::idle] Reporting as not idle
Mar 13 08:58:31 luis-tp bash[4290]: [2024-03-13 07:58:31.375662 TRACE watchers::report_client] Reporting app_id: Emacs, title: ...
Mar 13 08:58:32 luis-tp bash[4290]: [2024-03-13 07:58:32.416422 TRACE watchers::report_client] Reporting app_id: Emacs, title: ...
Mar 13 08:58:33 luis-tp bash[4290]: [2024-03-13 07:58:33.459502 TRACE watchers::report_client] Reporting app_id: Emacs, title: ...
Mar 13 08:58:35 luis-tp bash[4290]: [2024-03-13 07:58:35.700555 TRACE watchers::watchers::idle] Reporting as not idle
Mar 13 08:58:40 luis-tp bash[4290]: [2024-03-13 07:58:40.720520 TRACE watchers::watchers::idle] Reporting as not idle
Mar 13 08:58:45 luis-tp bash[4290]: [2024-03-13 07:58:45.743974 TRACE watchers::watchers::idle] Reporting as not idle
<snip>
$ journalctl --user -u aw-server
<snip>
Mar 12 17:56:00 luis-tp bash[4289]: 2024-03-12 17:56:00 [INFO ]: Received heartbeat after pulse window, inserting as new event. (bucket: aw-watcher-emacs_luis-tp)  (aw_server.api:279)
Mar 12 18:14:46 luis-tp bash[4289]: 2024-03-12 18:14:46 [INFO ]: Received heartbeat after pulse window, inserting as new event. (bucket: aw-watcher-window_luis-tp)  (aw_server.api:279)
Mar 12 18:14:51 luis-tp bash[4289]: 2024-03-12 18:14:51 [WARNING]: Gap was of negative duration and could NOT be safely merged (-181.052s). This warning will only show once per batch.  (aw_transform.flood:54)
Mar 12 18:14:51 luis-tp bash[4289]: 2024-03-12 18:14:51 [WARNING]: Gap was of negative duration but could be safely merged (-1204.95s). This message will only show once per batch.  (aw_transform.flood:46)
<snip>
Mar 12 18:15:16 luis-tp bash[4289]: 2024-03-12 18:15:16 [WARNING]: Gap was of negative duration and could NOT be safely merged (-184.073s). This warning will only show once per batch.  (aw_transform.flood:54)
Mar 13 08:49:30 luis-tp bash[4289]: 2024-03-13 08:49:30 [INFO ]: Received heartbeat after pulse window, inserting as new event. (bucket: aw-watcher-web-firefox)  (aw_server.api:279)
Mar 13 08:49:32 luis-tp bash[4289]: 2024-03-13 08:49:32 [INFO ]: Received heartbeat after pulse window, inserting as new event. (bucket: aw-watcher-afk_luis-tp)  (aw_server.api:279)
Mar 13 08:52:32 luis-tp bash[4289]: 2024-03-13 08:52:32 [INFO ]: Received heartbeat after pulse window, inserting as new event. (bucket: aw-watcher-window_luis-tp)  (aw_server.api:279)
Mar 13 08:59:15 luis-tp bash[4289]: 2024-03-13 08:59:15 [WARNING]: Gap was of negative duration but could be safely merged (-15.748s). This message will only show once per batch.  (aw_transform.flood:46)
Mar 13 08:59:15 luis-tp bash[4289]: 2024-03-13 08:59:15 [WARNING]: Gap was of negative duration and could NOT be safely merged (-181.011s). This warning will only show once per batch.  (aw_transform.flood:54)

Restarting aw-awatcher resolves it. When using systemd, here's a dirty workaround to limit the impact of the issue (restart awatcher every hour unconditionally):

[Service]
Type=simple
# Verbosity level: -v for warnings, -vv for info, -vvv for debug, -vvvv for trace
ExecStart=/bin/bash -c 'aw-awatcher -vvvv'
Restart=always
RuntimeMaxSec=1h
2e3s commented 5 months ago

it usually happens when the machine is suspended/resumed

I'm reading your reports here, but I can't reproduce it, unfortunately. Please, add more information about your environment and how you run the program.

luisgerhorst commented 5 months ago

Here you go:

Hardware Information:

Software Information:

In extensions, I see a 6 behind the name of https://github.com/flexagoon/focused-window-dbus, not sure if that is the version.

> aw-awatcher --version
Activity Watcher 0.2.4
> cat ~/.config/systemd/user/aw-awatcher.service 
[Unit]
Requires=aw-server.service
After=aw-server.service

[Service]
Type=simple
# Verbosity level: -v for warnings, -vv for info, -vvv for debug, -vvvv for trace
ExecStart=/bin/bash -c 'aw-awatcher -vvvv'
Restart=always
RestartSec=60
RuntimeMaxSec=1h
Nice=18
IOSchedulingPriority=6
# Was not able to test this on a system where it has an effect:
# CPUQuota=10%

[Install]
WantedBy=default.target
> cat ~/.config/systemd/user/aw-server.service  
[Unit]
After=network.target

[Service]
Type=simple
ExecStart=/bin/bash -c 'aw-server'
Restart=on-failure
RestartSec=60
Nice=18
IOSchedulingPriority=6
# Was not able to test this on a system where it has an effect:
# CPUQuota=10%

[Install]
WantedBy=default.target

I also get vastly wrong usage times in the overview of the web interface.

luisgerhorst commented 4 months ago

Just wanted to let you know that I've now built awatcher from source (all worked like a charm) and disabled some of the other watchers. I've now been running that debug build for 2 weeks and have not been able to reproduce the issue since. I will continue to debug and try to pinpoint it.

Here's the changes I made: https://github.com/luisgerhorst/awatcher/commit/26d2f2e67bcd03b4c291dee150fb4cadeeae76a7 - However, it could also be just a coincidence at this point...

2e3s commented 4 months ago

@luisgerhorst thanks, I'm looking at your changes and I don't see anything that could prevent the aforementioned symptoms, so maybe a coincidence indeed. Having it consistently reproduced would certainly help solving it once and for all.

jopereira commented 3 months ago

I've been using @luisgerhorst's fork since May 3, and the problem has not happened since then.

2e3s commented 3 months ago

@luisgerhorst why could it be? E.g. why have you added timeouts in a couple places, and removed the reconnect attempts for Gnome watchers? I see that your changes only affect the watcher loading/initializing but I've got an impression that the failures happen long after loading, during normal functioning.

zbw8388 commented 3 months ago

I also noticed this issue and there seems to be two issues at play:

  1. Watcher fails to return from watcher.run_iteration (fixed by timeout)
  2. For wayland apps, after suspending for a relatively long period of time, awatcher starts to raise [2024-05-19 22:04:58.559502 ERROR watchers::watchers] Error on active window iteration: GetPropertyReply[_NET_WM_NAME] failed (should be fixed by commenting out X11 window watcher but have yet to test it)
luisgerhorst commented 3 months ago

I have now cleaned up my initial mess (https://github.com/luisgerhorst/awatcher/commit/7af01ca230fd1d60cf65bb773dbf5b3692fca79c), and will cont. to bisect it. Just takes some time as I need multiple days to be sure it's still fixed for each version.

luisgerhorst commented 3 months ago

@zbw8388 wrote:

I also noticed this issue and there seems to be two issues at play:

1. Watcher fails to return from `watcher.run_iteration` (fixed by timeout)

If this is true, maybe we should further bisect what in run_iteration fails to return?

2. For wayland apps, after suspending for a relatively long period of time, awatcher starts to raise `[2024-05-19 22:04:58.559502 �[31mERROR�[0m watchers::watchers] Error on active window iteration: GetPropertyReply[_NET_WM_NAME] failed` (should be fixed by commenting out X11 window watcher but have yet to test it)

Great!

2e3s commented 3 months ago

If this is true, maybe we should further bisect what in run_iteration fails to return?

If you use Gnome and given the symptoms, it means that DBus call to org.gnome.shell.extensions.FocusedWindow hangs at some moment. There is only that call (and reporting to AW server).

timeout(watcher_type.sleep_time(&client.config), watcher.run_iteration(&client)).await

@luisgerhorst Your change looks good and universal to hopefully exclude similar problems in this and other watchers. It's simple and free to try anyway. I can commit the change in a few days for 0.2.7, or preferably merge your PR if you push it (with other watchers enabled). Thanks regardless :heart:

2e3s commented 3 months ago

I have released the possible fix for the perceived Gnome issue https://github.com/2e3s/awatcher/releases/tag/v0.2.7

For wayland apps, after suspending for a relatively long period of time, awatcher starts to raise [2024-05-19 22:04:58.559502 �[31mERROR�[0m watchers::watchers] Error on active window iteration: GetPropertyReply[_NET_WM_NAME] failed (should be fixed by commenting out X11 window watcher but have yet to test it)

@zbw8388 I think this is another possible problem, which is related to X11 (otherwise X11 window watcher wouldn't start). Not sure what you mean by wayland apps though. If you'll notice such a buggy behaviour on the newest version, please, open an issue with a description.

luisgerhorst commented 3 months ago

Thanks! I'm testing it and will report back if I observe any issues.

42Willow commented 3 months ago

Issue doesn't occur for me on v0.2.7 of awatcher (installed as module) with hyprland v0.40.0-1 image (my computer was suspended at the blank point in the middle)