Closed pallaswept closed 9 months ago
Sorry, but I'm pretty sure this not the right thing to do. First of all, this changes the target from default.target
to basic.target
with no convincing reason. And combining that with an arbitrarily-selected sleep
really makes no sense.
If you want retries with a lower rate, use RestartSteps=
and RestartMaxDelaySec=
. If things start too early, please add more appropriate ordering deps. But adding a sleep and be done with it really won't make things more "reliable", but the opposite direction.
I manually link ydotoold.service to my xprofile.target.wants/
, which is started by my ~/.xprofile
, at which time my session is surely active.
Restarting on failure in this case doesn't make any sense to me. Sometimes I do not start a GUI session. Instead, I ssh into it from another computer. ydotoold kept restarting and was very annoying so I moved it to my xprofile.target
.
I'm pretty sure this not the right thing to do.
You may be right, ultimately - but consider that before remodelling the entire thing, it might make sense to try something which introduces a small changeset, and thus a lower risk of introducing a breaking change. Especially if you're a random unknown person contributing a patch, un-invited, and unknown to the maintainer. That is the intention, here; to keep my footprint small.
I'd prefer to do it the "Right Way", according to upstream documentation, so if you have some suggestions along those lines that might be an improvement to this, might I suggest you file a PR, and you are welcome to @ me in there if you want my input.
this changes the target from default.target to basic.target with no convincing reason
It follows systemd's docs...
arbitrarily-selected sleep
It wasn't arbitrary... but the dataset was stupid small. Feel free to expand it by buying me more computers :grin: or you could time all your PCs (or better yet, suggest a better dependent unit!)
Restarting on failure in this case doesn't make any sense to me
That was not introduced by this change.
Sometimes I do not start a GUI session.
This is why I didn't use graphical.target
. It seems an obvious candidate at first, but there's a bunch of edge-cases one can imagine where the presence or lack of a GUI doesn't necessarily determine whether ydotool should try to start yet.
TL;DR the service starts too soon, and retries too fast, always errors, and sometimes fails. This version, starts later, and retries slower, and succeeds first time every time.
Details:
I have noticed occasional failures of ydotool to start (looking at my logs, in the region of 1 time out of 10), and consistent (as in, always) generating errors before starting, as per the following journalctl logs:
A retry counter of 3 is about the usual, sometimes it would reach 4, sometimes it would pass the limit of 5 and all-out fail.
On the occasions where the service entirely failed, this would be the same, but reach the restart limit and fail:
Simply starting the service manually after boot when I realised it had failed, it would always start successfully, without error, immediately.
It was clear that the service was starting too soon, so my first port of call was to change the
WantedBy
target to something more appropriate. In order to determine the most appropriate candidate, I manually started the service after a failure, and employed systemd-analyze to find the chain which led to the successful start:Accordingly, I changed the
WantedBy
tobasic.target
. This resulted in 100% successful starts, although it would sometimes have to retry 1 or 2 times, before starting. These are immediate retries (as per the above logs, they happen within the same second of time), so I took two measures to mitigate this:Firstly, I added a 2 second delay before starting the service. 1 second would have been sufficient (refer to the timeframes in the above logs), but I am erring on the side of caution, for consideration of other people's PC's which may be slower. This was sufficient to result in 100% error-free startups.
To add an extra layer of resilience, I also extended the restart timer to 3 seconds, so that if there were a failure (not that I ever saw one, but other PCs may differ) it would be less likely to fail entirely after the 5 restarts (which would take a total of 15 seconds, an order of magnitude longer than any failure I ever saw). This should be more than enough to ensure a successful start, even in the case of an error, which should be unlikely anyway, given the modified target and delayed start.
Frankly, any one of these three measures would likely have been sufficient to ensure that the service always started, but the combination of the three results in a more reliable result and broader compatibility with differing hardware.
The cost to the user is a 2 second delay that would not be noticeable anyway, since even booting to a TTY this entire process should have completed before the shell starts, and certainly before the user could type a command; and any automated usage or services depending on ydotool should be employing systemd mechanics to wait for the service to start, anyway - so it can't cause any breakage in a properly configured system, and should not even be noticeable even in the most demanding of scenarios.
In the worst-case scenario where there would be a failure to start on the first attempt, and retries causing a longer delay (something I was not able to model, even in a VM with a single core limited to 3.6GHz and only 2 GB of memory), it is plausible that a user might notice that the service has not yet started - but in those cases, the service would have completely failed in the previous incarnation of this unit file, so the worst case scenario is still an improvement.
Of course, a complete failure to start is still possible with this unit file, but if that happens, something must be wrong with the system, and the root cause would be elsewhere.
I hope this patch is useful and the above description explains my analysis and the reasoning for the changes.