Enerccio / ewlc

Wayland compositor library - extended
MIT License
20 stars 3 forks source link

[CLOSED] Infinite loop in XWayland initialisation #11

Closed Enerccio closed 7 years ago

Enerccio commented 7 years ago

Issue by Calrama Wednesday Jan 14, 2015 at 14:02 GMT Originally opened as https://github.com/Cloudef/wlc/issues/3


If there is a normal XServer running on $DISPLAY=:0.0, the following loop will not terminate; if the normal XServer is instead running on $DISPLAY=:1.0, the loop will terminate and XWayland will be loaded correctly:

https://github.com/Cloudef/wlc/blob/ef296c5fb8e3e18acdb4bb12dd813fcff1b3faba/src/xwayland/xwayland.c#L81-L107

UPDATE1: The infinite loop happens because after continuing the loop after the decrease in https://github.com/Cloudef/wlc/blob/ef296c5fb8e3e18acdb4bb12dd813fcff1b3faba/src/xwayland/xwayland.c#L104 wlc reaches the same line once more.

UPDATE2: It seems to me that https://github.com/Cloudef/wlc/blob/ef296c5fb8e3e18acdb4bb12dd813fcff1b3faba/src/xwayland/xwayland.c#L100 should read "kill(owner, 0) == 0)" instead of "kill(owner, 0) != 0)", since wlc should try to use this display only if killing the XServer was successfull, not if the XServer is still running.

Enerccio commented 7 years ago

Comment by Cloudef Wednesday Jan 14, 2015 at 15:22 GMT


The decrease there is pointless. Thanks for spotting that. https://github.com/Cloudef/loliwm/issues/49 (related)

Enerccio commented 7 years ago

Comment by Cloudef Wednesday Jan 14, 2015 at 15:33 GMT


Does dd8175a053 fix it?

Enerccio commented 7 years ago

Comment by Calrama Wednesday Jan 14, 2015 at 18:27 GMT


It does fix it, but would a errno check in line 100 like "&& errno != ESRCH" not have been enough? After all, a failed signal to the supposed XServer process does not mean it is not running. It could also mean that - like in my case - you do not have permission to singal it (which is why in this case errno will be set to EPERM). I would argue that instead of later down the line checking if you can actually open the lock file, it would be better to check why the kill failed, since it gives you the necessary information to determine whether the process actually exists.

Enerccio commented 7 years ago

Comment by Cloudef Wednesday Jan 14, 2015 at 18:53 GMT


Kill with 0 does not actually kill. It checks whether the process is running. I removed the fragile continue part, as what this change does is that it just tries to open the lock file for writing again after cleaning up the files for X server that was no longer running.

However, what happens here most likely is that on some systems wlc can't remove those files and thus fails to open the file for writing (or something like that), which is why I never encountered the bug myself.

Edit: I took my brain, eyes and reread what you wrote. And yeah checking for ESRCH makes sense here, though this change makes sense also.

Enerccio commented 7 years ago

Comment by Cloudef Wednesday Jan 14, 2015 at 18:58 GMT


I added check for ESRCH as well.

Enerccio commented 7 years ago

Comment by Calrama Wednesday Jan 14, 2015 at 22:14 GMT


I reread what I wrote myself and I must apologise, because there is a problematic typo in there. The check should be "&& errno == ESRCH" (at least according to this: http://man7.org/linux/man-pages/man2/kill.2.html). Then the alternative reads "If I fail to signal the XServer process and the reason I fail is because that process does not exists, try to remove those files ...". Right now with my (wrong) check it reads "If I fail to signal the XServer process and the reason I fail is anything other than the XServer process not existing, try to remove those files ...".

Enerccio commented 7 years ago

Comment by Cloudef Wednesday Jan 14, 2015 at 23:31 GMT


I rewrote the latest commit with inverted condition, and added some comments for clarity.

Enerccio commented 7 years ago

Comment by Calrama Thursday Jan 15, 2015 at 00:02 GMT


Thank you for the quick resolution.