On FreeBSD Xmonad loses first hotkey sometimes

GoogleCodeExporter commented 8 years ago

What steps will reproduce the problem?

1. Install FreeBSD (amd64).
2. Write a minimal xmonad.hs with an xterm binding.
3. Start xmonad in Xorg.
4. Try (multiple times) Mod+Shift+Return.

What is the expected output? What do you see instead?

It's expected that xterm opens the first time Mod+Shift+Return is pressed. 
Sometimes Xmonad loses the key combo and you need to press it twice in quick 
succession. If you wait too long between the first and the second press, the 
hotkey is lost again and again.

I repeat: it happens SOMETIMES, but on average every 4th time. Try to open a 
terminal and exit it again (Mod+Shift+Return -> Ctrl-D -> Mod+Shift+Return -> 
Ctrl-D -> ...).

What version of the product are you using? On what operating system?

hs-xmonad-0.11_7

FreeBSD 10.0 RELEASE (amd64).

Are you using an xmonad.hs?  Please attach it and the output of "xmonad
--recompile".

You don't need mine. Use a minimal one:

import XMonad

main = xmonad defaultConfig
        { modMask = mod1Mask
        , terminal = "xterm"
        }

Additional info:
Cannot be reproduced on Linux. Everything works as expected there.

Original issue reported on code.google.com by martin.s...@gmail.com on 8 Aug 2014 at 8:41

GoogleCodeExporter commented 8 years ago

I forgot something essential:

This is only noticeable when trying to open a terminal. I don't notice this 
buggy behavior when doing something else (switching workspaces, changing focus, 
shifting windows, etc.). In other words, every other hotkey works on the first press.

It must be something specific to how Xmonad handles Mod+Shift+Return when a 
terminal application is to be opened.

It does not depend on which Mod key I use (reproducible with Mod1 and Mod4), 
on which terminal application I use, or on whether a terminal application is 
in focus. Sometimes I am in Firefox or something else and the hotkey for the 
terminal is lost.

I really have no idea how to diagnose it further.

Original comment by martin.s...@gmail.com on 8 Aug 2014 at 8:56

GoogleCodeExporter commented 8 years ago

Further information:

Key bindings that are not affected:
- Mod+<1...>
- Mod+Shift+<1...>
- Mod+Tab
- Mod+M
- Mod+Return
- Mod+<H,J,K,L>
- ...

Key bindings which are affected by bug above:
- Mod+Shift+Return
- Mod+Shift+Backspace
- Mod+P
- Mod+Shift+P
- ...

Original comment by martin.s...@gmail.com on 9 Aug 2014 at 3:42

GoogleCodeExporter commented 8 years ago

Note that, of the affected keybindings, all but one are "spawn"s — and the 
exception isn't even an xmonad keybinding (see 
http://www.haskell.org/wikiupload/b/b8/Xmbindings.png and 
http://xmonad.org/xmonad-docs/xmonad/src/XMonad-Config.html#line-170) but an X 
server internal binding that is usually disabled by default.

Original comment by allber...@gmail.com on 9 Aug 2014 at 4:16

GoogleCodeExporter commented 8 years ago

Mod+Shift+Backspace is my custom binding that starts a shell script:

((modm .|. shiftMask, xK_BackSpace), spawn "~/.xmonad/scripts/shutdown.sh")

That's a good hint you've given me. So how should I diagnose it further from 
here?

Original comment by martin.s...@gmail.com on 9 Aug 2014 at 4:39

GoogleCodeExporter commented 8 years ago

If you can get a terminal open (or switch to a virtual console), it might be 
worth using `truss -f` on the running xmonad process to see if something is 
going wrong with `spawn`. You should see xmonad fork, then the child fork again 
and exit, and the grandchild exec `sh -c ...`.
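The fork / fork-again-and-exit / exec sequence described above can be sketched with the `unix` package roughly as follows. This is a hypothetical illustration of a generic double fork, not xmonad's actual `spawn`; the names `doubleFork` and `spawnDoubleFork` are made up for this example. Under `truss -f` you would expect something like two `fork()` calls, the intermediate child's exit, and the grandchild's `execve("/bin/sh", ...)`.

```haskell
import System.Exit (ExitCode (..))
import System.Posix.Process
  ( ProcessStatus (..)
  , createSession
  , executeFile
  , exitImmediately
  , forkProcess
  , getProcessStatus
  )

-- | Hypothetical helper (not xmonad's code): fork a child, have the child
-- fork a grandchild and exit immediately, then reap the child. The orphaned
-- grandchild starts a new session and runs 'action'.
doubleFork :: IO () -> IO (Maybe ProcessStatus)
doubleFork action = do
  child <- forkProcess $ do
    _ <- forkProcess (createSession >> action)
    exitImmediately ExitSuccess
  -- Block until the intermediate child has exited so it is not left as a zombie.
  getProcessStatus True False child

-- | Hypothetical spawn built on 'doubleFork': the grandchild replaces itself
-- with /bin/sh -c <cmd>, which is the execve you would look for in truss.
spawnDoubleFork :: String -> IO ()
spawnDoubleFork cmd = do
  _ <- doubleFork (executeFile "/bin/sh" False ["-c", cmd] Nothing)
  return ()
```

In an xmonad.hs such a plain `IO` action would be lifted into `X` with `io` (or `liftIO`), e.g. `io (spawnDoubleFork "xterm")`.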

This would not actually be the first time we've had problems on FreeBSD; but 
the past problems were due to various attempts to improve our process handling 
which tripped over certain differences in *BSD's handling of "orphaning" child 
processes, and showed up differently — in particular, it could not happen on 
the *first* child spawned, but only after hitting the child process limit which 
is usually fairly high (in the hundreds at least) these days.

Original comment by allber...@gmail.com on 9 Aug 2014 at 6:41

GoogleCodeExporter commented 8 years ago

One thing I can see from the truss output is that when it does NOT work, the 
following happens:

1) fork()
2) child process executing
3) poll(); recvmsg; setitimer(); poll(); SIGALRM

When it works this happens:

1) setitimer(); setitimer();
2) fork()
3) setitimer(); (again)
4) child process executing (SIGALRM happening multiple times within the child 
process)

Does it help or do you need the detailed truss output?

Original comment by martin.s...@gmail.com on 9 Aug 2014 at 8:06

GoogleCodeExporter commented 8 years ago

I would like to see the full truss output. You can, however, clean it up a bit 
by building your custom xmonad manually (see 
http://xmonad.org/xmonad-docs/xmonad/src/XMonad-Core.html#recompile) with the 
ghc options:

    -rtsopts -with-rtsopts -v0

which will disable the runtime system's GC timer (it will GC on all allocations 
instead, which can slow programs down a bit) and remove the itimer and SIGALRM 
from the trace.

Original comment by allber...@gmail.com on 10 Aug 2014 at 12:30

GoogleCodeExporter commented 8 years ago

Here are both xz-compressed truss output files (the one where xterm starts is 
quite hard to produce).

mod-shift-return-ignored.truss.txt -> Key binding did not work

mod-shift-return-ok.truss.txt -> Key binding worked (some noise at the end, 
closing the window)

Original comment by martin.s...@gmail.com on 10 Aug 2014 at 8:09

Attachments:

GoogleCodeExporter commented 8 years ago

I'm also on FreeBSD, and while I haven't debugged the issue formally, I can say 
that commands run more reliably (I haven't seen any issues since) when spawned 
inside tcsh.

First I tried something like the following:

I converted:

spawn "dmenu_run"

to

spawn "tcsh -c 'dmenu_run'"

Then I dug into the definition of spawn and made an alternate version of spawn 
instead, like the following:

import XMonad                                  -- xfork, MonadIO
import Codec.Binary.UTF8.String (encodeString) -- from the utf8-string package
import System.Posix.Process (executeFile)
import System.Posix.Types (ProcessID)

-- | spawn'. Launch an external application. Specifically, it forks (via xfork)
-- and runs the 'String' you pass as a command to \/bin\/tcsh.
--
-- Note this function assumes your locale uses utf8.
spawn' :: MonadIO m => String -> m ()
spawn' x = spawnPIDTCSH x >> return ()

-- | Like spawn', but returns the 'ProcessID' of the launched application
spawnPIDTCSH :: MonadIO m => String -> m ProcessID
spawnPIDTCSH x = xfork $ executeFile "/bin/tcsh" False ["-c", encodeString x] Nothing

Now, when I execute something like

spawn' "dmenu_run"

it works reliably.

Hope this helps, although you might be experiencing a different issue.

Regards,
Tim

Original comment by beyer...@gmail.com on 18 Aug 2014 at 7:26

GoogleCodeExporter commented 8 years ago

Yes. This tcsh workaround indeed appears to help. I haven't tested it very 
extensively yet, but the main symptoms are gone.

I still suspect that it is some kind of race condition with timers/signals. 
Maybe /bin/sh starts up very fast on FreeBSD. I can't tell for sure what's 
going on.

I also compiled with "-rtsopts -with-rtsopts -v0" (I checked the ps output during 
compilation to confirm that it really uses the flags). It did not improve anything 
and did not make truss faster (it still lists setitimer, alarms, etc.). I am 
getting just about the same output as above.

Btw, FreeBSD upgraded to GHC 7.8.3 a few days ago and the problem persists. 
Tim's workaround still helps here.

Original comment by martin.s...@gmail.com on 19 Aug 2014 at 2:37

GoogleCodeExporter commented 8 years ago

I've had this problem for some time now 
(https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=181049). Here are some of my 
observations so far:

GHC 7.4 is not affected
GHC 7.6 and 7.8 both seem to produce this bug

xmonad 0.8 did not show that bug
xmonad 0.11 did show the bug

As a workaround I copied the definition of spawn and doubleFork from xmonad 0.8 
into the xmonad 0.11 source code which then did not seem to exhibit this 
problem.

AFAICT it is related to interval timers. Interval timers are supposed to be 
disabled right before execve(). executeFile does disable the interval timers, 
but in some cases the call to setitimer does not take place. Then the child 
process is terminated by a SIGALRM.

I am observing the same behaviour even when compiling with -rtsopts 
-with-rtsopts -v0.

Sadly I have no idea how to fix or debug this. I started looking at the generated 
Core for the affected/unaffected spawn/doubleFork versions but have no clue yet.

Original comment by dernst...@gmail.com on 13 Oct 2014 at 7:24

GoogleCodeExporter commented 8 years ago

Have you or anyone else reported this on the GHC bug tracker?

-v0 and -V0 are different things; the latter should disable itimers completely. 
-v is part of the eventlog service and does nothing useful in this case because 
0 is not a valid event type.
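One way to confirm which flag actually took effect is to ask the RTS itself. The sketch below assumes GHC 7.10 or later, where `GHC.RTS.Flags` is available (newer than the GHC 7.6/7.8 discussed in this thread); `rtsTickerDisabled` is a hypothetical helper name. `-V0` sets the tick interval to 0, which turns the periodic RTS timer off, whereas `-v` does not touch the tick interval.

```haskell
import GHC.RTS.Flags (MiscFlags (..), getMiscFlags)

-- | Hypothetical diagnostic: True when the RTS tick timer is disabled.
-- Running with +RTS -V0 (needs -rtsopts), or linking with
-- "-with-rtsopts=-V0", sets tickInterval to 0, so no periodic timer runs.
rtsTickerDisabled :: IO Bool
rtsTickerDisabled = do
  flags <- getMiscFlags
  return (tickInterval flags == 0)
```

A binary linked with `-with-rtsopts=-V0` should report `True` here; one built without it should report `False`, since the default tick interval is nonzero.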

Original comment by allber...@gmail.com on 13 Oct 2014 at 7:32

GoogleCodeExporter commented 8 years ago

I have done some experiments too: I copied the functions involved in spawn into 
a simple program that forks from the IO monad. I could not reproduce the problem 
with this setup, but maybe the race condition behind it does not appear there at 
all, because it is a lightweight program that forks a bit faster.

I wonder whether this has something to do with the fact that spawn runs inside 
an X action. Do I understand that correctly or am I completely wrong? In some 
parts Xmonad lacks function type descriptions.

Why is a SIGALRM handler installed there at all?

Original comment by martin.s...@gmail.com on 13 Oct 2014 at 8:04

GoogleCodeExporter commented 8 years ago

spawn is handing it upstream to IO via liftIO, as indicated by the MonadIO 
constraint; this should be negligible overhead (nanoseconds if it's not 
optimized away entirely which it should be).
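To illustrate the point about `MonadIO`: a function with a `MonadIO m => ... -> m ()` signature, like `spawn`, can be called both from plain `IO` and from inside a transformer stack such as `X`, and `liftIO` simply runs the underlying `IO` action. The sketch below is a toy under stated assumptions: it uses `ReaderT String IO` (from the `transformers` package) in place of the real `X`, which wraps `ReaderT XConf (StateT XState IO)`, and a hypothetical `fakeSpawn` that records the command in an `IORef` instead of forking.

```haskell
import Control.Monad.IO.Class (MonadIO, liftIO)
import Control.Monad.Trans.Reader (ReaderT, ask, runReaderT)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- | Hypothetical stand-in for xmonad's X monad; the "config" is just the
-- terminal command.
type ToyX = ReaderT String IO

-- | Shaped like spawn :: MonadIO m => String -> m (). Instead of forking,
-- this toy version appends the command to an IORef via liftIO.
fakeSpawn :: MonadIO m => IORef [String] -> String -> m ()
fakeSpawn ref cmd = liftIO (modifyIORef ref (++ [cmd]))

-- | A toy key handler running in ToyX: it reads the terminal setting and
-- calls the same MonadIO-polymorphic function.
launchTerminal :: IORef [String] -> ToyX ()
launchTerminal ref = do
  term <- ask
  fakeSpawn ref term
```

The same `fakeSpawn` call type-checks in `IO` directly and in `ToyX`; in both cases the work happens in `IO`, the outer monad only decides how that `IO` action is reached.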

The periodic itimer and SIGALRM handler are used by GHC's runtime for thread 
scheduling, profiler ticks, and IIRC determining when to do a full instead of 
partial garbage collection, among other things.

Also, which function type descriptions are missing? I am looking at 
http://xmonad.org/xmonad-docs/xmonad/XMonad-Core.html#v:spawn

Original comment by allber...@gmail.com on 13 Oct 2014 at 8:27

GoogleCodeExporter commented 8 years ago

I followed some of the code (I am still a beginner, so I need to look at things 
for a long time and even learn new stuff). I copied some code over into an empty 
project and the effect disappeared. It is exactly the same code for spawn, just 
cut out of Xmonad. I just wanted to give some further insight, but I am not sure 
why this happens. I thought it might help, but it seems it doesn't. Sorry.

I shouldn't have said that Xmonad lacks type descriptions. What I meant is that, 
to my eyes, there is not enough information to infer what types Xmonad key actions 
operate on. I could not figure out whether I am dealing with an X action within the 
key press handlers or with a plain IO action (as I said above). Many things are 
abstract in these places and, as I said, I need to look quite long to understand 
in which context the handler operates, and I need to learn some Haskell concepts 
used in Xmonad that are still new to me.

Original comment by martin.s...@gmail.com on 13 Oct 2014 at 9:14

GoogleCodeExporter commented 8 years ago

OK, here's my truss output with -V0. The problem no longer occurs. There are 
also no more calls to setitimer(0,{0.000000, 0.000000 },0x0) right before 
execve("/bin/sh", ...), which is the call one would want to see in the normal 
case. But of course, the setitimer() call in executeFile only takes place if 
timers are currently enabled.

I have not reported this problem to GHC yet, and to my knowledge no one else has 
either. We first wanted to find out whether it's a problem with the FreeBSD port, 
but Gabor Pali, who maintains the port, couldn't reproduce it on his systems.

I also added another truss log without -V0 but with the eventlog enabled 
(xmonad.eventlog.truss.bz2). I tried starting a test program that reports whether 
the interval timers are still enabled. It worked for PIDs 2435 and 2437 and did 
not work for PIDs 2432, 2441, and 2444 (you can grep for the PID and execve to 
find your way around). I can't see a common theme, however. For PID 2432 there 
were apparently two threads running before execve. For PID 2441 there was a GC 
run between fork and execve...

Do you think that's enough information to report to GHC?

Original comment by dernst...@gmail.com on 14 Oct 2014 at 7:26

Attachments:

GoogleCodeExporter commented 8 years ago

Since SIGALRM is managed by GHC itself when it forks processes internally, I 
guess it would be logical to report this directly to the GHC project. Can you do 
this? I think you tried more things than I did to look into the problem. Please 
post a link to the report here.

I cannot understand how the port maintainer cannot reproduce it. I wonder what 
architecture and what customizations he is using, because I would like to have 
a system on which I "cannot reproduce it". This annoying behavior shows up on 
all systems I have (plain GENERIC FreeBSD/amd64, and it is also reproducible in 
a VirtualBox environment).

Original comment by martin.s...@gmail.com on 24 Oct 2014 at 5:52

GoogleCodeExporter commented 8 years ago

I can reproduce this issue under stock GENERIC FreeBSD/amd64 10.0 (running 
directly on the hardware, i.e. no virtualization).

Original comment by reaper.t...@gmail.com on 14 Nov 2014 at 7:56

GoogleCodeExporter commented 8 years ago

Confirming the problem on FreeBSD/amd64 10.1-RELEASE.

Original comment by martin.s...@gmail.com on 14 Jan 2015 at 7:03

GoogleCodeExporter commented 8 years ago

I got a bit side-tracked and honestly forgot about this bug... sorry.
Anyhow, Gabor Pali said he might try to talk to a GHC dev. But he suspects it 
might be specific to FreeBSD, in that some part of the OS/userland could be 
implemented slightly differently from what GHC expects.

If anyone would like to, can you test the following patch?

It just adds another forkProcess call, which seemed to fix it for me.

Original comment by dernst...@gmail.com on 20 Jan 2015 at 7:58

Attachments:

xfork.patch

GoogleCodeExporter commented 8 years ago

This workaround with two chained forkProcess calls also works properly.

Original comment by martin.s...@gmail.com on 21 Jan 2015 at 8:36

GoogleCodeExporter commented 8 years ago

I had exactly the same problem with the same bindings, and I can confirm that the 
patch fixed it.

Original comment by olivier....@gmail.com on 15 Feb 2015 at 9:04

codehenry / xmonad issue #576