gatech-csl / jes

The Jython Environment for Students allows students to write Jython programs that can manipulate pictures, sounds, and videos.
http://mediacomputation.org/

Keyboard occasionally locks up (all platforms) #74

Open leafstorm opened 10 years ago

leafstorm commented 10 years ago

Occasionally, after pressing ENTER in the Command Window, JES will stop responding to keyboard input entirely. No keyboard controls work (typing in Command Window, typing in editor, using menu accelerators, etc.), but everything else functions correctly. In addition, system-level keyboard shortcuts (such as Alt-Tab) continue to function.

This bug is present in JES 4.3 and in JES 5.0 alphas 1 through 3.

Trigger Conditions: I have only experienced this bug in the Command Window, and only after pressing ENTER. However, because it happened both before and after the JES 5.0 Command Window rewrite, it is probably not specific to the Command Window's implementation, but rather to some aspect of how its event handler interacts with the rest of the system.

This bug will manifest on any platform, but it is rare and difficult to trigger. It appears that when system CPU load is high, normal human typing speeds can trigger it, but when load is low, it triggers much less often.
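Because high CPU load appears to make the bug easier to trigger, artificially loading the machine while testing may help. The following is a minimal sketch, not part of the original investigation: `burn_cpu` is a hypothetical helper, and it assumes it is run with a regular system Python (where `sys.executable` is set) rather than inside JES.

```python
import subprocess
import sys
import time


def burn_cpu(workers, seconds):
    """Run `workers` busy-loop child processes for `seconds`, then stop them.

    Each child is a separate interpreter spinning in an empty loop, so each
    one can keep a CPU core busy. Returns the children's exit codes.
    """
    procs = [
        subprocess.Popen([sys.executable, "-c", "while True: pass"])
        for _ in range(workers)
    ]
    try:
        time.sleep(seconds)
    finally:
        # Always clean up the children, even if the sleep is interrupted.
        for proc in procs:
            proc.terminate()
        for proc in procs:
            proc.wait()
    return [proc.returncode for proc in procs]


# Example (not run here): load four cores for one minute while typing in JES.
# burn_cpu(4, 60)
```

Choosing roughly one worker per CPU core should approximate a fully loaded system; whether that is enough to reproduce the lockup by hand is untested.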

Mechanism of Action: As revealed by the --debug-keys option (present in alpha 3), the bug locks up keyboard input by preventing keystrokes from being delivered to the AWT event-handling mechanisms -- after the bug has been triggered, no AWT KeyEvents are fired.

As far as I can tell, this is not an issue related to the X Window System. Besides the fact that this bug is cross-platform, testing with xev has revealed that keystroke events are being delivered to the Java process by the X server properly, and Java itself is losing the events.

Prior Art: The only similar bug report I could find was http://bugs.java.com/view_bug.do?bug_id=6299259. It included some log information gained from the "Input Manager," but I could not find instructions for how to enable a similar level of detail on our logs.

Potential Causes: It can't be something OS-specific, because it has been confirmed on both Windows and Linux. It's probably not JRE-specific, either: I have replicated it on both OpenJDK 1.7 and 1.8, and while I don't know which JREs it has appeared under on Windows, it was most likely the official Oracle JRE.

I suspected it might be linked to the amount of time taken by the ENTER event handler. However, timing that event handler did not reveal any patterns (such as a time threshold) associated with the keystrokes that trigger the bug. In addition, if it were simply a matter of the event handler taking too long, presumably there would be more reports of this kind of bug on the Internet.
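The kind of timing described above can be sketched as a wrapper that records how long each handler call takes. This is an illustration only, not the instrumentation actually used in JES: `HandlerTimer` and `on_enter` are hypothetical names, and `on_enter` is a stand-in for the real ENTER handler.

```python
import time
from collections import namedtuple

Sample = namedtuple("Sample", ["label", "seconds"])


class HandlerTimer(object):
    """Records the wall-clock duration of every call to a wrapped handler."""

    def __init__(self):
        self.samples = []

    def wrap(self, handler, label="handler"):
        """Return a function that calls `handler` and logs its duration."""
        def timed(*args, **kwargs):
            start = time.time()
            try:
                return handler(*args, **kwargs)
            finally:
                # Record the duration even if the handler raises.
                self.samples.append(Sample(label, time.time() - start))
        return timed

    def slowest(self, n=5):
        """Return the `n` longest-running calls, slowest first."""
        ordered = sorted(self.samples, key=lambda s: s.seconds, reverse=True)
        return ordered[:n]


# Hypothetical stand-in for the real ENTER event handler.
def on_enter(text):
    return text.strip()


timer = HandlerTimer()
timed_on_enter = timer.wrap(on_enter, "ENTER")
for line in ["1 + 1\n", "x = 2\n"]:
    timed_on_enter(line)
```

Comparing `timer.slowest()` against the keystrokes that triggered the lockup is one way to look for a time threshold; as noted above, no such pattern was found.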

Because of the platform-independent nature of the bug, it could be threading-related. In an attempt to resolve it, I put effort into resolving as many of the threading errors as I could. However, the bug continues to manifest even after making the startup process threadsafe, and --check-threads does not report any threading errors.

I can't rule out that it's a threading issue. Inspection with jstack reveals that the thread stacks are in slightly different places in the frozen-keyboard state (for example, there is an additional "process reaper" thread), but these differences are not consistent, and do not appear to be related to the AWT or keyboard input. And the --check-threads flag can only detect threading errors that manifest in graphics: threading errors in the keyboard subsystem may require a completely different technique to identify.
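Comparing jstack dumps by eye is error-prone, so a small script can at least list which threads appear or disappear between a healthy dump and a frozen one. The sketch below assumes the usual jstack layout, in which each thread's section starts with its name in double quotes. The function names and the abbreviated sample dumps are made up for illustration.

```python
import re

# A thread's section in a jstack dump starts with its quoted name.
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')


def thread_names(dump_text):
    """Return the set of thread names found in a jstack dump."""
    names = set()
    for line in dump_text.splitlines():
        match = THREAD_HEADER.match(line)
        if match:
            names.add(match.group("name"))
    return names


def diff_dumps(before, after):
    """Return (added, removed) thread names between two dumps, sorted."""
    before_names = thread_names(before)
    after_names = thread_names(after)
    return sorted(after_names - before_names), sorted(before_names - after_names)


# Abbreviated, invented dumps; real jstack output has full stack traces.
HEALTHY = '''"AWT-EventQueue-0" prio=10 tid=0x1 nid=0x2 runnable
   java.lang.Thread.State: RUNNABLE
"main" prio=10 tid=0x3 nid=0x4 waiting on condition
'''
FROZEN = HEALTHY + '"process reaper" daemon prio=10 tid=0x5 nid=0x6 runnable\n'

added, removed = diff_dumps(HEALTHY, FROZEN)
print("added:", added)
print("removed:", removed)
```

This only compares thread names, not stack contents, so it would not by itself show where a thread is blocked.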

Replication: The script below consistently replicates the bug on my development workstation (a ThinkPad T61 running Fedora 20). It only works on Linux, and it requires the xte utility, which it uses to generate a large number of fake keystrokes. (Equivalent commands for other systems can be substituted.)

Open JES with the --check-threads and --debug-keys command-line options, then run this script. During the 5-second pause it gives you, switch back to JES and focus the Command Window. A few warnings:

The source:

# -*- coding: utf-8 -*-
"""
keyboard-stress.py
==================
Generates a bunch of ENTER key hits.
"""
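from __future__ import print_function  # lets print() work on Python 2 and 3

from subprocess import check_call
from time import sleep

COUNT = 100            # number of ENTER keystrokes to send
DIVISOR = COUNT // 10  # report progress every tenth of the run

print("You have 5 seconds to switch to JES...")

sleep(5)

# xte arguments: type the counter, pause 1 ms, press ENTER, pause 10 ms.
# The None placeholder is replaced with 'str N' on each pass of the loop.
command = ['xte', None, 'usleep 1000', 'key Return', 'usleep 10000']

for n in range(COUNT, 0, -1):
    if n % DIVISOR == 0:
        print("%d left..." % n)
    command[1] = 'str %d' % n
    check_call(command)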

Conclusion: I have no idea what's causing this bug. Please help...