(Question) toshy CPU usage spikes during games

jfernandez commented 1 month ago

I noticed toshy using a significant amount of CPU when playing games. This only happened when I was actively holding the ADSF keys for movement in Terraria.

I suspect this is due to having to process all those repeated key strokes. Is it possible to exclude processes?

RedBearAK commented 1 month ago

@jfernandez

> I suspect this is due to having to process all those repeated key strokes.

Correct. I’m looking into some kind of time-based or caching solution. Modifier keys don’t seem to cause the same effect, possibly because repeats are ignored, or just because there are only a few modmaps versus keymaps. Not sure yet if the issue is completely resolvable for normal keys.

> Is it possible to exclude processes?

App windows in a “remotes” list cause the modmaps and general keymap to be deactivated, but conditions continue to be evaluated on each key press (to know when to start remapping again). So the only practical solution at the moment is to disable Toshy from the tray icon when playing games where it’s necessary to hold non-modifier keys down.

I don’t do that sort of thing, so I didn’t think much about the issue until recently. However, I did notice that if a VM or something was maxing out the CPU, input would become significantly delayed.

RedBearAK commented 1 month ago

Are you on a dual-core CPU? On my Ryzen 3700u the keymapper process is only able to use one thread of the 4c/8t CPU, which is 12.5%.

jfernandez commented 1 month ago

I'm using a ThinkPad Z16 Gen 2 which has the 7940HS AMD CPU (16 threads):

Handle 0x0003, DMI type 4, 48 bytes
Processor Information
    Socket Designation: FP8
    Type: Central Processor
    Family: Zen
    Manufacturer: Advanced Micro Devices, Inc.
    ID: 41 0F A7 00 FF FB 8B 17
    Signature: Family 25, Model 116, Stepping 1
    Flags:
        FPU (Floating-point unit on-chip)
        VME (Virtual mode extension)
        DE (Debugging extension)
        PSE (Page size extension)
        TSC (Time stamp counter)
        MSR (Model specific registers)
        PAE (Physical address extension)
        MCE (Machine check exception)
        CX8 (CMPXCHG8 instruction supported)
        APIC (On-chip APIC hardware supported)
        SEP (Fast system call)
        MTRR (Memory type range registers)
        PGE (Page global enable)
        MCA (Machine check architecture)
        CMOV (Conditional move instruction supported)
        PAT (Page attribute table)
        PSE-36 (36-bit page size extension)
        CLFSH (CLFLUSH instruction supported)
        MMX (MMX technology supported)
        FXSR (FXSAVE and FXSTOR instructions supported)
        SSE (Streaming SIMD extensions)
        SSE2 (Streaming SIMD extensions 2)
        HTT (Multi-threading)
    Version: AMD Ryzen 9 PRO 7940HS w/ Radeon 780M Graphics 
    Voltage: 1.2 V
    External Clock: 100 MHz
    Max Speed: 5250 MHz
    Current Speed: 4000 MHz
    Status: Populated, Enabled
    Upgrade: None
    L1 Cache Handle: 0x0000
    L2 Cache Handle: 0x0001
    L3 Cache Handle: 0x0002
    Serial Number: None
    Asset Tag: None
    Part Number: None
    Core Count: 8
    Core Enabled: 8
    Thread Count: 16
    Characteristics:
        64-bit capable
        Multi-Core
        Hardware Thread
        Execute Protection
        Enhanced Virtualization
        Power/Performance Control

RedBearAK commented 1 month ago

I see. I’m assuming htop is showing per-core percentages. The total in the screenshot is already more than 100%. The keymapper process is single-threaded though, so it should only be capable of using up one of those 16 threads. I use btop in place of htop.

jfernandez commented 1 month ago

For performance optimization work, I prefer to see the total CPU usage percentage rather than the percentage adjusted by the total number of CPUs. htop cannot determine if a process ran exclusively on one CPU, as the process and its threads may have jumped across all cores. It simply indicates that the process is currently using the equivalent of 57% of one CPU's time. I prefer using this since it makes perf comparisons easier regardless of how many CPUs you have.
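To make the two conventions concrete, here is a small arithmetic sketch. The function name is made up for illustration; it only divides an htop-style "percent of one logical CPU" figure by the number of logical CPUs to get a figure normalized to the whole machine, using the numbers mentioned in this thread.

```python
def per_cpu_to_total(per_cpu_percent: float, logical_cpus: int) -> float:
    """Convert 'percent of one logical CPU' into 'percent of the whole machine'."""
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    return per_cpu_percent / logical_cpus


# One fully busy thread on the 4c/8t Ryzen mentioned earlier in the thread.
print(per_cpu_to_total(100.0, 8))    # 12.5

# The 57% reading on the 16-thread 7940HS.
print(per_cpu_to_total(57.0, 16))    # 3.5625
```

The per-CPU figure stays comparable across machines, while the normalized figure shrinks as the CPU count grows.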

btop is adjusting the % by the number of CPUs:

RedBearAK commented 1 month ago

I kind of get what you're saying. Looks like if you turn on the per-core option in btop it's actually treating each thread as a "core", because it shows the keymapper thread as using nearly 100% when a key is held down. But similar concept.

Do you have any experience with optimizing Python code? There are a lot of aspects of the keymapper and how to mitigate this issue that I'm not that familiar with. I'm just using a fork someone made of another keymapper.

jfernandez commented 1 month ago

I have experience doing performance optimization, but not for Python. What I normally do is generate a flamegraph when the process is running hot and use that visualization to surface the code hot spots. This should include Python code and kernel code that gets executed when Python does syscalls. I recommend flamegraph-rs. Even though it's written in Rust, it should work for any process.

There also appears to be a Python-specific library that you can use: https://pypi.org/project/flameprof/
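As a rough sketch of what collecting profile data could look like using only the standard library: the two functions below are hypothetical stand-ins for per-event keymapper work, not real keymapper code. The saved stats file is in the cProfile format, which is the kind of input a flamegraph generator for cProfile data would be expected to read.

```python
import cProfile
import io
import os
import pstats
import tempfile


def evaluate_conditions(n: int) -> int:
    """Hypothetical stand-in for per-event work such as keymap condition checks."""
    return sum(1 for i in range(n) if i % 3 == 0)


def handle_repeats(count: int) -> int:
    """Simulate a held key: one condition pass per repeat event."""
    return sum(evaluate_conditions(200) for _ in range(count))


profiler = cProfile.Profile()
profiler.enable()
handle_repeats(500)
profiler.disable()

# Save the raw stats so an external visualizer can read them later.
prof_path = os.path.join(tempfile.mkdtemp(), "keymapper.prof")
profiler.dump_stats(prof_path)

# Print the profiled functions, sorted by cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```

In the printed table, the functions near the top of the cumulative column are the first candidates to inspect.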

RedBearAK commented 1 month ago

@jfernandez

I haven't really tried the profiling yet, because I actually have some idea of why all the excess CPU usage happens (the constant re-evaluation of keymap conditional expressions while the key is repeating). But I just tried something very simple that does nothing but cause "repeat" events to be passed through without going through that evaluation process or any kind of transforming. That basically kills all CPU usage while the "repeat" events are happening. Well, I see some momentary spikes of 2-4% in btop with a refresh rate of 500ms, but it mostly says zero percent while holding the key.

I'm not certain about the full implications of this. One of my main concerns was whether it would cause combos to stop working while holding modifier keys (in other words, whether it would break what Toshy does), but that doesn't seem to be the case. I've only been testing it for a few minutes, but the usual stuff I expect Toshy to do is happening when I expect it to happen, including when I hold down modifiers and fire off repeat combos by pressing the normal key multiple times.

If you'd like to try this out, you'll have to edit the installed copy of transform.py that's embedded in the Python virtual environment folder in ~/.config/toshy/. The change is quite simple and would be very easy to revert.

I should probably note here that if you installed Toshy more than a couple of weeks ago, you'll need to put keyszer wherever you see xwaykeyz here. I did a fork under the new name just recently. Older installs will still be using keyszer, but the code will be 99% identical still.

The path to the transform.py file will look something like this, depending on the Python version that was used to make the venv:

~/.config/toshy/.venv/lib/python3.12/site-packages/xwaykeyz/transform.py

I'm sure you know the drill, it's just a text file. Find this:

def on_event(event: InputEvent, device):
    # we do not attempt to transform non-key events
    # or any events with no device (startup key-presses)
    if event.type != ecodes.EV_KEY or device is None:
        _output.send_event(event)
        return

When you find this function (just search for "def on_event"), the "InputEvent" type hint may not be there yet. Don't worry about that. I just added it.

Before the rest of the stuff in that function, add this:

    # EXPERIMENTAL: pass through "repeat" key events without further processing
    # Test to see if this diminishes the high CPU usage when a "normal" key is held down.
    # What negative side effects can we expect from doing this?
    if event.value == 2:
        _output.send_event(event)
        return

One immediate side effect is that this causes no verbose log output while a key is repeating. Without extra logging it just acts like there's no input, but the input should still be passing through directly to the application window. There will still be log activity (and of course momentary CPU activity) each time a key is "pressed" and "released", which causes a key event with a different value (1 or 0, vs 2 for repeats). But repeats will just not be part of the equation anymore as far as the rest of the keymapper code is concerned, as long as this change is enabled.
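The effect of the early return can be illustrated with a toy model. Everything here (`FakeKeyEvent`, `FakeKeymapper`, the counter) is hypothetical and only mimics the shape of the change; it is not the keymapper's real code. It counts how often the "expensive" path runs while a key is held, with and without the bypass.

```python
from dataclasses import dataclass

# Key event values as reported by evtest: 0 = released, 1 = pressed, 2 = repeated.
RELEASED, PRESSED, REPEATED = 0, 1, 2


@dataclass
class FakeKeyEvent:
    """Hypothetical stand-in for a key event; only the fields used here."""
    code: str
    value: int


class FakeKeymapper:
    """Toy event handler with an optional early return for repeat events."""

    def __init__(self, bypass_repeats: bool):
        self.bypass_repeats = bypass_repeats
        self.evaluations = 0   # how often the expensive path ran
        self.output = []       # events forwarded to the "virtual keyboard"

    def on_event(self, event: FakeKeyEvent) -> None:
        if self.bypass_repeats and event.value == REPEATED:
            self.output.append(event)   # pass through untouched
            return
        self.evaluations += 1           # stands in for keymap condition checks
        self.output.append(event)


# Hold "A" down: one press, 30 repeats, one release.
held_key = (
    [FakeKeyEvent("KEY_A", PRESSED)]
    + [FakeKeyEvent("KEY_A", REPEATED)] * 30
    + [FakeKeyEvent("KEY_A", RELEASED)]
)

for bypass in (False, True):
    mapper = FakeKeymapper(bypass_repeats=bypass)
    for ev in held_key:
        mapper.on_event(ev)
    print(f"bypass={bypass}: {mapper.evaluations} evaluations, "
          f"{len(mapper.output)} events out")
```

Both runs forward all 32 events, but with the bypass only the press and the release reach the evaluation path.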

After saving you can restart Toshy from the tray icon, or try toshy-config-verbose-start to see the debugging output and verify that everything is working. I would not expect an exception of any kind to show up when doing this.

The only thing I can think of that would forestall putting this in the main branch of the keymapper is that if you're remapping a single normal key for some reason, or holding an entire combo including the non-modifier key, that might suddenly not be remapped to what was intended for as long as it's "repeating". But I can't immediately think of an example of where I would want to be doing that. Mostly what people want a keymapper to do is catch one instance of a combo and emit some other combo in its place, but not a stream of the same combo with all keys held down. So... um... 🤷🏽

Let me know what you think, if you try this out.

My other ideas about how to deal with this are more complicated and might not even be practical, like trying to cache the previous results of evaluating conditionals and just not performing those evaluations again until some reasonable amount of time has passed or the context/input changes. I could implement an API function that would let the user enable this really simple change instead, if they think it would not interfere with their intended usage of the keymapper, while leaving it as a disabled opt-in sort of thing.
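For comparison, the caching idea might look something like the sketch below. This is only an illustration under assumptions: the class name, the millisecond clock, and the `is_terminal` check are all made up, and the real conditionals in the keymapper may not fit this shape. The cache reuses the last result while the context is unchanged and the result is younger than a chosen age.

```python
import time
from typing import Callable, Hashable, Optional


def _monotonic_ms() -> int:
    """Monotonic clock in whole milliseconds."""
    return time.monotonic_ns() // 1_000_000


class TimedConditionCache:
    """Reuse the last result of an expensive check while the context is
    unchanged and the result is younger than max_age_ms milliseconds."""

    def __init__(self, check: Callable[[Hashable], bool], max_age_ms: int = 250,
                 clock_ms: Callable[[], int] = _monotonic_ms):
        self._check = check
        self._max_age_ms = max_age_ms
        self._clock_ms = clock_ms
        self._context: Optional[Hashable] = None
        self._result = False
        self._stamp: Optional[int] = None
        self.misses = 0   # how often the real check had to run

    def evaluate(self, context: Hashable) -> bool:
        now = self._clock_ms()
        fresh = self._stamp is not None and (now - self._stamp) < self._max_age_ms
        if fresh and context == self._context:
            return self._result
        self.misses += 1
        self._result = self._check(context)
        self._context = context
        self._stamp = now
        return self._result


def is_terminal(context) -> bool:
    """Hypothetical condition: does the focused window class match?"""
    window_class, _key = context
    return window_class == "gnome-terminal"


# Simulate 30 repeat events arriving every 40 ms, using a fake clock.
fake_now = [0]
cache = TimedConditionCache(is_terminal, max_age_ms=250, clock_ms=lambda: fake_now[0])
for i in range(30):
    fake_now[0] = i * 40
    cache.evaluate(("gnome-terminal", "KEY_A"))
print(cache.misses)   # 5
```

The trade-off is staleness: for up to `max_age_ms` after the real state changes without the context key changing, the cached answer can be wrong, which is part of why the simple bypass is attractive.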

RedBearAK commented 1 month ago

I just realized I still had per-core enabled on btop, so those "spikes" of 2-4% that I was seeing were more like 0.2-0.4% in the overall capacity of my particular CPU (2-4% of the capacity of a single thread on a single core). Quite the difference from showing 99% usage of a single thread (half a core) while holding a key if I'm in per-core mode.

RedBearAK commented 1 month ago

Edited the earlier comment to remove ecodes.EV_REP and replace it with 2 since ecodes.EV_REP apparently doesn't work the way I thought it would.
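A likely explanation, sketched with constants copied from the Linux header linux/input-event-codes.h: EV_REP is an event type (used for autorepeat settings), while a repeating key arrives as an EV_KEY event whose value is 2. The helper name and the KEY_* value names below are made up for illustration.

```python
# Event *types* from linux/input-event-codes.h
EV_KEY = 0x01   # key and button state changes
EV_REP = 0x14   # autorepeat settings (delay/period), not individual repeats

# Event *values* carried by EV_KEY events (names chosen for this sketch)
KEY_RELEASED, KEY_PRESSED, KEY_REPEATED = 0, 1, 2


def is_key_repeat(event_type: int, event_value: int) -> bool:
    """True only for an EV_KEY event whose value marks an autorepeat."""
    return event_type == EV_KEY and event_value == KEY_REPEATED


print(is_key_repeat(EV_KEY, KEY_REPEATED))   # True
print(is_key_repeat(EV_KEY, KEY_PRESSED))    # False
print(KEY_REPEATED == EV_REP)                # False: 2 versus 20
```

So comparing `event.value` against `EV_REP` would never match a repeating key, whereas comparing against 2 does.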

RedBearAK commented 1 month ago

An enhanced version if you want some log output for the "repeat" events.

    # EXPERIMENTAL: pass through "repeat" key events without further processing
    # Test to see if this diminishes the high CPU usage when a "normal" key is held down.
    # What negative side effects can we expect from doing this?
    # event.value: 2 is "repeated", 1 is "pressed", 0 is "released" (source: evtest output)
    if event.value == 2:
        if logger.VERBOSE:
            print()     # give some space from regular event blocks in the log
            debug(
                "### Passing through repeating key event unprocessed to reduce CPU usage. ###", 
                ctx="--"
            )
        _output.send_event(event)
        return

RedBearAK commented 1 month ago

There is a pretty peculiar relationship between the keyboard repeat rate you set up in GNOME, and the number of characters that will show up in an app like GNOME Text Editor, versus the number of logged "repeat" events the keymapper says it processed. (Edit: Just to be clear, it probably doesn't matter what desktop environment or window manager you're using. I happened to be in GNOME while testing what would happen when messing with the repeat rate and delay settings. The sliders are found in the GNOME Settings GUI, under Accessibility -> Typing -> Repeat Keys. Click the chevron to reveal the drop-down with the sliders.)

In some cases, more characters will show up than repeat events that show up in the log. In other cases it will appear as if there are many more "repeat" events in the log than characters appearing in the app window (if the repeat rate is slow). And setting the rate to the absolute maximum resulted in no repeat characters showing up in the app window, but tons of log events. Which was very odd, but also happens when the keymapper isn't running, so it really has nothing to do with the keymapper. It's like the app ignores repeating key events that are happening too fast. Or maybe the GNOME shell is doing the ignoring.

So it's like the relationship between normal press/release events and "repeating" key events in the keymapper versus an app window is much less of a concrete thing than I was expecting. What those repeat events actually accomplish in an app window appears to be regulated by the shell or the Linux input system based on your chosen repeat rate and delay time.

In other words, if you have one log event each time the keymapper encounters a key "repeat" event, don't be surprised if the number of characters that show up in an app window is not exactly the same. It may be fewer, or even more, than the events emitted by the keymapper's virtual keyboard device. And that does not appear to be the keymapper's fault.

The delay setting in particular may hide some number of repeat events from the app before the delay "gate" opens up and allows the app to see the repeat events.

jfernandez commented 1 month ago

Thank you for the detailed explanation @RedBearAK. I will give your suggestion a try and report back soon.

jfernandez commented 1 month ago

Sorry, I dropped the ball on this. I'll look into this this week.

RedBearAK commented 1 month ago

I just merged in a version of this bypass to the xwaykeyz main branch, so reinstalling Toshy at this time should leave you with a new Python virtual environment containing version 1.1.0 of the keymapper, with this bypass enabled by default. I have been using the keymapper this way for a while and observed no negative side effects on remapped key combos, so I've just made an API function the user can place in their config file to optionally re-enable the processing of repeating keystrokes if they want.

ignore_repeating_keys(False)

That line in the config file would convert the keymapper back to processing repeating keystrokes the same way as "pressed" and "released" keystrokes. But without that, by default the keymapper will pass the repeats through, leaving minimal impact on CPU usage while holding keys down.

It is safe to reinstall Toshy from the same zip file or a new one, as long as you've kept your edits of the config file within the marked "slices" that are protected from being overwritten during the install process. Those slices and a small sqlite3 file hold your custom settings.

This modification is in the keymapper, which gets re-cloned from GitHub by the Toshy installer on each run, so it's not technically necessary to download a new Toshy zip to install from, if you still have the original zip or folder you installed from. Nothing much has been added to Toshy's config or accompanying apps recently.

RedBearAK commented 2 weeks ago

@jfernandez

A new update to Toshy has reduced CPU usage substantially during general typing. The extra CPU usage was caused by a custom function for sophisticated property matching that lives inside the Toshy config file and enables some of the more interesting aspects of what Toshy can do compared to Kinto, such as support for knowing about the loss of screen focus while using Synergy.

You may or may not notice any difference on your system, and it has nothing to do with the fix for repeating keys that I put in the keymapper proper. But if you've ever noticed a short delay in characters appearing while typing, especially if you type fast or the system is really busy, this should help with that or even make it seem to disappear.

It's already merged into the main branch.

I hope between these two fixes we can call this issue resolved as well as it can be without engaging in a lot more complicated work.

jfernandez commented 2 weeks ago

@RedBearAK Thank you for the follow-up. Yes, the issue is now resolved for me.

RedBearAK / toshy

(Question) toshy CPU usage spikes during games #279