JACoders / OpenJK

Community effort to maintain and improve Jedi Academy (SP & MP) + Jedi Outcast (SP only) released by Raven Software
GNU General Public License v2.0

Openjkded CPU use regression since 2016-03-21 build #823

Caelish closed this issue 8 years ago

Caelish commented 8 years ago

Operating system and version: Ubuntu 14.04 with Ubuntu (HWE) low latency kernel 4.2

Is this for single player or multiplayer? Multiplayer

Description of the bug (and if possible, steps to reproduce the bug): Since upgrading to the 2016-03-21 build (from a 1-2 weeks older build), CPU usage has significantly increased on one of my servers. The server in question runs about ~20 openjkded.i386 instances, and while it runs some other stuff too, nothing other than my upgrading OpenJK explains the increase in CPU usage. Considering the recent busy-waiting changes, my money is on this being an OpenJK regression.

See these graphs (I upgraded around the 22nd):
Load average over 1 week: https://i.imgur.com/ysiyhcu.png
Load average over 2 weeks: https://i.imgur.com/LIbMhWL.png
CPU use over 2 weeks: https://i.imgur.com/LQPXkRa.png (excuse the weird graph with weird data points there)

RAM usage and such have remained largely unchanged; the increase seems to be entirely in CPU use.

Strangely, I haven't noticed the same issue on my other server, which runs ~10 openjkded.i386 instances, and gets significantly lower traffic. Other differences are in hardware (including CPU instruction sets) and kernels:

Affected server: 20 fairly high-traffic OpenJK instances on an i7-4790K CPU and the Ubuntu low-latency 4.2 kernel.
Unaffected server: 10 fairly low-traffic OpenJK instances on a Xeon D-1520 CPU and a custom 4.4 kernel.

Both are dedicated hardware, not virtualized.

What did you expect to happen instead? CPU usage should remain the same or decrease between OpenJK builds.

ensiform commented 8 years ago

Try com_busyWait 1 and compare.
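For reference, a cvar like this can be set from the server console or at startup; the cfg name below is just a placeholder for whatever you already run:

```sh
# e.g. at startup (it can also be changed at runtime from the server
# console by typing "com_busyWait 1"):
./openjkded.i386 +set com_busyWait 1 +exec server.cfg
```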

Caelish commented 8 years ago

Will do, expect me to get back to you in a weekish on that.

ensiform commented 8 years ago

This pertains to what I discussed in your thread about sv_fps 1, etc. Specifically, https://github.com/JACoders/OpenJK/pull/695

Caelish commented 8 years ago

Yeah so about com_busywait 1...

omgwtfojk

ensiform commented 8 years ago

It's essentially the same as the old code, and it comes from the other big similar project, ioquake3, so...

smcv commented 8 years ago

If you set com_maxfps to be about the same as sv_fps, with com_busywait 0, does that address this?

Caelish commented 8 years ago

@smcv Isn't com_maxfps purely client-side? I'll get back to you on that if I'm wrong though.

@Ensiform for what it's worth, I can very reliably reproduce 100% CPU usage with com_busywait 1 under Ubuntu 14.04. Definitely doesn't seem like previous behavior.

smcv commented 8 years ago

One of my development versions of this patchset would have consistently had 100% CPU with com_busywait, but for the one that got merged I'm fairly sure I reinstated the arbitrary 5ms sleep?

smcv commented 8 years ago

Com_Frame() is meant to be using SV_FrameMsec() (on dedicated servers) instead of going into the com_maxfps code path, so it should indeed be respecting sv_fps and ignoring com_maxfps. However, if something is wrong with the implementation or your configuration, it's possible that com_maxfps might be used anyway?
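To illustrate what I mean, here is a minimal, self-contained sketch of that branch (plain ints stand in for the cvars and SV_FrameMsec() is simplified; this is not the actual OpenJK source):

```c
/* Self-contained sketch of the frame-interval selection described above.
 * Plain ints stand in for the cvars; SV_FrameMsec() is simplified. */
#include <stdio.h>

static int dedicated  = 1;    /* stand-in for com_dedicated */
static int sv_fps     = 40;   /* stand-in for sv_fps */
static int com_maxfps = 125;  /* stand-in for com_maxfps */

/* Simplified stand-in for SV_FrameMsec(): milliseconds per server frame. */
static int SV_FrameMsec(void)
{
    int msec = 1000 / sv_fps;
    return msec < 1 ? 1 : msec;
}

int main(void)
{
    int minMsec;

    if (dedicated) {
        /* Dedicated servers pace themselves from sv_fps... */
        minMsec = SV_FrameMsec();
    } else if (com_maxfps > 0) {
        /* ...and only clients should ever take the com_maxfps path. */
        minMsec = 1000 / com_maxfps;
    } else {
        minMsec = 1;
    }

    printf("wait at most %d ms before starting the next frame\n", minMsec);
    return 0;
}
```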

smcv commented 8 years ago

Running an otherwise idle dedicated server on my laptop, varying sv_fps gives roughly the results I'd expect: sv_fps 1 makes CPU usage drop off, sv_fps 1000 occupies an entire CPU core, and sv_fps 125 uses noticeably more than the default of 40. com_maxfps doesn't have any obvious effect, which is what I'd hope for from this code.

One thing that has changed, which I didn't really intend: previously, the main loop had an arbitrary 5ms sleep after every frame. If the server was CPU-bound, that meant sv_fps 40 (25ms per frame) would actually behave more like 33fps (30ms per frame), because the server spends 5ms of every frame asleep. The impact on CPU usage is probably larger than that suggests, though: during those 5ms pauses, the server doesn't do anything at all, not even process incoming network packets (which would wake an ioquake3 server up even when it isn't time to do the next frame); and once it has yielded control to the kernel, there's no guarantee that the process will be scheduled again immediately after the 5ms pause has elapsed, so the sleep might end up longer than intended.

With the new ioquake3-like code, the server is limited only by sv_fps, so it wakes up whenever there's a network packet to process, or whenever the framerate limiter predicts that it needs to start work on the next frame of game logic to meet its 25ms deadline. The intention is that it processes network packets ASAP for minimal latency, does its best to meet the game-logic framerate goal that you set for it via sv_fps, and sleeps whenever it isn't working towards one of those goals.
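To make "wake up for a packet or for the frame deadline, whichever comes first" concrete, here's a toy, self-contained sketch using select() on a placeholder UDP socket. This is not OpenJK's actual NET_Sleep()/Com_TimeVal() code, and a real implementation also recomputes the remaining time to the deadline after each early wakeup:

```c
/* Toy illustration of sleeping until either a packet arrives or the next
 * frame is due, instead of busy-waiting or napping for a fixed 5 ms. */
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(29070);            /* placeholder port */
    bind(sock, (struct sockaddr *)&addr, sizeof addr);

    const int frame_msec = 25;               /* sv_fps 40 => 25 ms per frame */

    for (int frame = 0; frame < 10; frame++) {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(sock, &readfds);

        struct timeval timeout = { 0, frame_msec * 1000 };

        /* Returns early if a datagram arrives, otherwise when the timeout
         * expires; either way, the process actually sleeps in between. */
        int ready = select(sock + 1, &readfds, NULL, NULL, &timeout);
        if (ready > 0) {
            char buf[1400];
            recv(sock, buf, sizeof buf, 0);  /* handle the packet ASAP */
            printf("frame %d: woke early for a packet\n", frame);
        } else {
            printf("frame %d: deadline reached, run game logic\n", frame);
        }
    }
    close(sock);
    return 0;
}
```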

The 5ms pauses were really a bit of a hack IMO (although I was mostly thinking about the client-side, where they were used as a substitute for a proper framerate-limiter). They make the server's latency a little worse, and prevent it from ever quite reaching the target framerate represented by sv_fps, regardless of whether there is spare capacity or not. However, if server admins want them back as a crude form of cooperative multitasking, that would only be a small change to main().

I notice that your server was peaking at less than 20% CPU usage and a load average of around 1.5 (with daily spikes up to just over 2, probably a cron job) even with the increased CPU usage, which means all this seems somewhat academic for you: it's a long way from capacity. A load average of 1.5 means an average of 1.5 processes in the "runnable" state (running or waiting for a CPU timeslice, as opposed to sleeping, waiting for the network or waiting for disk I/O), and the usual Unix sysadmin rule of thumb is that you start to worry if the load average exceeds the number of CPU cores for a significant time. Would you feel better if I described the new behaviour as "using idle capacity to maximize performance"? That's just another way to spell "higher CPU use" :-)

smcv commented 8 years ago

Please try this patch: https://github.com/smcv/OpenJK/commit/e1ee901de6555f157d9ff1fe1e64e01a8adf2141

If you just make the code change, it should continue to behave like current git master; if you also set com_dedicated_sleep 5, it should hopefully behave like versions from before #695 was merged.
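For testing, the new cvar can be set at startup like any other; for example (binary and cfg names are placeholders for whatever you already run):

```sh
./openjkded.i386 +set dedicated 2 +set com_dedicated_sleep 5 +exec server.cfg
```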

Caelish commented 8 years ago

Sorry for the slow reply, a friend of mine had some surgery so I had non-work-y things to attend to.

@smcv thanks for the reply, that's actually incredibly enlightening. I wasn't aware the 5ms pauses were that hacky, or that they prevented sv_fps ever being reached.

You're right that it's mostly academic to me at the moment, but I'm also looking into downgrading CPUs in the near future, at which point it'd become much more of a problem. I'm kind of in a weird situation with this machine where said sysadmin rule of thumb does not apply - this same box occasionally has to filter up to several million packets per second of DDoS traffic, which does interesting things to server load. People seem to have figured out those attacks aren't working on my servers, and DDoS attacks have gone down significantly in the past months, but smaller ones (a few hundred thousand PPS) still happen several times a month.

Your "using idle capacity to maximize performance" is much more marketing buzzwordy and makes me feel much better about all this. That said, it also implies if less idle CPU power is available (say we're on a completely packed single-core server for whatever godforsaken reason) it won't use the same amount of CPU, sort of (but not really) like how excess RAM is usually used for disk caching. Is that the case or not really?

All in all, based on that post, it seems like this is an increase in CPU usage, but not at all negative or a regression - on the contrary. I imagine there may be some people with crappy VPS providers who feel otherwise, but even they should be mostly okay.

I can try that patch if you like, but from the sound of it, I don't think I actually want to implement the patch so much as hold onto the new behavior now that I properly understand it. I'm absolutely willing to take an increase in CPU usage for a decrease in latency.

smcv commented 8 years ago

I can try that patch if you like, but from the sound of it, I don't think I actually want to implement the patch so much as hold onto the new behavior now that I properly understand it.

Would you mind trying it out short-term, even if you aren't going to keep it? It would confirm whether my reasoning is correct, and then the maintainers can make a more informed decision on whether that change is one that they want.

If I'm right, then the patch + com_dedicated_sleep 0 (default) should be equivalent to current git (higher CPU use than last month), and the patch + com_dedicated_sleep 5 should be equivalent to how it was a month ago before #695 got merged.

I wasn't aware the 5ms pauses were that hacky, or that they prevented sv_fps ever being reached.

I still think they were basically a hack, but now that I think about it again, the answer to whether they prevent sv_fps from being reached is "it's complicated". :-) I'm saying "frame" here as shorthand for "frame-equivalent amount of running the game simulation" - the dedicated server obviously doesn't draw any actual frames, but the principles are similar to what would happen if it did.

If the server is fast/only lightly loaded and would have been pausing for 5ms or more before starting to work on the next frame anyway, then the 5ms sleep doesn't prevent it from hitting its deadline, it just delays processing of incoming network packets a little.

If the server is slow/loaded enough to have trouble achieving its sv_fps, then the 5ms sleep could push it over the edge into not keeping up.

That said, it also implies that if less idle CPU power is available (say we're on a completely packed single-core server for whatever godforsaken reason) it won't use the same amount of CPU, sort of (but not really) like how excess RAM is usually used for disk caching.

That's actually a really good analogy. You have a certain amount of a resource (clock cycles or RAM), and the kernel's goal is to get as much work done as possible, given the constraint that it can't use more of the resource than you physically have. :-)

While a thread is waiting for an external event like disk I/O, network activity or time passing via select() or usleep() or similar, the kernel's scheduler marks it as "not runnable": there's no point in running that process, because it doesn't have anything to do. The CPU is idle if there are no runnable threads. When you have lots of spare CPU capacity, the kernel will wake threads up as soon as they become runnable: in our case that means when there are incoming network packets, or when it's time to start work on the next frame (as estimated by looking at the last few frames, seeing how long they took, and starting that far in advance of the "deadline").

If your system is at full CPU capacity - it's said to be "CPU-bound" rather than "I/O-bound" - then the scheduler can't give all threads what they want, and has to make some decisions. In our case that means some of the dedicated servers are marked as runnable (because network packets came in or their select() or usleep() timers ran out), but aren't run immediately. Effectively, their select() or usleep() takes longer than it should have done. "Load average" is just the number of runnable threads - those currently running plus those waiting for a CPU - averaged over the last 1, 5 and 15 minutes. For instance, if you have 2 CPU cores, 3 runnable threads, and lots of idle threads, then your load is 3, slightly over your capacity of 2: you might have thread A running on CPU 1, thread B running on CPU 2, thread C waiting to be scheduled, and threads D to Z idle.
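If you want to watch this directly on your box, the "r" column in vmstat is the number of runnable threads at each sample, and uptime prints the load averages:

```sh
vmstat 1 5   # one sample per second, five samples; "r" = runnable threads
uptime       # 1-, 5- and 15-minute load averages
```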

What the 5ms sleep thing effectively did was to make the OpenJK processes lie to the kernel, and tell it "there is no useful work I can be doing in the next 5ms", even though maybe they could have been working. That prevents those processes from being runnable, which makes the scheduler's job easier: it only has to choose between the remaining runnable processes.

The major problem with a high load is that the scheduler doesn't always make the decision you would want it to. For instance, suppose you decide your server is too busy, and log in over ssh to terminate some OpenJK instances. If the scheduler is giving too many CPU cycles to OpenJK, and not enough to ssh or your shell, then your shell will be unresponsive and you'll have difficulty terminating the unwanted instances. The scheduler tries to give a higher priority to interactive threads like your shell that spend most of their time waiting for input, because otherwise they could get "starved" by CPU-bound threads, so it will usually let your shell run reasonably promptly, but this heuristic doesn't always work. This is where things like renice come in: if OpenJK has a high "nice value" then the scheduler will choose it less often, letting your shell take precedence.
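For example, something along these lines would deprioritize every game server at once (a positive nice value means lower priority; run it as the owning user or as root):

```sh
# Give every openjkded.i386 process a higher nice value so interactive
# things like your shell win ties for CPU time.
renice -n 10 -p $(pidof openjkded.i386)
```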

Depending on the exact state of the system, the 5ms pauses might make your shell more responsive by ensuring that there's at least some time during which each OpenJK thread is idle, or they might actually make matters worse by tricking the scheduler into thinking that OpenJK is less CPU-bound than it actually is.

Another issue with having no spare CPU time is that you might prefer not to use all the clock cycles for extended periods of time, because if you do the CPU gets hot and/or uses more power, particularly if it has load-dependent clock speed adjustment like laptops do. However, IMO the way to fix that is to reduce what you're asking for - run fewer processes or reduce their com_maxfps (clients) or sv_fps (servers) until they fit in whatever CPU budget you want to give them, and/or use cgroups to give them an explicit CPU budget that they can't exceed. The purpose of #695, if you think about it in those terms, was to connect a reduction in com_maxfps to a reduction in CPU time used: previously, a low sv_fps caused the dedicated server to become idle more often, but even with a low com_maxfps the client was runnable all the time (either doing useful work, or busy-waiting in a tight loop), except for during those arbitrary 5ms pauses.
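As a purely hypothetical sketch of the cgroup approach on a cgroup-v1 system like Ubuntu 14.04 (the group name and the two-core budget here are made up for the example):

```sh
# Create a cgroup capped at 200% of one core (two cores' worth of CPU time).
sudo mkdir -p /sys/fs/cgroup/cpu/openjk
echo 100000 | sudo tee /sys/fs/cgroup/cpu/openjk/cpu.cfs_period_us
echo 200000 | sudo tee /sys/fs/cgroup/cpu/openjk/cpu.cfs_quota_us
# Move the running server processes into the group.
for pid in $(pidof openjkded.i386); do
    echo "$pid" | sudo tee -a /sys/fs/cgroup/cpu/openjk/tasks
done
```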

smcv commented 8 years ago

If people are happy with my reasoning for why this isn't really a bug, please could someone close it?

Razish commented 8 years ago

I'm happy with it, and I have better ideas on solving #826. Thanks for the effort and detailed explanations.