Archaegeo / DualUniverseLuaIssues

DualUniverse LUA Issue Tracking
GNU General Public License v3.0

Document the rules for processor overload, and provide diagnostics #27

Open samdeane opened 3 years ago

samdeane commented 3 years ago

I have a script which occasionally overloads. It is a complex script with a lot of lua objects, and it's not at all clear why it is overloading.

It would be very helpful to have the rules clearly documented somewhere. It would be even more useful if some diagnostic output was printed to the console saying why the overload occurred, and giving a stack trace.

Dimencia commented 3 years ago

In the same vein, and like the new screens, it would be very nice to have API functions such as .getOverloadProgress and .getOverloadMax: just some functions that report how close a script is to overloading.

samdeane commented 3 years ago

Slightly tangentially, I've also read speculation that overload is tied to a count of Lua function calls, rather than some time-based metric.

I really hope that this isn't true. My code is quite structured and object oriented, which inevitably leads to a whole bunch of trivial function calls and property accesses.

Obviously I'm aware that there's a performance penalty associated with that, and I'm happy to work within performance limitations, but it should be my choice how I structure the code. If I really have to optimise some actual hotspots that are too slow, I can do it at the expense of clarity, but that's very different from optimising to reduce a call count (with no account taken of the actual expense of the calls).

I'm sure as hell not going to just start writing everything as monolithic lumps!

EasternGamer commented 3 years ago

It's purely tied to byte-code generation per "tick". If it reaches a threshold, it overloads, or so I've been told.

Generally, the best way to solve this is by reducing the number of function calls, using local functions when possible, avoiding external computational libraries like quat, matrix4, vec3, etc.
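
To make the "local functions" advice concrete, here is a minimal sketch for stock Lua 5.3 run outside the game (the `debug` library is presumably not exposed in the Dual Universe sandbox). It uses the standard `debug.sethook` count hook to compare how many VM instructions a loop executes when it looks up `math.abs` through the global table versus through a cached local. The helper names are made up for illustration, and whether this instruction count is what the overload check actually measures is unconfirmed.

```lua
-- Count the VM instructions executed while fn runs (needs the debug library).
local function count_instructions(fn)
  local n = 0
  debug.sethook(function() n = n + 1 end, "", 1) -- count hook, fired after every instruction
  fn()
  debug.sethook()                                -- remove the hook again
  return n
end

-- Looks up the global `math` table and its `abs` field on every iteration.
local function via_global()
  local s = 0
  for i = 1, 1000 do s = s + math.abs(i) end
  return s
end

-- Caches the function in a local once, outside the loop.
local abs = math.abs
local function via_local()
  local s = 0
  for i = 1, 1000 do s = s + abs(i) end
  return s
end

local global_count = count_instructions(via_global)
local local_count = count_instructions(via_local)
print("global lookup:", global_count, "cached local:", local_count)
```

Both loops compute the same result, but the cached-local version should report fewer executed instructions, because each iteration skips the table lookups.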

Also, from what I've heard, it's not really something you can "read". Based on what someone showed me, Lua itself has a built-in mechanism that allows this kind of limit to be imposed on a script, so it probably isn't something you can control or trace from inside the script.

However, if NQ are using a different method, then having this would be great.

samdeane commented 3 years ago

Lua has some hooks for profiling/tracing/etc, and it's likely using those - which is why I think people believe it is tied to function calls.

> Generally, the best way to solve this is by reducing the number of function calls, using local functions when possible, avoiding external computational libraries like quat, matrix4, vec3, etc.

Indeed. However, it really sucks if the overload system is pushing people towards unstructured code, for no real reason. What should matter is the actual resource impact of the script, not an arbitrary count of calls.

samdeane commented 3 years ago

To be honest, the whole concept of the overload is a bit dumb. A better concept would be throttling, where the script keeps going but is given less time to run if it exceeds certain limits.

Dimencia commented 3 years ago

And the current overload system can cause some real issues. Some of the API functions are written in C, so they require basically no 'function calls' from Lua. Those functions (like logging to the console) can take a very long time to execute but use few function calls, so if you're spamming console chat, you can make your game lag because it's waiting for the Lua to catch up. That is bad and probably related to lag-bombs. The Lua should never be able to take longer than a game-tick to run; I believe that's what overloads are there to catch in the first place.

Ideally, overload should be based on execution time (though note that this still pushes people towards unstructured code, since unstructured code is simply faster in Lua). Basically, if the Lua from the previous game-tick is still running when it's time for a new tick, that should be an overload.

But that's asking for a lot. Just an indicator of overload progress would be fine, if possible (but it's kinda not if they're using the debug hooks we suspect)

EasternGamer commented 3 years ago

Overload isn't based off "function calls", but the byte code instruction generation. So, when you call a function that accesses a C function, the entire overhead generated is entirely from accessing the function (the byte code) and passing the variables through.

Some functions take an insanely long time in some instances: a single-line call to radar.getData() can take 20 ms in some areas, even though the byte code generated for it was probably more like 5 bits of code.

In some ways, yes, Lua should be limited based on the tick time. However, if you did that, ships with a large number of elements would always overload, because a single function call causes a huge amount of lag in flush, sadly.

samdeane commented 3 years ago

Do you mean it's based on "compiled" (I use the term loosely) byte code size?

Presumably Lua uses some sort of JIT byte code generation and caches it? So even byte code size seems a bit of a weird metric. It also doesn't in any way explain why my scripts sometimes overload after a random amount of time (measured in tens of minutes).

The way my scripts overload suggests that the criterion has a time-based component, and there's an occasional code path that causes a spike in whatever is being measured.

samdeane commented 3 years ago

If they have to have the concept of a time-based overload, I'd have thought that the way to do it would be to limit Lua execution to a single dedicated thread, and then just monitor the time spent in that thread.

As I say though, I'd prefer a throttling system which adjusted the frequency of timers if the script(s) were using too many resources.

Though in fact what I'd really prefer is something like this. Give scripts a "fuel", then just unshackle them. That way it just becomes another resource-management issue for the player to balance.

EasternGamer commented 3 years ago

> Do you mean it's based on "compiled" (I use the term loosely) byte code size?
>
> Presumably Lua uses some sort of JIT byte code generation and caches it? So even byte code size seems a bit of a weird metric. It also doesn't in any way explain why my scripts sometimes overload after a random amount of time (measured in tens of minutes).
>
> The way my scripts overload suggests that the criterion has a time-based component, and there's an occasional code path that causes a spike in whatever is being measured.

Things are dynamic in nature in DU. So if you're just on the edge of an overload due to the bytecode generation, it would be fair to say that one small thing that doesn't trigger often can cause the overload. An example would be radar decoding. If you're in a 2000-construct area and you don't yield, you will overload. Now let's say that, depending on the construct size, you also do additional processing. You may yield every 100 constructs, but what if 50 of those constructs suddenly meet your condition and cause an overload?
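
The yielding pattern described here can be sketched with a plain Lua coroutine. This is only an illustration under stated assumptions: `make_worker`, the per-item cost returned by `process`, and the "every 10th construct is expensive" rule are all hypothetical, and the `while` loop at the bottom stands in for whatever event would resume the worker once per tick in the game. Instead of yielding after a fixed number of constructs, the worker yields once an accumulated cost estimate reaches a budget, so a burst of expensive constructs triggers an earlier yield.

```lua
-- Build a coroutine that processes items and yields once the estimated
-- cost spent in the current slice reaches `budget`.
local function make_worker(items, budget, process)
  return coroutine.create(function()
    local spent = 0
    for _, item in ipairs(items) do
      spent = spent + process(item)  -- process returns a rough cost estimate
      if spent >= budget then
        spent = 0
        coroutine.yield()            -- hand control back until the next tick
      end
    end
  end)
end

-- Hypothetical workload: 250 constructs, every 10th one is expensive.
local constructs = {}
for i = 1, 250 do constructs[i] = i end

local processed = 0
local function process(id)
  processed = processed + 1
  if id % 10 == 0 then return 10 end -- extra processing for this construct
  return 1
end

-- Simulated ticks: resume the worker once per tick until it finishes.
local worker = make_worker(constructs, 100, process)
local ticks = 0
while coroutine.status(worker) ~= "dead" do
  local ok, err = coroutine.resume(worker)
  assert(ok, err)
  ticks = ticks + 1
end
print("ticks:", ticks, "processed:", processed)
```

The cost numbers are guesses by construction; the point is only that the yield decision follows the work actually done rather than the item count.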

It depends on the script.

I'm not really gonna say I don't want this... I do. I would love this. It has been my observation that overload occurs from bytecode generation. Not time. And not function calls. (Otherwise a single for loop would never cause an overload if it just incremented a number.)

samdeane commented 3 years ago

> I'm not really gonna say I don't want this... I do. I would love this. It has been my observation that overload occurs from bytecode generation. Not time. And not function calls. (Otherwise a single for loop would never cause an overload if it just incremented a number.)

I think maybe we mean slightly different things by "bytecode generation". A single for loop incrementing a number ought to be translated into a few bytecodes which are then iterated over in a tight loop by the VM, so I wouldn't expect that to cause an overload due to bytecode generation either!
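
The distinction between bytecode that is generated once and instructions that are executed repeatedly can be checked in stock Lua 5.3 outside the game. The sketch below is an illustration only (the helper names are invented, and it says nothing about what Dual Universe actually measures): it compiles one small loop function, reports the size of its compiled form via `string.dump`, and then counts executed VM instructions for two different iteration counts using the standard count hook.

```lua
-- Count the VM instructions executed while fn(...) runs (needs the debug library).
local function count_instructions(fn, ...)
  local n = 0
  debug.sethook(function() n = n + 1 end, "", 1) -- fire after every instruction
  fn(...)
  debug.sethook()                                -- remove the hook again
  return n
end

-- A single for loop that just increments a number.
local function spin(n)
  local x = 0
  for _ = 1, n do x = x + 1 end
  return x
end

-- The compiled function is the same object in both runs,
-- so the amount of generated bytecode does not change.
local compiled_bytes = #string.dump(spin)
local short_run = count_instructions(spin, 1000)
local long_run = count_instructions(spin, 2000)

print("compiled size (bytes):", compiled_bytes)
print("instructions executed:", short_run, long_run)
```

The compiled size stays fixed while the executed-instruction count grows with the number of iterations, which would be consistent with an overload driven by executed instructions rather than by the amount of bytecode generated.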

Do you know for a fact that this is the overload mechanism, or are you speculating?

samdeane commented 3 years ago

(slightly tangentially, this is an interesting read: https://www.lua.org/doc/jucs05.pdf)

EasternGamer commented 3 years ago

> > I'm not really gonna say I don't want this... I do. I would love this. It has been my observation that overload occurs from bytecode generation. Not time. And not function calls. (Otherwise a single for loop would never cause an overload if it just incremented a number.)
>
> I think maybe we mean slightly different things by "bytecode generation". A single for loop incrementing a number ought to be translated into a few bytecodes which are then iterated over in a tight loop by the VM, so I wouldn't expect that to cause an overload due to bytecode generation either!
>
> Do you know for a fact that this is the overload mechanism, or are you speculating?

Well, it's not time, it's not functions, and it does overload in that situation, so I assume it would be "bytecode", but by Lua's definition of it in terms of processing. I would assume it is the debug hooks. If it expanded the bytecode generation... Each number incrementing is, in the end, an instruction being performed. But yes, I don't know for certain. It could be anything.

mayumi7 commented 2 years ago

It's most likely implemented using lua_sethook with LUA_MASKCOUNT as the argument. It's described like this in the docs (https://www.lua.org/manual/5.3/manual.html) "The count hook: is called after the interpreter executes every count instructions."
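
For reference, the same count hook is reachable from Lua itself as `debug.sethook`, so the suspected mechanism can be sketched in stock Lua 5.3. This is a guess at the general shape, not the game's confirmed implementation: `run_with_budget`, the budget value, and the error message are all hypothetical.

```lua
-- Run fn under an instruction budget. The count hook fires every `step`
-- VM instructions and raises an error once the budget is exceeded.
local function run_with_budget(fn, budget, step)
  step = step or 100
  local executed = 0
  local function hook()
    executed = executed + step
    if executed > budget then
      error("overload: instruction budget exceeded", 0) -- level 0: no position prefix
    end
  end
  debug.sethook(hook, "", step) -- empty mask plus a count installs a count hook
  local ok, err = pcall(fn)
  debug.sethook()               -- always remove the hook afterwards
  return ok, err, executed
end

-- A small script stays within the budget.
local ok_small = run_with_budget(function()
  local s = 0
  for i = 1, 10 do s = s + i end
end, 10000)

-- A runaway loop is stopped once the budget is used up.
local ok_loop, err_loop = run_with_budget(function()
  while true do end
end, 10000)

print(ok_small, ok_loop, err_loop)
```

Note that a hook like this only counts instructions executed inside Lua functions; time spent inside a C function called from Lua is invisible to it, which would match the observation earlier in the thread that slow API calls do not seem to contribute to overloads.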

samdeane commented 2 years ago

Yeah, that's what most people think, but it needs confirming so that we actually know.