Closed vdweller84 closed 1 year ago
I looked at my tests from 2019 and it looks like I was able to do around 2200 calls from C++ to lua every millisecond.
The function I ran in Lua in the test was the following, with m = 5. I was using Lua 5.1, not LuaJIT, so I would expect LuaJIT to give an even better result. On the face of it, it seems that I get a lot more calls than you do — but I cannot know what your setup is exactly.
```lua
local function Fibonacci(m)
    if m < 2 then
        return m
    end
    return Fibonacci(m - 1) + Fibonacci(m - 2)
end
```
Maybe you should make tests where you benchmark how long it takes to call an empty function, for example, using your setup. Try to find a minimal example of what is slow, and work out which part of it is slow. In my setup I run the Lua scripts once and they define functions; I then, on events, call the specific functions that are already defined in Lua. If you re-evaluate your script each time, maybe that is a bottleneck? Hard to know what is going on without code.
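As a starting point for that kind of measurement, here is a minimal timing harness (pure C++ with `std::chrono`; the callable is a placeholder — swap the no-op for your actual C++-to-Lua invocation, e.g. calling a bound `sol::function`, to measure the real boundary overhead):

```cpp
#include <chrono>

// Measures how many calls per millisecond a callable sustains.
// Replace `call` with your C++ -> Lua invocation to benchmark the boundary.
template <typename F>
double callsPerMillisecond(F&& call, int iterations) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        call();
    }
    auto end = std::chrono::steady_clock::now();
    double ms = std::chrono::duration<double, std::milli>(end - start).count();
    return iterations / ms;
}
```

Comparing the number you get for an empty Lua function against the number for your full property-setter path tells you whether the boundary crossing itself or the per-call work dominates.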
And on properties: there is a documented overhead for any kind of non-function access when using sol. For performance-critical work you should use only functions; behind the scenes, everything is always a function anyway. See this:

> Note that performance for member function calls goes down by a fixed overhead if you also bind variables as well as member functions. This is purely a limitation of the Lua implementation and there is, unfortunately, nothing that can be done about it. If you bind only functions and no variables, however, sol will automatically optimize the Lua runtime and give you the maximum performance possible. Please consider ease of use and maintenance of code before you make everything into functions.
Well, the main performance killer in my case is the following:
Suppose you have a C++ struct:
```cpp
struct Instance {
    float x;
    float y;
    // blah...
    sol::table table; // this stores most of the struct's variables for use in Lua
};
```
In C++, you store all these instances somewhere, e.g. in a vector. The C++ side has to use data from these instances for spatial partitioning, rendering, etc. It is not viable to just fetch these values from Lua every time.
Now, at some point, an instance's lua script runs:
```lua
x = x + 1
```
When a value is changed, a table is consulted on whether this value update also has to make changes in C++ data:
```lua
local propertyFuncSet = {
    x = SetX,
    y = SetY,
    -- etc.
}
```
(relevant part inside the metatable):

```lua
local func = propertyFuncSet[k]
if func then
    func(actualT.id, v)
end
```
So when x is changed, lua also calls `SetX()`, which does a few things in C++:
```cpp
void InstanceSetX(InstanceID instanceID, PositionType x) {
    Room* room = currentRoomPtr;
    InstanceIndex index = room->mapInstIDIndex[instanceID];
    Float2<PositionType>& cp = room->vecInstPos[index];
    if (cp.x != x) {
        cp.x = x;
        InstanceAddToCollisionUpdateSet(instanceID);
        InstanceAddToSpatialUpdateSet(instanceID);
    }
}
```
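For readers without the surrounding engine code, here is a minimal, self-contained sketch of the storage pattern the setter above relies on. All the types are simplified stand-ins for the real engine (`Room` here is just an ID-to-index map, a packed position array, and two "dirty" sets; `Float2` is de-templated to `float`):

```cpp
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using InstanceID = std::uint32_t;
using InstanceIndex = std::uint32_t;

struct Float2 { float x, y; };

// Simplified stand-in for the engine's Room: an ID -> dense-index map,
// a packed position array, and dirty sets for deferred engine updates.
struct Room {
    std::unordered_map<InstanceID, InstanceIndex> mapInstIDIndex;
    std::vector<Float2> vecInstPos;
    std::unordered_set<InstanceID> collisionUpdateSet;
    std::unordered_set<InstanceID> spatialUpdateSet;
};

Room* currentRoomPtr = nullptr;

void InstanceSetX(InstanceID id, float x) {
    Room* room = currentRoomPtr;
    InstanceIndex index = room->mapInstIDIndex[id];
    Float2& cp = room->vecInstPos[index];
    if (cp.x != x) { // only mark dirty on an actual change
        cp.x = x;
        room->collisionUpdateSet.insert(id);
        room->spatialUpdateSet.insert(id);
    }
}
```

The early-out on an unchanged value keeps redundant writes from flooding the update sets, which is why (as noted below) the body of the setter is not the expensive part.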
This C++ → Lua → C++ round trip causes the most trouble — and this trouble is not alleviated by userdata/usertypes by any stretch; in fact, things are even slower that way.
So I guess the question is: can I do something to make this part much faster? I am setting these C++ functions up like this:
```cpp
lua["SetX"] = FNCDEF(XEROOM::InstanceSetX);
```
where `FNCDEF` is

```cpp
#define FNCDEF(x) sol::c_call<sol::wrap<decltype(&x), &x>>
```

(shamelessly stolen)
Is there anything I can do to improve upon this scheme? All these C++ functions that lua calls have 2 arguments (numbers). A stack trick? Something else?
EDIT: You may be tempted to think that what happens inside the `InstanceSetX()` function is what hampers speed. But commenting out both `InstanceAddToCollisionUpdateSet(instanceID)` and `InstanceAddToSpatialUpdateSet(instanceID)` shows they are largely inconsequential.
A follow-up for the future reader: using the plain Lua C API in many cases helps quite a bit with performance. Here are two examples:
```cpp
int InstanceSetX(lua_State* L) {
    InstanceID instanceID = (InstanceID)lua_tonumber(L, 1);
    PositionType x = (PositionType)lua_tonumber(L, 2);
    lua_pop(L, 2);
    Room* room = currentRoomPtr;
    InstanceIndex index = room->mapInstIDIndex[instanceID];
    Float2<PositionType>& cp = room->vecInstPos[index];
    if (cp.x != x) {
        cp.x = x;
        InstanceAddToCollisionUpdateSet(instanceID);
        InstanceAddToSpatialUpdateSet(instanceID);
    }
    return 0;
}
```
```cpp
bool RunEventFullHierarchy(std::vector<LuaFuncRef>& vec, InstanceID ID, InstanceIndex index) {
    size_t size = vec.size();
    if (size > 0) {
        lua_State* L = XELUASTATE::lua.lua_state();
        XEROOM::SetScopeInternal(ID, index);
        for (size_t i = 0; i < size; ++i) {
            lua_rawgeti(L, LUA_REGISTRYINDEX, vec[i]);
            lua_pcall(L, 0, 0, LUA_ERRFUNC_STACKINDEX);
        }
        XEROOM::EXPOSED::ResetScope();
        return true;
    }
    return false;
}
```
Especially using the Lua registry to store references (in the above case, functions) can net big gains. So there is room for improvement.
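To make the gain concrete with a rough analogy (pure C++, no Lua; all names here are hypothetical): calling a Lua function by global name costs a string-keyed hash lookup on every call, while a registry reference fetched with `lua_rawgeti` is essentially an integer index. Caching the reference once moves the expensive lookup out of the hot loop:

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in for the Lua globals table: name -> function, string-hashed lookup.
std::unordered_map<std::string, std::function<void()>> globals;

// Stand-in for the registry: a dense array indexed by an integer reference.
std::vector<std::function<void()>> registry;

// Analogue of luaL_ref: store the function once, get back a cheap handle.
int makeRef(const std::string& name) {
    registry.push_back(globals.at(name));
    return static_cast<int>(registry.size()) - 1;
}

// Hot path, "by name": hashes the string on every single call.
void callByName(const std::string& name) { globals.at(name)(); }

// Hot path, "by ref": one array index, analogous to lua_rawgeti.
void callByRef(int ref) { registry[ref](); }
```

The `RunEventFullHierarchy` loop above is exactly the "by ref" path: each `LuaFuncRef` was created once up front and the per-event cost is just the rawget plus the call.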
Hi all,
Apologies for the sheet of words. I am very very stumped on a major performance bottleneck and would like some input on whether I am a complete moron or I've hit some kind of natural barrier.
I am making a 2D game engine(original!) with objects that run lua scripts when they are created, or every step of the game, etc. I am using luajit.
Instances have their own "built-in" variables like "x", "y", "spriteID", but users can assign new variables to them at runtime. Instances must somehow communicate their "built-in" state to C++ because the engine processes spatial hashing, graphics rendering etc.
I want people to be able to write easy-to-understand code in a Lua script. So let's say you have an obj_cat (a @ThePhD favorite example) instance that runs an event_step Lua script every step. Both of the following should be valid:

```lua
x = x + 1       -- valid: the global table is rigged so that a "current instance"'s variables are edited
me.x = me.x + 1 -- same here, "me" is the current instance
```
The issue here is, I made all these things work, and it's great. It's also pretty slow.
The first strategy I came up with to implement this functionality was...

Take 1

Make each instance a Lua table. Run an init script in the beginning that rigs things up like this:

So the strategy is the following:
So it's mostly tables, with some duplicate data and metatable shenanigans. It works fine, in principle! Really! But! On my Intel i5-10300H laptop, with 5000 instances moving and shooting bullets, I get ~70 fps. That's... not very good. Half of that runtime is consumed in C++/Lua/C++ round trips; drawing all that stuff is like 3% of the total runtime.
Take 2
Let's try userdata/usertypes! Now all instance info is stored in a C++ class, and I define this usertype:
Where Set and Get are:
The usual scoping methods are applied here too. But instead of tables, C++ sends/receives pointers.
Despite having high hopes for this approach, it runs almost 50% slower than the first method. Well shit. Needless to say, I am stumped for good. So much for implementing bindless graphics/data oriented layouts and what have you.
My question is whether I am missing something critical from an architectural standpoint. Have I hit the natural bottom-line barrier of Lua/C++ communication, or am I wasting too much time with the way I have designed things? Maybe the speed I get is actually pretty good for a C++/LuaJIT program? Maybe I'm losing it?
One suggestion I wanted to make is whether it is possible to have a "mixed type" sol::property that allows for both member variables and functions, just to avoid calling functions on everything (I don't know, maybe functions are being called anyway behind the scenes). But again, even with this, I wouldn't expect any substantial gains.
Trying tricks like not #defining `SOL_ALL_SAFETIES_ON` maybe shaves off a few microseconds. I've also heard of separate environments per script, but can't see how they could offer any substantial speed gains.

I would really appreciate some insight on this. I'm willing to try any arcane trick/redesign/non-sentient-creature sacrifice to make this work.