Closed linaori closed 5 years ago
This is all cool to know and good for us all to be aware of.
Just so this doesn't get lost on anyone though:
Readability and maintainability are in most cases just as important, and optimizing code for every last ounce of performance can severely impact those qualities. On the other hand, some of the optimizations suggested have little bearing on readability and should generally always be applied, e.g. localization of API functions, or actually make for neater code e.g. the use of or rather than a nil-check. Generally speaking, optimize only once you are sure that there is or will be a performance bottleneck.
So for something like diggy's stress calculations (which I understand to be quite computationally intensive) these rules are important. But for something that will run 1000 times or less on a map: even a 1000% performance increase is not worth making code harder to read or maintain.
It's primarily for hot path optimizations.
Compiler hot paths are code execution paths in the compiler in which most of the execution time is spent, and which are potentially executed very often.
Most of the ones I listed here are free wins at pretty much no cost of readability (at least in my opinion)
Yep, we're saying the same thing. 😄
I edited Test 9 to make it ipairs instead of pairs.
@Valansch I just copy-pasted that one from the examples 😅
What shall we do about this, add the contents of the issue to the wiki? If you still want to replace the docs with the wiki that is.
The wiki makes much more sense from a long-term storage perspective. Feel free to edit or change the name or anything, I just copy pasted the contents of OP and made a quick title.
https://github.com/Refactorio/RedMew/wiki/Performance:-Best-practices
After adding the contents of OP to the wiki I found myself in a situation where I wanted to optimize table appending so wanted to check what kind of performance gains I would get. My testing showed inconsistent benefits. In particular if a table were inside the global
table it was more performant to use insert, however with a local table using [#table + 1]
was more performant. It should be noted the difference in speed was 2% with local tables, and the difference when using global tables was 5%. I don't think this warrants being promoted as a practice one way or another.
For global tables, each test was run 3 times and the game was restarted after each method was tested. (Results listed in the top half of the code block) For local tables each test was run 3 times, all inside the same game. (Results listed in the bottom half of the code block) Only the median result was kept of the 3 runs:
4974 ms
global.mytable = {}
local insert = table.insert
for i = 1, 5000000 do
insert(global.mytable, 1)
end
5205 ms
global.mytable = {}
for i = 1, 5000000 do
global.mytable[#global.mytable + 1] = 1
end
--------
4740 ms
local mytable = {}
local insert = table.insert
for i = 1, 5000000 do
insert(mytable, 1)
end
4643 ms
local mytable = {}
for i = 1, 5000000 do
mytable[#mytable + 1] = 1
end
I made a note in the wiki entry to reflect this.
Edit: cleaned up the structure of the data and rewrote a bit to be more clear Edit2: More cleanup, removed all but the fastest test methods.
don't think this warrants being promoted as a practice one way or another.
They are the kind of micro optimizations you're looking at when calling the code thousands or millions of time during a game. They are not interesting when you're iterating over something every once in a while and appending a bunch of things.
The reason global tables are showing a difference between insert and counts compared to normal tables, is because every time you count in a global table, it has to fetch it from the global, which is causing the performance overhead. It's always faster to use locals for that.
They are the kind of micro optimizations
But that's my point, your premise in this case is not established. Per your source [#table + 1]
vs insert
should show a 36% difference but that's not what my testing showed at all.
May as well re-open, since there's discussion.
My overall point was:
If you're appending many elements to a global table, insert(global.mytable, 1)
gives 4.5% faster results than global.mytable[#global.mytable + 1] = 1
.
If you're appending many elements to a local table, mytable[#mytable + 1] = 1
gives 2% faster results than insert(mytable, 1)
So the assertion
When inserting items into a table, it's common to use table.insert, but this is extremely slow. If you're in a loop, keep a counter and directly set it with: my_table[current_counter].
Doesn't seem valid as a general rule.
Ultimately, the wiki entry can be edited freely so if you wish to add it back, feel free.
TEST 12: table insert When inserting items into a table, it's common to use
table.insert
, but this is extremely slow. If you're in a loop, keep a counter and directly set it with:my_table[current_counter]
. If you want to append an item to a table without a loop, use:my_table[#my_table + 1]
In talking to @iltar on discord 2 things were noted.
1: That we weren't on the same page about creating a table vs. appending to one.
2: That I misread the instructions/misinterpreted current_counter
.
Creating/initially populating a table and appending to one are very different, so I'll address them individually.
Creating a table: I never disputed the suggestion of using a counter for the creation of a table. It's significantly faster than other methods.
1483 ms -- using i as the counter
global.mytable = {}
for i = 1, 5000000 do
global.mytable[i] = 1
end
----
3702 ms -- using insert
global.mytable = {}
local insert = table.insert
for i = 1, 5000000 do
insert(global.mytable, 1)
end
---
3885 ms -- counting the table each time
global.mytable = {}
for i = 1, 5000000 do
global.mytable[#global.mytable + 1] = 1
end
Appending to a table: this is what I disputed, because I misinterpreted the instructions.
I dismissed the counter method, because you can't append to a table if your counter starts at 1.
414 ms -- very fast, but it's just overwriting entries 1-5000000 and not appending
for i = 1, 5000000 do
global.mytable[i] = 1
end
What I should have been doing was creating a counter before the loop and initializing it
1539 ms -- using a counter variable initialized with the count of the table
local counter = #global.mytable
for i = 1, 5000000 do
global.mytable[counter + 1] = 1
counter = counter + 1
end
----
3702 ms -- using insert
local insert = table.insert
for i = 1, 5000000 do
insert(global.mytable, 1)
end
---
3885 ms -- counting the table each time
for i = 1, 5000000 do
global.mytable[#global.mytable + 1] = 1
end
So my mistake was not thinking of counter
as a variable initialized outside of the loop.
We can get a lot of quick wins performance wise in our code-base by following the rules on https://springrts.com/wiki/Lua_Performance. Please refer to this document when reading the suggestions below
TEST 1 This is about localizing references. By default everything is reached from the global scope and often from a table. Example:
TEST 4 This refers to the lua function
math.max
. Accepts the highest value of 2. It's faster to manually check which is higher and assign that, compared to the function call. It doesn't matter for code that's ran everyone once in a while, but if it's performance critical code (hot path, often executed), it's wise to apply this optimization.TEST 8 This test is to show that function definitions inside a loop, or scope, executed many times, costs a lot of overhead. Whenever possible, move your function definition someplace where defining it is only done once.
TEST 9 This is test is really important for sequential tables (so starting at 1 and always adding 1 to the index). When iterating over the table, calling pairs or ipairs is really slow compared to keeping track of the table size and accessing it +1 when iterating. This only works if the table is sequential!
TEST 11 This is a generic solution for pretty much anything. Whenever you have to access a certain key multiple times, reference it in a local. In case of tabels, it's always a reference, changing the reference, changes the original as well, it's not a copy.
TEST 12: table insert When inserting items into a table, it's common to use
table.insert
, but this is extremely slow. If you're in a loop, keep a counter and directly set it with:my_table[current_counter]
. If you want to append an item to a table without a loop, use:my_table[#my_table + 1]
TEST 12: table create When creating a table, lua does some smart memory allocating as it knows ahead of time what the size should be. It's wise to avoid using custom indexes, but it's also important to combine the creation of the array and setting the values. If you know ahead of time what the size will be and this is static, reserve some slots.
TEST 13 Lua doesn't do well with static analyzation. Whenever it sees a table structure, it will create and insert that table structure. Meaning if you have a table definition inside a loop or function callback, it will create, initialize, and store this table in memory every single iteration. This also triggers heavy garbage collection. Note that this is only useful for static code, something that is not altered, like a color definition in factorio:
{r = 255, g = 200, b = 100}
. A side-effect of this is that the table is never copied, it's always a reference, so make sure it's not modified.