Refactorio / RedMew

The RedMew scenario code for Factorio.
https://redmew.com
GNU General Public License v3.0
177 stars 80 forks source link

Document best practices in regards of performance #468

Closed linaori closed 5 years ago

linaori commented 5 years ago

We can get a lot of quick wins performance wise in our code-base by following the rules on https://springrts.com/wiki/Lua_Performance. Please refer to this document when reading the suggestions below

TEST 1 This is about localizing references. By default everything is reached from the global scope and often from a table. Example:

-- slow, avoid
math.min()

-- top of the file
local min = math.min

-- somewhere in your code
min(a, b)

TEST 4 This refers to the lua function math.max. Accepts the highest value of 2. It's faster to manually check which is higher and assign that, compared to the function call. It doesn't matter for code that's ran everyone once in a while, but if it's performance critical code (hot path, often executed), it's wise to apply this optimization.

TEST 8 This test is to show that function definitions inside a loop, or scope, executed many times, costs a lot of overhead. Whenever possible, move your function definition someplace where defining it is only done once.

-- slow
local function test()
    local testf = function () end
    -- or even
    for i = 1, 100 do
        (function())
    end
end

-- instead do
local testf = function () end

local function test()
    for i = 1, 100 do
        testf()
    end
end

TEST 9 This is test is really important for sequential tables (so starting at 1 and always adding 1 to the index). When iterating over the table, calling pairs or ipairs is really slow compared to keeping track of the table size and accessing it +1 when iterating. This only works if the table is sequential!

-- slow
for j,v in ipairs(a) do
  x=v
end

-- fast
for i=1,#a do
  x=a[i]
end

TEST 11 This is a generic solution for pretty much anything. Whenever you have to access a certain key multiple times, reference it in a local. In case of tabels, it's always a reference, changing the reference, changes the original as well, it's not a copy.

local create_entity = game.surfaces.nauvis.create_entity

for i = 1, 100 do
    create_entity(...)
end

-- and
local my_table = {
    some_other_table = {
        counter = 0,
    }
}

-- slow
my_table.some_other_table.counter = my_table.some_other_table.counter  + 1
my_table.some_other_table.counter = my_table.some_other_table.counter  + 1

-- faster
local some_other_table = my_table.some_other_table
local counter = some_other_table.counter
counter = counter + 1
counter = counter + 1
some_other_table.counter = counter

TEST 12: table insert When inserting items into a table, it's common to use table.insert, but this is extremely slow. If you're in a loop, keep a counter and directly set it with: my_table[current_counter]. If you want to append an item to a table without a loop, use: my_table[#my_table + 1]

TEST 12: table create When creating a table, lua does some smart memory allocating as it knows ahead of time what the size should be. It's wise to avoid using custom indexes, but it's also important to combine the creation of the array and setting the values. If you know ahead of time what the size will be and this is static, reserve some slots.

-- slow
local my_table = {}
my_table[1] = 'foo'
my_table[2] = 'bar'
my_table[3] = 'baz'

-- faster
local my_table = {true, true, true}
my_table[1] = 'foo'
my_table[2] = 'bar'
my_table[3] = 'baz'

-- fastest
local my_table = {'foo', 'bar', 'baz'}

TEST 13 Lua doesn't do well with static analyzation. Whenever it sees a table structure, it will create and insert that table structure. Meaning if you have a table definition inside a loop or function callback, it will create, initialize, and store this table in memory every single iteration. This also triggers heavy garbage collection. Note that this is only useful for static code, something that is not altered, like a color definition in factorio: {r = 255, g = 200, b = 100}. A side-effect of this is that the table is never copied, it's always a reference, so make sure it's not modified.

-- slow
local my_table = {}
for i = 1, 100 do
    my_table[i] = {r = 255, g = 200, b = 100} -- created and stored in memory, then garbage collected
end

-- fast
local my_table = {}
local color = {r = 255, g = 200, b = 100}
for i = 1, 100 do
    my_table[i] = color
end
mheguy commented 5 years ago

This is all cool to know and good for us all to be aware of.

Just so this doesn't get lost on anyone though:

Readability and maintainability are in most cases just as important, and optimizing code for every last ounce of performance can severely impact those qualities. On the other hand, some of the optimizations suggested have little bearing on readability and should generally always be applied, e.g. localization of API functions, or actually make for neater code e.g. the use of or rather than a nil-check. Generally speaking, optimize only once you are sure that there is or will be a performance bottleneck.

So for something like diggy's stress calculations (which I understand to be quite computationally intensive) these rules are important. But for something that will run 1000 times or less on a map: even a 1000% performance increase is not worth making code harder to read or maintain.

linaori commented 5 years ago

It's primarily for hot path optimizations.

Compiler hot paths are code execution paths in the compiler in which most of the execution time is spent, and which are potentially executed very often.

Most of the ones I listed here are free wins at pretty much no cost of readability (at least in my opinion)

mheguy commented 5 years ago

Yep, we're saying the same thing. 😄

Valansch commented 5 years ago

I edited Test 9 to make it ipairs instead of pairs.

linaori commented 5 years ago

@Valansch I just copy-pasted that one from the examples 😅

What shall we do about this, add the contents of the issue to the wiki? If you still want to replace the docs with the wiki that is.

mheguy commented 5 years ago

The wiki makes much more sense from a long-term storage perspective. Feel free to edit or change the name or anything, I just copy pasted the contents of OP and made a quick title.

https://github.com/Refactorio/RedMew/wiki/Performance:-Best-practices

mheguy commented 5 years ago

After adding the contents of OP to the wiki I found myself in a situation where I wanted to optimize table appending so wanted to check what kind of performance gains I would get. My testing showed inconsistent benefits. In particular if a table were inside the global table it was more performant to use insert, however with a local table using [#table + 1] was more performant. It should be noted the difference in speed was 2% with local tables, and the difference when using global tables was 5%. I don't think this warrants being promoted as a practice one way or another.

For global tables, each test was run 3 times and the game was restarted after each method was tested. (Results listed in the top half of the code block) For local tables each test was run 3 times, all inside the same game. (Results listed in the bottom half of the code block) Only the median result was kept of the 3 runs:

4974 ms
global.mytable = {}
local insert = table.insert
for i = 1, 5000000 do
    insert(global.mytable, 1)
end

5205 ms
global.mytable = {}
for i = 1, 5000000 do
    global.mytable[#global.mytable + 1] = 1
end

--------

4740 ms
local mytable = {}
local insert = table.insert
for i = 1, 5000000 do
    insert(mytable, 1)
end

4643 ms
local mytable = {}
for i = 1, 5000000 do
    mytable[#mytable + 1] = 1
end

I made a note in the wiki entry to reflect this.

Edit: cleaned up the structure of the data and rewrote a bit to be more clear Edit2: More cleanup, removed all but the fastest test methods.

linaori commented 5 years ago

don't think this warrants being promoted as a practice one way or another.

They are the kind of micro optimizations you're looking at when calling the code thousands or millions of time during a game. They are not interesting when you're iterating over something every once in a while and appending a bunch of things.

The reason global tables are showing a difference between insert and counts compared to normal tables, is because every time you count in a global table, it has to fetch it from the global, which is causing the performance overhead. It's always faster to use locals for that.

mheguy commented 5 years ago

They are the kind of micro optimizations

But that's my point, your premise in this case is not established. Per your source [#table + 1] vs insert should show a 36% difference but that's not what my testing showed at all.

mheguy commented 5 years ago

May as well re-open, since there's discussion.

mheguy commented 5 years ago

My overall point was:

If you're appending many elements to a global table, insert(global.mytable, 1) gives 4.5% faster results than global.mytable[#global.mytable + 1] = 1.

If you're appending many elements to a local table, mytable[#mytable + 1] = 1 gives 2% faster results than insert(mytable, 1)

So the assertion

When inserting items into a table, it's common to use table.insert, but this is extremely slow. If you're in a loop, keep a counter and directly set it with: my_table[current_counter].

Doesn't seem valid as a general rule.

Ultimately, the wiki entry can be edited freely so if you wish to add it back, feel free.

mheguy commented 5 years ago

TEST 12: table insert When inserting items into a table, it's common to use table.insert, but this is extremely slow. If you're in a loop, keep a counter and directly set it with: my_table[current_counter]. If you want to append an item to a table without a loop, use: my_table[#my_table + 1]

In talking to @iltar on discord 2 things were noted. 1: That we weren't on the same page about creating a table vs. appending to one. 2: That I misread the instructions/misinterpreted current_counter.

Creating/initially populating a table and appending to one are very different, so I'll address them individually.

Creating a table: I never disputed the suggestion of using a counter for the creation of a table. It's significantly faster than other methods.

1483 ms -- using i as the counter
global.mytable = {}
for i = 1, 5000000 do
    global.mytable[i] = 1
end
----
3702 ms -- using insert
global.mytable = {}
local insert = table.insert
for i = 1, 5000000 do
    insert(global.mytable, 1)
end
---
3885 ms -- counting the table each time
global.mytable = {}
for i = 1, 5000000 do
    global.mytable[#global.mytable + 1] = 1
end

Appending to a table: this is what I disputed, because I misinterpreted the instructions.

I dismissed the counter method, because you can't append to a table if your counter starts at 1.

414 ms -- very fast, but it's just overwriting entries 1-5000000 and not appending
for i = 1, 5000000 do
    global.mytable[i] = 1
end

What I should have been doing was creating a counter before the loop and initializing it

1539 ms -- using a counter variable initialized with the count of the table
local counter = #global.mytable
for i = 1, 5000000 do
    global.mytable[counter + 1] = 1
    counter = counter + 1
end
----
3702 ms -- using insert
local insert = table.insert
for i = 1, 5000000 do
    insert(global.mytable, 1)
end
---
3885 ms -- counting the table each time
for i = 1, 5000000 do
    global.mytable[#global.mytable + 1] = 1
end

So my mistake was not thinking of counter as a variable initialized outside of the loop.