dokutan / bf2lua

Brainfuck to Lua transpiler
MIT License

Benchmark #2

Open ExtReMLapin opened 1 year ago

ExtReMLapin commented 1 year ago

Hello, interesting project. Did you run any benchmarks, for example on mandelbrot?

Here is my transpiler https://github.com/ExtReMLapin/fast_brainfuck.lua

ExtReMLapin commented 1 year ago

Here is the benchmark. I'm betting the difference comes from the data structure used and from the % applied everywhere.

PS > Measure-Command { .\mingw64\luajit.exe .\mandel_fast.lua}

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 2
Milliseconds      : 206
Ticks             : 22065291
TotalDays         : 2,553853125E-05
TotalHours        : 0,00061292475
TotalMinutes      : 0,036775485
TotalSeconds      : 2,2065291
TotalMilliseconds : 2206,5291

PS > Measure-Command { .\mingw64\luajit.exe .\mandel.lua }

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 5
Milliseconds      : 139
Ticks             : 51396523
TotalDays         : 5,94867164351852E-05
TotalHours        : 0,00142768119444444
TotalMinutes      : 0,0856608716666667
TotalSeconds      : 5,1396523
TotalMilliseconds : 5139,6523

The _fast version is mine. Attachments: mandel.lua.txt, mandel_fast.lua.txt
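For anyone who wants to repeat the measurement without Measure-Command or time, here is a minimal sketch of an in-process timer. The helper name `time_it` and the toy workload are hypothetical and only for illustration. Note that `os.clock` reports CPU time, so the numbers are not directly comparable to the wall-clock totals above.

```lua
-- Hypothetical helper: run fn(...) once and return CPU seconds plus its result.
local function time_it(fn, ...)
    local t0 = os.clock()
    local result = fn(...)
    return os.clock() - t0, result
end

-- Toy workload standing in for a transpiled program.
local elapsed, sum = time_it(function(n)
    local s = 0
    for i = 1, n do
        s = s + i % 7
    end
    return s
end, 1000000)

print(string.format("%.3f s (checksum %d)", elapsed, sum))
```

To time a transpiled file instead, the workload could be replaced with something like `function() dofile("mandel.lua") end`.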

ExtReMLapin commented 1 year ago

And with the % removed and the following added:

local data = {}
local ptr = 1

if type(rawget(_G, "jit")) == 'table' then
    -- LuaJIT: use a fixed-size FFI byte array as the tape
    ffi = require("ffi")
    data = ffi.new("char[32768]")
    jit.opt.start("loopunroll=100")
    ffi_fill = ffi.fill
else
    -- PUC Lua: fall back to a zero-filled table
    local i = 0

    while i < 32768 do
        data[i] = 0
        i = i + 1
    end
end

It beats mine, at 1986 ms!

dokutan commented 1 year ago

Interesting, considering I wrote this without really caring about performance; my goals were compatibility and primitive debugging support. I tried to reproduce your results on my machine (Ryzen 5950X, Arch Linux with kernel version 6.4.3):

Your mandel_fast.lua:

time luajit mandel_fast.lua
#  1.40s user 0.02s system 99% cpu 1.428 total

My transpiler with default options:

./bf2.lua -i mandel.b -o mandel.lua
time luajit mandel.lua
#  3.62s user 0.02s system 99% cpu 3.641 total

My transpiler with a function for each loop:

./bf2.lua -i mandel.b -o mandel.lua -f
time luajit mandel.lua
#  68.46s user 0.01s system 99% cpu 1:08.52 total

My transpiler with maximum optimizations:

./bf2.lua -i mandel.b -o mandel.lua -O 2
time luajit mandel.lua
#  2.92s user 0.02s system 99% cpu 2.942 total

My transpiler with maximum optimizations and your jit code:

./bf2.lua -i mandel.b -o mandel.lua -O 2
# manually edited mandel.lua
time luajit mandel.lua
#  4.38s user 0.25s system 99% cpu 4.647 total

My transpiler with maximum optimizations and without % max:

./bf2.lua -i mandel.b -o mandel.lua -O 2
# manually edited mandel.lua
time luajit mandel.lua
# 2.07s user 0.01s system 99% cpu 2.082 total

My transpiler with maximum optimizations, your jit code and without % max:

./bf2.lua -i mandel.b -o mandel.lua -O 2
# manually edited mandel.lua
time luajit mandel.lua
# 1.51s user 0.01s system 99% cpu 1.515 total

I am mostly happy with the performance, but I might add an option to disable the % max for programs that don't rely on wrapping cells. Edit: done in ad7a0ccdbf25281571175b44585dad099d20279b

The problems with your optimizations (and the reasons why I won't adopt them) are the lack of support for PUC Lua and for programs that rely on wrapping 8-bit cells:

lua mandel_fast.lua
lua: mandel_fast.lua:196: attempt to perform arithmetic on a table value (local 'data')
stack traceback:
    mandel_fast.lua:196: in main chunk
    [C]: in ?

For example, this "Hello world" program doesn't work with your transpiler:

-[>-<+++++++]>-.[<----->+]<---.+++++++..+++.+[>--<-]>.---[<--->+]<.+[>--<---]>-.+++.------.--------.-[<->+++]<.
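To illustrate why this program needs wrapping cells, here is a minimal sketch that hand-translates only its first few instructions, `-[>-<+++++++]>-`, using an explicit % 256. The variable names `a`, `b` and `MAX` are made up for this example and are not taken from either transpiler's output.

```lua
local MAX = 256
local a, b = 0, 0          -- first two tape cells

a = (a - 1) % MAX          -- "-": 0 wraps around to 255
while a ~= 0 do            -- "[>-<+++++++]"
    b = (b - 1) % MAX      -- ">-"
    a = (a + 7) % MAX      -- "<+++++++"
end
b = (b - 1) % MAX          -- ">-"

print(b, string.char(b))   -- 72 is the character code of "H"
```

With wrapping, the loop ends once the first cell comes back around to 0. With plain unbounded numbers the first cell would take the values -1, 6, 13, ... and never hit 0, so the loop would not terminate.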
rdebath commented 1 year ago

Just a note: on LuaJIT, these three lines seem to compile to exactly zero JIT instructions, so as long as the %256 is inside the if, it has no impact on performance. In addition, if the tape cells have a byte type (char or uint8_t), the %256 is already done automatically.

    local nj
    nj = (type(jit) ~= 'table')
    if nj then m[p+1] = m[p+1]%256 end
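The two points can be combined as in the following minimal sketch, which is an illustration and not code from either transpiler. The names `tape`, `needs_wrap` and `has_jit` are hypothetical. It assumes that storing a number into a uint8_t FFI array element narrows it modulo 256, which is my reading of LuaJIT's FFI conversion rules.

```lua
-- Pick the tape representation once, at startup.
local has_jit = type(rawget(_G, "jit")) == "table"

local tape, needs_wrap
if has_jit then
    -- LuaJIT: zero-filled byte array; stores are narrowed to 8 bits (assumption, see above)
    local ffi = require("ffi")
    tape = ffi.new("uint8_t[?]", 32768)
    needs_wrap = false
else
    -- PUC Lua: plain table, so wrapping has to be done explicitly
    tape = {}
    for i = 0, 32767 do
        tape[i] = 0
    end
    needs_wrap = true
end

local p = 0
tape[p] = tape[p] - 1                               -- "-" on a zero cell
if needs_wrap then tape[p] = tape[p] % 256 end      -- only needed on the table path
print(tape[p])
```

Either path should leave 255 in the cell, so programs that rely on wrapping behave the same on both runtimes.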
ExtReMLapin commented 1 year ago

Yes, this is why I set it to char in the VM settings.

dokutan commented 1 year ago

@rdebath Thanks, it is good to know that luajit can optimize away that if statement.

I thought about adding an option to use a luajit array as memory, but have decided against it for now, for a few reasons:

However, with the -maximum 0 option no % max is generated, and replacing the initial boilerplate of the output is relatively easy. If anyone else cares about this, pull requests are always welcome.