Fallstop / bf-in-rosetta

Brainfuck interpreter in every launguge. Beacuse yes. In devlopment.
MIT License
2 stars 1 forks source link

Disappointing performance from rust interpreter. #4

Open rdebath opened 3 years ago

rdebath commented 3 years ago

The below BF code is generally pretty optimisable, but the example runs in your interpreter in 12seconds.

Others are: a first attempt interpreter at 20s, simple loop linking at 10s, a more complex no-lookahead interpreter at 5s, a primitive JIT at 2s. The more heavily optimising interpreters (Even without JIT!) are subsecond.

>+>++>+>+<[>[-<+++++>]<<]>[->+>+<<]>[->[->+>+>
+<<<]>>>[-<<<+>>>]<<[->[->+>+>+<<<]>>>[-<<<+>>
>]<<[->[-]<[->+>+<<]>>[-<<+>>]<<]<<]>[-]<<<]>[
-]<++++++++[<++++++++++>-]<-.----.<++++++++++.
$ target/release/rust-bf -oa -c EasyOpt.b
Compressing bf code...
Pre-matching braces...
Running bf code
Output: OK

BF execution done
Time taken: 12.10692s
$ time trixy -microbf EasyOpt.b
OK

real    0m19.639s
user    0m19.632s
sys     0m0.004s
$ time trixy -minibf EasyOpt.b
OK

real    0m9.661s
user    0m9.656s
sys     0m0.000s
$ time deadbeef EasyOpt.b
OK

real    0m5.545s
user    0m5.540s
sys     0m0.000s
$ time gcc-jit EasyOpt.b
OK

real    0m2.128s
user    0m2.124s
sys     0m0.000s
$ time bfi -b128 -r EasyOpt.b
OK

real    0m0.063s
user    0m0.060s
sys     0m0.000s
$ profilebf EasyOpt.b
OK
Program size 184
Final tape contents:
 : 10 75  0  0  0  0  0  0  0
    ^
Tape pointer maximum 8, cell value range 0..161
Skipped loops (zero on '['): 51842
Counts:     +: 1018384868   -: 1018384783   >: 1373269843   <: 1373269843
Counts:     [: 12598258     ]: 1018384781   .: 3            ,: 0
Total:         5814292379   run: 0.453s
$ 
Fallstop commented 3 years ago

Thanks for leaving an issue, and you are completely correct. It takes 3 seconds on my computer with the same settings, but that is still unacceptably long. I will look into the cause as some of the optimisations are going absolutely ballistic.

rdebath commented 3 years ago

Hmm, "optimisations are going absolutely ballistic" I haven't tried it on yours, but this might be interesting for that too ...

https://github.com/ShockSoc/2016-brainfuck-submissions/blob/master/extra_hard_ascii1.bf

Fallstop commented 3 years ago

I would call 20GB of logs ballistic at first sight. But I have now properly read Brainfuck code, the reason for the terrible performance is because of the nested loops due to a limitation in one of the optimisations. 1 loop: Instant, usually can do it 13-20ms start to finish and the number of times to loop doesn't matter 2 loops: Super fast, beats all interpreters that I tested. Not instant because it cannot skip loops with loops inside of them 3 loops: On par with most interpreters 4 loops (The program you gave): Bad, faster than dumb compilers but falls behind because of overhead from 128 bit numbers, loop cache and error checking.

I am working on a templating engine that will sub in common operations with single tokens, providing effectively no performance boost, but I might be able to include them in the loop cache optimisation. That would remove one nested loop greatly improving performance.