dyu / ffi-overhead

comparing the c ffi (foreign function interface) overhead on various programming languages
Apache License 2.0
657 stars 40 forks source link

Unroll the loops? #13

Open munificent opened 6 years ago

munificent commented 6 years ago

Since you benchmark the entire program elapsed time, the loop itself — evaluating the exit condition, looking up the variable, and jumping back or out — adds to the execution time. The body of the loop is tiny, just a single FFI call, so the loop overhead is very likely significant, especially in the interpreted languages.

A simple fix is to unroll the loop several times. Maybe something like (here's the Wren one):

class FFI {
    foreign static plusone(x)
    foreign static count()
    foreign static start()
    foreign static stop()
}
var x = 0
var count = FFI.count()
// try call
FFI.plusone(x)

FFI.start()
while (x < count) {
    x = FFI.plusone(FFI.plusone(FFI.plusone(FFI.plusone(FFI.plusone(x)))))
}
FFI.stop()

My hunch is that you'll see quite different relative numbers if you do this for all the languages. Right now, your benchmarks probably show the relative performance of assigning variables and looping more than they do FFI.

dyu commented 6 years ago

Hey @munificent Good idea. That can be implemented as a flag from extra cli args. Maybe having 10 ffi calls per loop might be better? (easier to digest)

I like wren btw :-)

munificent commented 6 years ago

Maybe having 10 ffi calls per loop might be better? (easier to digest)

You mean like:

while (x < count) {
    x = FFI.plusone(x)
    x = FFI.plusone(x)
    x = FFI.plusone(x)
    x = FFI.plusone(x)
    x = FFI.plusone(x)
    x = FFI.plusone(x)
    x = FFI.plusone(x)
    x = FFI.plusone(x)
    x = FFI.plusone(x)
    x = FFI.plusone(x)
}

?

That works too, though you get a little extra overhead from the assignment and variable lookup. Normally, this kind of stuff doesn't matter much, but since FFIs are generally pretty fast, this kind of tiny code can actually dwarf it.

I like wren btw :-)

Thanks, me too! 😉

dyu commented 6 years ago

Well, more like your original suggestion (chaining) but with 10 calls instead of 5.

while (x < count) {
    x = FFI.plusone(
        FFI.plusone(
            FFI.plusone(
                FFI.plusone(
                    FFI.plusone(
                        FFI.plusone(
                            FFI.plusone(
                                FFI.plusone(
                                    FFI.plusone(
                                        FFI.plusone(x)
                                    )
                                )
                            )
                        )
                    )
                )
            )
        )
    )
}
munificent commented 6 years ago

That sounds good to me. :)