adobe-flash / crossbridge

Welcome to visit the homepage!
http://www.crossbridge.io

performance aspects of crossbridge code #58

Open ddyer0 opened 9 years ago

ddyer0 commented 9 years ago

I have an unexpected (to me) result from comparing native C code with the same code compiled by crossbridge, and with the same code manually translated to actionscript.

The relative speeds are

2.043 native windows code
6.707 crossbridge compiled native code
25.3 native actionscript code

Losing a factor of about 3.3 going from native C to SWC is not too bad, but I'm totally shocked that the carefully written actionscript, using exactly the same algorithm, should be more than a factor of 10 slower than native C. The code in question is crunching numbers with doubles (in C) vs. with :Number in actionscript.

Native C:

double m = min (nr, min (ng, nb));
double nm = nv - m;
double ns = nm / nv ;
double r1 = (nv - nr) / nm ;
double g1 = (nv - ng) / nm ;
double b1 = (nv - nb) / nm ;
double nh;

if (nv == nr)
{
    if (m == ng)
        nh = 5.0 + b1 ;
    else
        nh = 1.0 - g1 ;
}

else if (nv == ng)
{
    if (m == nb)
        nh = 1.0 + r1 ;
    else
        nh = 3.0 - b1 ;
}

else if (nv == nb)
{
    if (m == nr)
        nh = 3.0 + g1 ;
    else
        nh = 5.0 - r1 ;
}
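
For anyone who wants to reproduce the timing, here is a minimal sketch that wraps the fragment above in a compilable function. The name hexcone_hue, the out-parameter for saturation, the grey-input guard, and the initial value of nh are all assumptions added for illustration; the post does not show how nv is computed, so this sketch assumes it is the largest of the three channels and that inputs lie in [0, 1].

```c
/* Hypothetical helpers: the original uses min(), which is not standard C. */
static double min2(double a, double b) { return a < b ? a : b; }
static double max2(double a, double b) { return a > b ? a : b; }

/* Hypothetical wrapper around the posted fragment.
   Returns the hue in sextant units, or -1.0 for grey input,
   where the hue is undefined. Stores the saturation in *ns_out. */
double hexcone_hue(double nr, double ng, double nb, double *ns_out)
{
    double nv = max2(nr, max2(ng, nb));  /* assumption: nv is the max channel */
    double m  = min2(nr, min2(ng, nb));
    double nm = nv - m;

    if (nm == 0.0) {                     /* added guard: avoids dividing by zero */
        *ns_out = 0.0;
        return -1.0;
    }

    double ns = nm / nv;
    double r1 = (nv - nr) / nm;
    double g1 = (nv - ng) / nm;
    double b1 = (nv - nb) / nm;
    double nh = 0.0;                     /* one branch below always matches */

    if (nv == nr)
        nh = (m == ng) ? 5.0 + b1 : 1.0 - g1;
    else if (nv == ng)
        nh = (m == nb) ? 1.0 + r1 : 3.0 - b1;
    else if (nv == nb)
        nh = (m == nr) ? 3.0 + g1 : 5.0 - r1;

    *ns_out = ns;
    return nh;
}
```

With this sketch, pure red (1, 0, 0) gives 6.0, pure green gives 2.0, and pure blue gives 4.0, each with a saturation of 1.0.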

Actionscript:

var m:Number = (nr<ng) ? ((nr<nb) ? nr : nb) : ((ng<nb) ? ng : nb);
var mm:Number = (nv - m);
var ns:Number = mm / nv ;
var r1:Number = (nv - nr) / mm ;   
var g1:Number = (nv - ng) / mm ;
var b1:Number = (nv - nb) / mm ;
var nh:Number =
    (nv==nb)
        ? ((m == nr) ? (3.0 + g1) : (5.0 - r1)) 
        : ((nv == ng)
            ? ((m == nb) ? (1.0 + r1) : (3.0 - b1))
            : ((m == ng) ? (5.0 + b1) : (1.0 - g1)));

mbolt35 commented 9 years ago

This may seem strange, but try using if conditionals instead of the ternary ops in the AS3, and see if that gives you better results. I can't say that I've examined the .abc output generated by nested ternary operations, but in my experience, AS3 handles execution branching in an inconsistent way. I agree that that number is surprising, but I have also seen various "swings" in timing AS3 algorithms. There are a few variables at play, including Flash Player version, debugger vs. release player, in-browser vs. stand-alone, etc.

Also, are you running that code on repeat in a tight loop and taking averages? Are you running in a function?
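
To make the "tight loop and averages" question concrete, here is a rough sketch of what such a harness could look like on the C side. Everything in it is hypothetical (the kernel_fn type, mean_seconds, bench_sink, and the sample kernel half are illustration names, not part of Crossbridge or of the original test), and it assumes passes is at least 1.

```c
#include <time.h>

/* Hypothetical kernel signature: one double in, one double out. */
typedef double (*kernel_fn)(double);

/* Results are written here so the compiler cannot discard the loop. */
static volatile double bench_sink;

/* Calls fn `iters` times per pass, repeats for `passes` passes,
   and returns the mean CPU time per pass in seconds. */
double mean_seconds(kernel_fn fn, long iters, int passes)
{
    double total = 0.0;

    for (int p = 0; p < passes; ++p) {
        double acc = 0.0;
        clock_t start = clock();

        for (long i = 0; i < iters; ++i)
            acc += fn((double)i);

        clock_t stop = clock();
        bench_sink = acc;   /* keep the result observable */
        total += (double)(stop - start) / CLOCKS_PER_SEC;
    }
    return total / passes;
}

/* Trivial sample kernel, only for trying the harness out. */
double half(double x) { return x * 0.5; }
```

Calling mean_seconds(half, 1L << 24, 5) would time five passes of 2^24 calls each. Timing the same loop with an empty kernel gives a rough estimate of the call and loop overhead to subtract.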

I'd guess that code should execute at about double the time of crossbridge, and the big reason why that's the case is that crossbridge compiles to code that performs these operations on the domain memory byte array, whereas the AS3 code is going to create managed objects. If you think about it in those terms, simple load/compare/store operations on a byte array are going to be a great deal faster.

I won't lie, I'm surprised that the crossbridge code runs that much slower than the native windows test, but a good way to generically group Crossbridge performance is to compare to .NET execution. Crossbridge should be as fast (if not slightly faster) in some calculations as compared to .NET.

ddyer0 commented 9 years ago

I'm running a real test with 2^24 x 2 distinct function calls in the loop. From other experiments, I've determined that a lot of the difference is function call overhead.

"Everything" in C is 2 seconds.

"Everything" cross-compiled is 6 seconds. Of those 6 seconds, about half is function call overhead for internal calls, which presumably do not use the same stack frames as normal AS3; the function arguments are doubles and pointers to double, which would not be possible in normal AS3. So there's approximately a 3x speed penalty for cross-compiled code (of this type).

The as3 test loop overhead, including the function calls, is about 13 seconds. The actual number crunching is about 10 seconds in native as3, compared to about 3 seconds in cross-compiled code, so there's a further 3x penalty for using proper as3 data and stack structures.

Bottom line, in ballpark numbers: cross compiling has a 3x speed penalty, but is still 3x faster than writing pure as3.

The part of this that is surprising to me is that the cross compiled code is so much better than AS3 - I would have expected it to be much closer in speed.

Also note that this is not necessarily representative of what can be achieved in pure byte-pushing code. My test is heavy on floating point and function calls.
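
As a quick check of the ballpark arithmetic in this comment, here is a small sketch that plugs in the quoted figures. The struct and function names are made up for illustration, and the 3 + 3 second split for the cross-compiled run is taken from the "about half" estimate above.

```c
/* Ballpark timings in seconds, as quoted in the comment above. */
struct timing {
    double call_overhead;  /* loop and function-call cost */
    double crunch;         /* actual number crunching */
};

/* Total run time for one configuration. */
double total(struct timing t) { return t.call_overhead + t.crunch; }

/* How many times slower `a` is than `b`. */
double slowdown(double a, double b) { return a / b; }

const double native_c_total = 2.0;               /* "everything" in C */
const struct timing crossbridge = { 3.0, 3.0 };  /* about half of 6 s is calls */
const struct timing pure_as3 = { 13.0, 10.0 };   /* overhead, then crunching */
```

With these figures, slowdown(total(crossbridge), native_c_total) is 3.0, the crunch-only ratio slowdown(pure_as3.crunch, crossbridge.crunch) is about 3.3, and the ratio of the totals is about 3.8, which is in line with the two 3x figures above.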