HaxeFoundation / haxe

Haxe - The Cross-Platform Toolkit
https://haxe.org

3.2 nightly, cpp gen time is crazy? #3819

Closed: ruby0x1 closed this issue 8 years ago

ruby0x1 commented 9 years ago

It possibly affects other platforms as well, but I have seen this on Mac.

I saw this happen around 4 months ago, but I didn't need 3.2 back then and figured it was a general development hiccup that would pass.

Now, I need to use 3.2 daily, and it's near impossible :)

The version I am using is haxe_2015-01-27_development_456a1a8.tar.gz. Whether I use the zip or the installer, the result is the same.

I had tried the latest build on the 28th, but it wasn't working for other reasons.

Below is the output, run directly from the CLI using haxe bin/mac64.build/build.hxml --times -v

... loads of output, quick
Typing EReg.map
...huge delay
Generating Cpp in : bin/mac64.build/cpp
Total time : 34.310s
------------------------------------
 filters : 31.630s, 92%
 macro execution : 0.310s, 1%
 other : 1.800s, 5%
 parsing : 0.220s, 1%
 typing : 0.350s, 1%

real    0m34.550s
user    0m34.063s
sys    0m0.301s

A bonus screenshot: the profiler showing where it's spending its time, if that helps at all.

[screenshot: profiler output, 2015-01-29]

Simn commented 9 years ago

Could you try compiling with -D no_simplify?

ruby0x1 commented 9 years ago

haxe bin/mac64.build/build.hxml -D no_simplify --times
Total time : 2.230s
------------------------------------
  filters : 0.400s, 18%
  macro execution : 0.270s, 12%
  other : 1.050s, 47%
  parsing : 0.240s, 11%
  typing : 0.270s, 12%

real    0m2.278s
user    0m2.165s
sys 0m0.102s

Seems related!

Simn commented 9 years ago

Ouch! Thanks for checking. I knew that the SSA transformer was a bit slow, but I've never had any such issues with the simplifier. I'll see what I can find.

ruby0x1 commented 9 years ago

Great, thanks. With this I can at least keep working without reverting each time.

I pinged you on Gitter; you can also find me on IRC if you need me to try stuff, set up a dev env, or anything like that.

Simn commented 9 years ago

This was mostly addressed by https://github.com/HaxeFoundation/haxe/commit/933778e13742d6ea327e31bb6340fb712f0092ab

New stats:

Total time : 4.280s
 ------------------------------------
   analyzer-simplify-apply : 0.190s, 4%
   filters : 2.050s, 48%
   macro execution : 0.330s, 8%
   other : 1.170s, 27%
   parsing : 0.250s, 6%
   typing : 0.290s, 7%

I would like to keep this issue open because gen_local has problems. Take a look at this:

class Main {
    static function main() {
        makeSomeNoise();
    }

    macro static function makeSomeNoise() {
        var el = [];
        for (i in 0...10000) {
            el.push(macro [1]);
        }
        return macro $b{el};
    }
}
Total time : 0.325s
------------------------------------
  filters : 0.108s, 33%
  macro execution : 0.109s, 34%
  neko generation : 0.048s, 15%
  other : 0.000s, 0%
  parsing : 0.029s, 9%
  typing : 0.031s, 9%

Now we change the generated expression to a Map comprehension which internally creates a temp variable:

class Main {
    static function main() {
        makeSomeNoise();
    }

    macro static function makeSomeNoise() {
        var el = [];
        for (i in 0...10000) {
            el.push(macro [1 => 2]);
        }
        return macro $b{el};
    }
}
Total time : 25.143s
------------------------------------
  filters : 0.934s, 4%
  macro execution : 0.158s, 1%
  neko generation : 0.202s, 1%
  other : 0.000s, 0%
  parsing : 0.029s, 0%
  typing : 23.819s, 95%

If I replace the internal gen_local with alloc_var "tmp" (and let rename_local_vars deal with the rest):

Total time : 1.429s
------------------------------------
  filters : 0.895s, 63%
  macro execution : 0.175s, 12%
  neko generation : 0.201s, 14%
  other : 0.000s, 0%
  parsing : 0.030s, 2%
  typing : 0.127s, 9%

I know we usually don't have 10000 temp vars, but that's two orders of magnitude!
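For intuition, here is an illustrative Python sketch (mine, not the compiler's actual OCaml code) of why per-variable uniqueness probing blows up: an allocator that scans existing names to find a free tmpN does roughly i probes for the i-th allocation, so n temps cost O(n^2), while a plain counter is O(n).

```python
import itertools
import time

def gen_local_scan(existing):
    # Probe tmp0, tmp1, ... until an unused name is found; the i-th
    # allocation performs ~i probes, so n allocations cost O(n^2).
    i = 0
    while f"tmp{i}" in existing:
        i += 1
    existing.add(f"tmp{i}")

counter = itertools.count(1)

def gen_local_counter(existing):
    # A monotonic counter makes each allocation O(1).
    existing.add(f"tmp{next(counter)}")

for gen in (gen_local_scan, gen_local_counter):
    names = set()
    t0 = time.perf_counter()
    for _ in range(2000):
        gen(names)
    print(f"{gen.__name__}: {len(names)} names, {time.perf_counter() - t0:.3f}s")
```

Even with a set (O(1) membership), the scanning version is orders of magnitude slower at compiler-relevant counts, matching the two-orders-of-magnitude gap above.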

Simn commented 9 years ago

@ncannasse: Is there any reason we actually need gen_local or could we just alloc every generated temp var as "tmp" and let rename_local_vars rename it? That should also give a small boost to completion performance because we don't do any string acrobatics until the post-processing stage.

ncannasse commented 9 years ago

We could try it, I think gen_local is quite old and was most likely introduced before rename_local_vars.

Simn commented 9 years ago

My test was invalid (I'm guessing I had rename_local_vars commented out). The actual times are this:

Total time : 25.703s
------------------------------------
  command : 0.162s, 1%
  filters : 25.147s, 98%
  generate js : 0.089s, 0%
  macro execution : 0.157s, 1%
  other : 0.001s, 0%
  parsing : 0.026s, 0%
  typing : 0.121s, 0%

It's still preferable to have this because at least completion can work, but rename_local_vars obviously has the same problems as gen_local.

Simn commented 9 years ago

Bah, there's actually code which relies on the name being unique during typing, so I don't see an easy fix for this right now.

ncannasse commented 9 years ago

@Simn maybe we should replace this code by using var ids? That should give us a bit of a speedup as well.

Simn commented 9 years ago

I thought about that too, but then I remembered you wanting the temp vars to be nicely enumerated. Or was that just about them being enumerated consistently for the compilation server?

It's still not entirely safe because in theory there could be a user variable named tmp1234.

ncannasse commented 9 years ago

I don't remember why I would care that much about local variable names, as long as the output is stable (doesn't vary between compilations) and we keep the original user var naming as much as possible (for code output readability).

Simn commented 9 years ago

Using the var ids works nicely. There are some problems with the optimization tests though which compare the output with assumptions on the temp var naming.

ruby0x1 commented 9 years ago

Now that 3.2.0 is released (woo!) I was curious if there was any further investigation here?

For reference, against haxe 3.1.3 :

time haxe bin/mac64.build/build.hxml --times
Total time : 2.810s
------------------------------------
  filters : 0.660s, 23%
  macro execution : 0.280s, 10%
  other : 1.270s, 45%
  parsing : 0.260s, 9%
  typing : 0.340s, 12%

real    0m2.873s
user    0m2.732s
sys 0m0.140s

And 3.2.0

sven$ time haxe bin/mac64.build/build.hxml --times
Total time : 5.540s
------------------------------------
  analyzer-simplify-apply : 0.160s, 3%
  filters : 2.480s, 45%
  macro execution : 0.330s, 6%
  other : 1.940s, 35%
  parsing : 0.290s, 5%
  typing : 0.340s, 6%

real    0m5.601s
user    0m5.431s
sys 0m0.171s

That's a lot slower than before, and it concerns me because one of the arguments for Haxe, fast iteration, has slipped: I can no longer praise this when developing against the C++ target. Against the JS target it used to be a 2:1 difference and still super quick in wall-clock time; now it's 5:1, and it's wall-clock time spent staring at the CLI, where previously I'd blink and it would already be on to the actual slow part (the C++ compiler).

Here is 3.2.0 with -D no-simplify, which is on par with 3.1.3 and back to respectably quick iteration times. So I guess my real question is: what are the negative consequences of permanently enabling the no-simplify flag? I know the team does excellent work on the input and output of the compiler, so I hesitate to assume it's of no consequence. I don't want to disable things that are on by default, but the cost of builds taking this much longer is far from negligible.

Is there anything I can help test with? Anything to investigate on the user side?

sven$ time haxe bin/mac64.build/build.hxml --times -D no-simplify
Total time : 2.720s
------------------------------------
  filters : 0.500s, 18%
  macro execution : 0.320s, 12%
  other : 1.220s, 45%
  parsing : 0.240s, 9%
  typing : 0.440s, 16%

real    0m2.770s
user    0m2.633s
sys 0m0.135s

For comparison, with the js target:

3.1.3

time haxe bin/web.build/build.hxml --times
Total time : 1.040s
------------------------------------
  filters : 0.250s, 24%
  generate js : 0.070s, 7%
  macro execution : 0.270s, 26%
  other : 0.000s, 0%
  parsing : 0.240s, 23%
  typing : 0.210s, 20%

3.2.0

time haxe bin/web.build/build.hxml --times
Total time : 1.150s
------------------------------------
  filters : 0.260s, 23%
  generate js : 0.070s, 6%
  macro execution : 0.320s, 28%
  other : 0.020s, 2%
  parsing : 0.230s, 20%
  typing : 0.250s, 22%

real    0m1.166s
user    0m1.093s
sys    0m0.073s

Simn commented 9 years ago

The whole simplifier/analysis part is in its first iteration, so there are still a few implementation problems that can be optimized. The reason simplification is enabled by default is to ensure correct evaluation order on the C++ target. Consider this example:

class Main {
    static function main() {
        var x = 0;
        call(x++, x);
    }

    static function call(a:Int, b:Int) {
        Sys.println(a + " " + b);
    }
}

With the simplifier this translates to:

int x = (int)0;
int tmp = (x)++;
int tmp1 = x;
::Main_obj::call(tmp,tmp1);

Whereas without it you get:

int x = (int)0;
::Main_obj::call((x)++,x);

This version has undefined evaluation order in C++. You will likely get 0 0 as output as opposed to the expected 0 1.

In 3.1.3 there was a simplified version of the simplifier which was much faster but missed some cases. My strategy here is to first get something that works, cover all cases in the tests, and then start optimizing the implementation. We also lose a lot of time simply because the AST is much larger than it used to be, which can be alleviated by reconstructing the parts of the AST that are harmless.

ncannasse commented 9 years ago

@Simn could you investigate ways to speed up this part? Even with a big AST, it should not take most of the time spent in the Haxe compiler (if we exclude "other", which is the C++ compiler, the simplifier seems to take 55% of the compilation time by itself).

Simn commented 9 years ago

Where does the 55% figure come from? I'm looking at analyzer-simplify-apply : 0.160s, 3%. Our unit tests have analyzer-simplify-apply : 0.072s, 0% with other : 52.346s, 93%.

Simn commented 9 years ago

haxe compile-cpp.hxml --times --no-output
Total time : 2.815s
------------------------------------
  analyzer-simplify-apply : 0.071s, 3%
  filters : 1.329s, 47%

And with rename_local_vars commented out:

haxe compile-cpp.hxml --times --no-output
Total time : 1.778s
------------------------------------
  analyzer-simplify-apply : 0.056s, 3%
  filters : 0.321s, 18%

So this is again the same problem which I originally identified and there is no reason to panic.

ncannasse commented 9 years ago

(time spent in filters: 2.48 - time spent in filters with no-simplify: 0.5) = 1.98, divided by (total compilation time: 5.54 - time spent in "other": 1.94) = 3.6, gives 1.98 / 3.6 = 0.55.
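For reference, the 55% figure is a trivial re-derivation of the numbers quoted from the --times output earlier in the thread:

```python
# Numbers taken from the 3.2.0 --times output quoted above.
filters_with_simplify = 2.48   # filters, default build
filters_no_simplify = 0.50     # filters, -D no-simplify
total = 5.54                   # total compilation time
other = 1.94                   # "other", attributed to outside the compiler

# Share of compiler-side time spent on simplify-related filter work.
share = (filters_with_simplify - filters_no_simplify) / (total - other)
print(f"{share:.2f}")  # → 0.55
```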

Maybe it's not analyzer-simplify-apply but some other filter that gets applied when -D no-simplify isn't set, and that's what takes this time. Don't worry, I'm not panicking in any way; I just want it investigated so we can improve our compilation speed :)

Simn commented 9 years ago

Yes but I almost panicked because I thought this was some new problem that suddenly came up. :)

I will look into optimizing rename_local_vars. My thinking is that we should just use the variable IDs if we use the simplifier without unapplying it. In that case the output is gonna "look like a mess" anyway so we might as well use numeric variable names.

ruby0x1 commented 9 years ago

The reason simplification is enabled by default is to ensure correct evaluation order on the C++ target.

Does this mean js doesn't use it (yet)?

Also, I am curious about "other": considering that JS is at 0.000/0.020 and C++ at 1.270/1.940 for 3.1.3/3.2.0 respectively, this is potentially a place for a general improvement (aside from the simplification costs); perhaps a few hundred ms could be shaved on the C++ side.

I don't doubt the vast differences between targets, but I am curious what makes the difference that wide, from 0 to 1.27s. Is there a way to make the timing as verbose as possible?

Simn commented 9 years ago

It looks like C++ has "other" because it has no "generate cpp" timer. I'll unify the timer behavior so the individual generators don't have to deal with it.

And no, JS doesn't use the simplifier, because it defines the same left-to-right evaluation order Haxe does.

ncannasse commented 9 years ago

@underscorediscovery yes, other is most likely time spent by the C++ compiler/linker, which JS does not have

ruby0x1 commented 9 years ago

@ncannasse the c++ compiler & linker is not included in c++ generation time above.

I was specifically referring to "other" in the haxe --times list.

ncannasse commented 9 years ago

Yes, I'm also talking about that "other". Why are you saying it's not included? Usually the Haxe compiler calls the hxcpp build tool, which runs the C++ compiler. That should be included in "other" unless hxcpp has a specific timer for it, or you disabled compilation in your build.hxml, which we can't know since we don't know what's inside it.

ruby0x1 commented 9 years ago

I clarified my reply, I was trying to be explicit/clear. There is no c++ compilation/linking happening.

(I was in the mindset that this post is about cpp gen, so I would not include the C++ compiler time in there. Also: I would love it if C++ compiling/linking took < 2s! But no, I wouldn't hold Haxe to the C++ compiler times of course; apologies for the confusing wording.)

Simn commented 9 years ago

I thought this was just because the output files had no changes which would only invoke the linker. In that case I'm not sure what "other" actually is, but as I said I'll go ahead and work on the timers.

ruby0x1 commented 9 years ago

Ah, ok, well here is some more reference of the code I was testing against.

sven$ haxe bin/mac64.build/build.hxml --times --no-output -D no-compilation
Total time : 3.830s
------------------------------------
  analyzer-simplify-apply : 0.210s, 5%
  filters : 2.530s, 66%
  macro execution : 0.400s, 10%
  other : 0.000s, 0%
  parsing : 0.300s, 8%
  typing : 0.390s, 10%

sven$ haxe bin/mac64.build/build.hxml --times -D no-compilation
Total time : 5.840s
------------------------------------
  analyzer-simplify-apply : 0.160s, 3%
  filters : 2.600s, 45%
  macro execution : 0.350s, 6%
  other : 2.070s, 35%
  parsing : 0.250s, 4%
  typing : 0.410s, 7%

This is haxe 3.2.0 and the "other" seems to imply it's the c++ output itself.

Simn commented 9 years ago

I have updated the timers. From our unit tests I'm now getting:

  analyzer-simplify-apply : 0.053s, 1%
  filters : 1.332s, 13%
  generate cpp : 1.579s, 16%
  generate cpp - native compilation : 5.577s, 56%
  other : 0.020s, 0%
  parsing : 0.259s, 3%
  typing : 0.513s, 5%

Let's wait for a few days until we can merge the haxe-3.3 branch into development so you can check for yourself with the nightly builds.

ruby0x1 commented 9 years ago

Cool, I'll keep an eye out, thanks. Also note the times above: "other" is 0.000 when there is --no-output. That's 2s+; if that is somehow just unoptimized, it might be a good place to save time in the short term until the simplifier has progressed further.

Simn commented 9 years ago

That "other" is likely the part of gencpp that writes out the .cpp/.h files, i.e. "generate cpp" in the new timers. There might be room for improvement but I'm not sure if that's worth it given that the total compilation time is going to be dominated by the native compilation anyway.

ruby0x1 commented 9 years ago

This entire thread is about whether the time here is "worth it". It contributes 2 of 5 seconds here, and 2 of 2.7 seconds in 3.1.3; that is definitely worth considering optimizing, if possible, as it's a large chunk of the entire problem (long cpp gen times).

Simn commented 9 years ago

That's not what your output says. Between your 3.2.0 and 3.1.3 times there's a 0.7s difference for "other", whereas there's a 2.2s difference for "filters". The latter is what Nicolas described as "in the Haxe compiler" and we should definitely address that. A portion of the 0.7s comes from the simple fact that the output is larger, and whatever difference remains is insignificant next to the native compilation times.

ruby0x1 commented 9 years ago

I was referring to the whole; my current concern (as in 3.2.0) is that 5+ seconds is too long. Since just under half of that is the output, I was positing it may be low-hanging fruit; as you said, you want to get the simplifier working right first, and I agree with that approach.

ncannasse commented 9 years ago

I'm sure the gencpp time can be improved, and this would offer a significant speedup in the case where only one file has been modified, since then we only compile that specific file and link.

Simn commented 9 years ago

I don't doubt that there can be improvements, but significant speedup?

 generate cpp : 1.496s, 14%
 generate cpp - native compilation : 6.482s, 59%

That's for a no-change compilation. For any real compilation involving a few files the result is gonna be skewed towards the native compilation even more.

Granted, VC is not exactly known for its amazing compilation speed.

ruby0x1 commented 9 years ago

Just to re-note: I am never including the cpp compilation times above. I never run cpp builds through haxe itself; my long wait is well before that happens, as I don't run hxcpp via the haxe executable.

Simn commented 9 years ago

I see, so I suppose my problem is that I view both generation steps as belonging together.

Still, I don't think there's much to gain in the gencpp part itself.

mcheshkov commented 9 years ago

I'll add my 5 cents. Haxe 3.1.3 and 3.2.0, openfl 2.2.4, hxcpp 3.1.68, gcc 4.8.4.

$ haxelib run openfl create PiratePig
$ cd PiratePig/
$ time haxelib run openfl build linux
real    0m58.774s
user    3m20.226s
sys 0m9.687s

$ time haxelib run openfl build linux
real    0m0.883s
user    0m0.791s
sys 0m0.089s

$ # comment some irrelevant line in Source/piratepig/PiratePig.hx 
$ time haxelib run openfl build linux
real    0m1.676s
user    0m1.498s
sys 0m0.171s

The first one compiles everything, the second compiles nothing but still runs haxe, and the last one compiles one cpp file, links one binary, and strips it.

$ haxe Export/linux64/cpp/haxe/release.hxml -D HXCPP_M64 --times
Total time : 0.690s
------------------------------------
  filters : 0.120s, 17%
  macro execution : 0.070s, 10%
  other : 0.250s, 36%
  parsing : 0.140s, 20%
  typing : 0.110s, 16%

$ HAXE_STD_PATH=~/haxe/haxe-3.2.0/std:. ~/haxe/haxe-3.2.0/haxe Export/linux64/cpp/haxe/release.hxml -D HXCPP_M64 --times
Total time : 1.100s
------------------------------------
  analyzer-simplify-apply : 0.010s, 1%
  filters : 0.350s, 32%
  macro execution : 0.090s, 8%
  other : 0.390s, 35%
  parsing : 0.130s, 12%
  typing : 0.130s, 12%

$ HAXE_STD_PATH=~/haxe/haxe-3.2.0/std:. ~/haxe/haxe-3.2.0/haxe Export/linux64/cpp/haxe/release.hxml -D HXCPP_M64 --times -D no_simplify
Total time : 0.830s
------------------------------------
  filters : 0.090s, 11%
  macro execution : 0.090s, 11%
  other : 0.350s, 42%
  parsing : 0.140s, 17%
  typing : 0.160s, 19%

hxml used

-main ApplicationMain 
-cp /home/cheshkov/haxelib/openfl/2,2,4
-D openfl=2.2.4
-cp /home/cheshkov/haxelib/hxcpp/3,1,68
-D hxcpp=3.1.68
-cp /home/cheshkov/haxelib/actuate/1,8,1
-D actuate=1.8.1
-cp Source
-cp /home/cheshkov/haxelib/lime/2,0,6/legacy
-D tools=2.0.6
-D lime-legacy
-D no-compilation
-D native
-D openfl-native
-D lime-native
-D linux
-D desktop
--remap flash:openfl
-cp Export/linux64/cpp/haxe
-cpp Export/linux64/cpp/obj/
--macro keep("piratepig.PiratePig")

So yes, a short iteration with few changes plus native compilation can be as short as two seconds, and up to half of that is haxe itself.

Simn commented 8 years ago

Could you guys check again with current development version?

Simn commented 8 years ago

I looked into the local name issue again and it occurred to me that we might have to be careful with generated local names after all in case a typed expression gets re-typed through macros. If we generated all locals with the same name it could lead to a wrong ctx.locals lookup in case two locals are generated in the same scope.

This is probably a rather rare case, but I would rather not take any chances. A better approach is to keep track of the number of locals per field and generate them as `_1,_2` etc.

Regarding rename_local_vars, I'm afraid using variable IDs would lead to terrible diffs which at this point is something I care about. It might be better to just keep count per field. This might lead to numbering holes in some cases, but I can't see anyone caring about that.
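A tiny Python sketch (purely illustrative, not the compiler's implementation) of the "count per field, skip clashes" idea; the numbering holes mentioned above appear exactly as described:

```python
from itertools import count

class TempAllocator:
    """Per-field temp-var namer: emits _1, _2, ... while avoiding
    collisions with locals the user declared in the same field."""

    def __init__(self, user_names):
        self.user_names = set(user_names)  # user-declared locals
        self.counter = count(1)            # one counter per field

    def fresh(self):
        # Skip any _N that collides with a user variable; this can
        # leave numbering holes, which is harmless for correctness.
        while True:
            name = f"_{next(self.counter)}"
            if name not in self.user_names:
                return name

# The user happens to have declared a local named _2.
alloc = TempAllocator({"_2"})
print([alloc.fresh() for _ in range(3)])  # → ['_1', '_3', '_4']
```

The diff-friendliness follows from the counter being per field: editing one field renumbers only that field's temps, not the whole file.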

ruby0x1 commented 8 years ago

I'll give this a try soon as I can (holidays), thanks!

Simn commented 8 years ago

I haven't heard of any problems so I'll consider this closed. Feel free to open a new issue if you come across similar problems.

ruby0x1 commented 8 years ago

@Simn I tried the latest nightly just now; sorry for the delay, I have been swamped! I tested against 3.3.0 (git development build @ a1d47d6). I also just tried 3.2.1 on the same test and it took ~6.240s, way higher.

It definitely seems improved, good work. Now the bulk of the time is spent in generate cpp, and in macros that I have to optimize:

haxe bin/mac64.build/build.hxml --times -D no-compilation
Total time : 4.840s
------------------------------------
  analyzer-cleanup : 0.020s, 0%
  analyzer-filter-apply : 0.020s, 0%
  analyzer-from-texpr : 0.170s, 4%
  analyzer-fusion : 0.120s, 2%
  analyzer-to-texpr : 0.040s, 1%
  filters : 0.580s, 12%
  generate cpp : 1.470s, 30%
  macro execution : 1.130s, 23%
  other : 0.020s, 0%
  parsing : 0.640s, 13%
  typing : 0.630s, 13%

In a smaller code base the difference is bigger because macros take the largest share:

haxe bin/mac64.build/build.hxml --times -D no-compilation
Total time : 1.920s
------------------------------------
  analyzer-cleanup : 0.000s, 0%
  analyzer-filter-apply : 0.000s, 0%
  analyzer-from-texpr : 0.010s, 1%
  analyzer-fusion : 0.020s, 1%
  analyzer-to-texpr : 0.010s, 1%
  filters : 0.150s, 8%
  generate cpp : 0.120s, 6%
  macro execution : 0.820s, 43%
  other : 0.010s, 1%
  parsing : 0.280s, 15%
  typing : 0.500s, 26%

Simn commented 8 years ago

Cool, thanks for checking!