HaxeFoundation / haxe

Haxe - The Cross-Platform Toolkit
https://haxe.org
6.09k stars 647 forks source link

Sanitization for pattern matching #1929

Closed ncannasse closed 11 years ago

ncannasse commented 11 years ago

Optimizer.sanitize_expr is responsible for well-forming the AST, ie adding missing { } or () so that the AST can be nicely printed without having to tare care of that.

example: someone might construct by macro a EIF cond without cond having parenthesizes. reduce_expr will first remove all parenthesizes then sanitize will add back the necessary ones only.

TMatch has been commented out, I guess we need to add back support for TPatMatch? Please note that you don't need to handle recursion since it's a bottom-up algorithm.

stroncium commented 11 years ago

I'm currently working on speed-optimized js target for haxe(almost no common code with existing js target, that's why new), and I found it very strange that compiler works this way and was also forced to add some check specifically to get rid of redundant brackets I got during code transformations. So I'm rising the question, shouldn't we handle it in the simplest form(without sanitize) on common level and let target generators decide when to add additional brackets? After all, it depends on target if this brackets are needed, and adding brackets is much simpler than checking for their existence everywhere.

Simn commented 11 years ago

We indeed should do that once we actually use it. At the moment pf_pattern_matching is false for all platforms. I implemented it on neko as a proof of concept, but the mess I made there is actually slower than the conversion to TSwitch.

ncannasse commented 11 years ago

@stroncium: could you give an example of sanitization going wrong? The idea of sanitization is to have a common "clean" AST that handles most of the cases. It does not prevent the target generator to remove some brackets when necessary according to its syntax

stroncium commented 11 years ago

@ncannasse Well, due to the goal of my target(which is to produce fastest possible js code), I have to unroll a lot of expressions and introduce temporal variables in various places. Inline blocks, switches, try-catch with value all need this, but due to the nature of process I use for optimization, originating context for some expressions may not be known to the code building the actual resulting AST(I rebuild AST first, to get out all the quirks, and then use plain generator).

Now that I know about sanitization feature and the concept that it happens everywhere(I thought it is some compiler quirk), it will be much easier for me to control this brackets, but my original point was that on intuitive level I wasn't expecting this brackets to be in some places, and since haxe itself doesn't need this brackets to be there, it feels much more sensible to me to leave all this to the targets implementations or at least make it optional. Of course, it doesn't have a big performance impact and, after all, it is not so much of a trouble to handle all of this cases, but if we speak about my existing code(not sure about other targets) it would be shorter and clearer with this feature turned off.

Simn commented 11 years ago

Instead of trying to write an entirely new JS generator, we should focus on optimizing the current one. With the recent improvements of complex right-side expressions, output looks a lot nicer in many cases already. Could you report which other optimizations you had in mind?

stroncium commented 11 years ago
  1. I made a trial pull request for ocamllibs, and 20 days after no one even commented on it, so I figured community patches won't be accepted. And we do a lot of nodejs code for highload so I wanted to get results on our current code asap and get rid of this part of equation in our performance. That's why I decided to work it out alone.
  2. Maybe I'm not that good of a programmer but I wasn't able to incorporate complex code transformations into current js generator without making it a huge mess.
  3. The resulting code my generator produces differs quite a lot from original haxe code, and current js generator loves to produce 1-to-1 code, which makes some sense for learning and maybe debugging for some people, so it probably should be kept separate in any case.

The scope of my optimizations is quite huge and is expanding all the time. I used to do incremental improvements, but yet again codebase became hardly refactorable, so currently I mostly read v8 sources and experiment with different approaches to work out ways to generate most performant code, make some formal specification and then rewrite it again straight to the goal. In my current generator I got rid of switches, inlining blocks as function(){...}() (which was the reason I started it all, for it was slowing the code up to 100x times for some cases when using inlines, and we use inlines quite a lot), made a decision system for level of inlining, but I have some problems with operation precedence(inlining side effects). What is planned is removing(probably with decision system) ternary conditional operators(together with current value switch implementation as ternary conditionals chains), using single instance bind functions instead of JIT creation, removing the name of enum state from enum instances, but I still need to experiment more, as straightforward approach to some of these is not the fastest.