eclipse / omr

Eclipse OMR™: Cross-platform components for building reliable, high-performance language runtimes
http://www.eclipse.org/omr

Whither shrink wrapping? #2107

Closed · 0xdaryl closed this issue 5 years ago

0xdaryl commented 6 years ago

PR #2066 led to a discussion on the fate of the shrink wrapping optimization in the Eclipse OMR compiler. This optimization defers the register preservation required by a method's linkage as late as possible in the method (rather than always performing it in the prologue). The observation is that if preservation is only required because of register usage on colder paths of the method, then deferring the preservation to those paths (where possible) should be better for performance.
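To make the idea concrete, here is a minimal sketch of the kind of method shape shrink wrapping targets. The function and helper below are hypothetical and purely illustrative (this is not OMR code or generated output); the comments describe where a code generator would place the preserved-register saves and restores under each scheme.

```cpp
// Hypothetical example: the only use of a callee-saved register is on a cold path.
static int slowHelper(int x) { return -x; }  // stands in for a cold, out-of-line call (assume it is not inlined)

int example(int x) {
  // Conventional linkage: the prologue saves every callee-saved register the
  // method uses anywhere, so the hot path below pays for preservation it
  // never benefits from.
  if (x >= 0) {
    return x + 1;  // hot path: no preserved registers required
  }

  // Cold path: the call is the only reason a callee-saved register is needed.
  // Shrink wrapping defers the save (and the matching restore) to this block
  // instead of the prologue/epilogue.
  return slowHelper(x);
}
```

Of course, if the inliner absorbs the cold call, the opportunity disappears entirely, which is relevant to the discussion of inlining below.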

Shrink wrapping was written for Java and originally motivated by the Power architecture, but it has seen little (if any) performance benefit on any architecture to date (X, P, and Z have implementations). In fact, there are known functional issues on P (and possibly Z) with the code as-is.

We decided to contribute the code to OMR not as a working optimization but as an example of how one could do dataflow over a stream of TR::Instruction objects. Some analyses and modifications to the instruction class need to occur to make this possible, and this optimization provided a working example of how to do that reasonably effectively. It is certainly possible that parts of the shrink wrapping optimization are specialized to Java semantics, and this will need to be discovered and fixed for it to become a general optimization (for example, it works with the J9 JIT private linkage and will have to be adapted for various system ABIs).
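As a rough illustration of what dataflow over an instruction stream involves, here is a minimal sketch of a backward walk that collects which preserved registers a method actually uses, which is the kind of information the shrink wrapping analysis needs before deciding where to place the saves. The `Instruction` type and its accessors are illustrative stand-ins, not the real TR::Instruction API.

```cpp
#include <bitset>

// Illustrative stand-in for an instruction in the generated stream;
// this is not the real TR::Instruction interface.
struct Instruction {
  Instruction *prev = nullptr;
  std::bitset<32> registersUsed;  // preserved registers referenced by this instruction

  Instruction *getPrev() const { return prev; }
  const std::bitset<32> &getRegistersUsed() const { return registersUsed; }
};

// Walk backward from the last instruction and accumulate the set of preserved
// registers used anywhere in the stream. A real analysis tracks this per
// program point (and has to cope with control flow that is not visible in the
// CFG, such as snippets) rather than producing a single summary set.
std::bitset<32> collectPreservedRegisterUse(Instruction *last) {
  std::bitset<32> used;
  for (Instruction *cursor = last; cursor != nullptr; cursor = cursor->getPrev()) {
    used |= cursor->getRegistersUsed();
  }
  return used;
}
```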

The code will certainly be cleaner without it.

The argument was raised in #2066 that this optimization may be more beneficial to non-Java runtimes where the inliner is not as aggressive as it is in Java. I don't think we'll know that until we try, and to do so means it should be dusted off and generalized.

Is it worth leaving this vestigial code in OMR where it is simply built but not tested, or should it be cleaned up? It's already part of git history, so it won't really be going anywhere if it's removed. If it's removed, we can ensure it's removed under a single PR so that all changes needed to resurrect it are clear.

Any opinions?

fjeremic commented 6 years ago

> Is it worth leaving this vestigial code in OMR where it is simply built but not tested, or should it be cleaned up? It's already part of git history, so it won't really be going anywhere if it's removed. If it's removed, we can ensure it's removed under a single PR so that all changes needed to resurrect it are clear.

From an OpenJ9 committer's point of view this sounds good to me. In a dynamic runtime, instruction-level dataflow analysis on warm methods is relatively expensive for what we get in return (which is currently nothing). Either we need to make the dataflow faster or improve the implementation of shrink wrapping. On Z, at least, the store and restore of preserved registers is a single store/load-multiple instruction, which costs roughly n/4 + 1 cycles, where n is the number of registers being stored. In theory, the benefit would only be realized if we could eliminate, on average, 2 registers from being preserved.
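As a back-of-the-envelope illustration of that cost model (the register counts below are arbitrary examples, not measurements):

```cpp
#include <cstdio>

// Store-multiple cost modeled as n/4 + 1 cycles for n preserved registers,
// per the description above; the matching load-multiple costs the same.
static double stmCycles(int n) { return n / 4.0 + 1.0; }

int main() {
  std::printf("save 8 regs: %.2f cycles\n", stmCycles(8));  // 3.00
  std::printf("save 6 regs: %.2f cycles\n", stmCycles(6));  // 2.50
  std::printf("save 4 regs: %.2f cycles\n", stmCycles(4));  // 2.00
  return 0;
}
```

Under that model, trimming two preserved registers recovers only about half a cycle in the prologue and another half in the epilogue, so the analysis has to be very cheap to pay for itself.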

mstoodle commented 6 years ago

Maybe we should ask JitBuilder clients if they think shrink wrapping the preserved-register saves/restores is worthwhile in their scenarios. I have seen it be a problem, but it never reached the "top of the pile" in importance for me in the JITs I've personally worked on.

Any feedback on this particular issue, @jduimovich @charliegracie @Leonardo2718 @ymanton ?

Leonardo2718 commented 6 years ago

I have no evidence regarding what the performance benefits of shrink wrapping would be in Lua Vermelha. I am also not familiar enough with the characteristics of shrink wrapping to make an educated guess. However, I suspect that the typically large number of branches in the generated code would make the data flow analysis very expensive to run, even on relatively small Lua functions.

Having never looked at the shrink wrapping code myself, I have no opinion about whether it should be cleaned up or reimplemented from scratch. However, I think shrink wrapping in Lua Vermelha is worth future investigation. So, I would like to see it brought back to life eventually, if only for the purpose of doing performance measurements to evaluate its usefulness. It would also be interesting to see how this optimization performs on ARM.

0xdaryl commented 5 years ago

This discussion has petered out, but I think it is important to reach a conclusion on this, as new development often becomes entangled with this code and can negatively affect the design of new features.

My personal opinion (which I'll now turn into a proposal) is that the shrink wrapping code in the optimizer, code generators, and metadata be scrubbed from the code base. The code is there in the repo history if someone ever does feel the need to resurrect it in some way or use it as an implementation example.

@vijaysun-omr @andrewcraik @gita-omr @ymanton @fjeremic @mstoodle : any final opinions on pardoning this code in OMR? I'd like to reach a conclusion by Wed Sept 19.

fjeremic commented 5 years ago

I'm also of the opinion we should purge the code. Its only consumer is OpenJ9, which has had the feature disabled [1] as far back in the history as I can go, and even looking at the history in the closed-source repository before OpenJ9 was open sourced, I can see it has been disabled for at least half a decade due to functional problems.

With that in mind, the state and benefit of the optimization are very questionable, as is its presence in the codebase.

[1] https://github.com/eclipse/omr/blob/4c5bc8f56e8771fc26314c6e1a45c3caf384b08e/compiler/control/OMROptions.cpp#L2443-L2452

vijaysun-omr commented 5 years ago

I agree that we should delete shrink wrapping from the codebase if it's causing non-trivial complications in other designs. Keeping it in the codebase always had its pluses and minuses, and it seems the minuses are more significant now, so it is not worth the cost of maintaining the code.

I'll document my thoughts here so that they're recorded as part of the process by which this decision is being reached:

  1. Perceived advantage: The opt may be useful in some other language in the future even though Java no longer needs it, and the code probably won't be easy to resurrect if there is significant refactoring of the codegen in the meantime. Reality: No clear evidence has presented itself that it would in fact help any other language, so this possibility alone may not be enough to justify keeping this extra complexity around in the codebase.

  2. Perceived advantage: The opt was one of the first (and maybe, to date, the only) places in the codegen where we used dataflow analysis, and it could have served as an example in case there was a need for another dataflow analysis. Reality: No such need has manifested itself in the last several years, and maybe this is an indication that the chances of this being important going forward are low too. Plus, the code can always be found in the history if someone really needed to borrow ideas from it in the future (modulo any refactoring that changes the picture significantly).

fjeremic commented 5 years ago
> 2. Perceived advantage: The opt was one of the first (and maybe, to date, the only) places in the codegen where we used dataflow analysis, and it could have served as an example in case there was a need for another dataflow analysis.

I think if we are careful to keep the removal within a minimal set of PRs (say two or three), it would probably be an even better example of how it is done than having someone without much experience in the codebase trying to find all the bits and pieces the optimization touches.

That is to say, the inverse of the set of commits that removes ShrinkWrapping from the codebase is a clear example of how one can add a codegen dataflow optimization.

0xdaryl commented 5 years ago

Agreed. Removing it from downstream projects (e.g., OpenJ9) might have to happen first.

andrewcraik commented 5 years ago

I'm fine with the code being removed based on the discussion above. The only thing I would suggest is that if there are any APIs on the dataflow engine, etc., needed for use in the codegens, we may not want to purge those, since they aren't part of the current complexity. We would need to mark/document these APIs (if any) to ensure they aren't purged as unused later, if we want to keep open the avenue of adding dataflow to the codegen for other languages in the future.

mstoodle commented 5 years ago

+1 from me, with a minor modification to Andrew's point: we should review the APIs that were added to support the code generators to make sure they still make sense in the grander scheme of things. If they aren't the perceived right way to do it, then we should get rid of them and do it properly the next time around, without being saddled by the constraints of an earlier, ill-conceived API.

gita-omr commented 5 years ago

I need to investigate a bit more. Prologues on Power are indeed very expensive. I would like to understand better why this optimization did not work.

0xdaryl commented 5 years ago

The last performance study of shrink wrapping on Java was conducted maybe 5 years ago and saw only marginal gains (certainly on x86, but I believe that was the case on the other architectures as well) running enterprise workloads (lots of methods, flat profile). One of the strongly suspected reasons was that the aggressiveness and effectiveness of the Java inlining optimization limited the number of shrink wrapping opportunities (in part by reducing the number of prologues executed). Moreover, improving the inlining strategy realizes many more benefits and presents far more opportunities for optimization than shrink wrapping alone does.

What kind of investigation are you thinking of: an actual performance run with investigation, or a thought experiment? I'm doubtful that this feature would still work out of the box if it were switched on.

These days, when more performance is needed somewhere, inlining usually plays a big part in the answer. I suspect you'll get far more out of investigating better inlining strategies (which works against shrink wrapping) than resurrecting and tuning shrink wrapping.

I'd still like to keep a Sept. 19 deadline to decide on this because I don't want development progress in OMR to be held up by a long-dormant feature.

gita-omr commented 5 years ago

We spent a lot of time on improving inlining strategies, and it proved to be very non-trivial. Essentially, it involves inter-procedural analysis: inlining one method causes another not to be inlined, sometimes the profiler does not or cannot predict the correct target, etc. Finding the right balance is extremely difficult. On the other hand, reducing the number of saved registers in the prologue is a per-method optimization and therefore should be much more straightforward. I wanted to understand how it works and why it causes all this trouble for OMR and needs to be removed. Unfortunately, I've been busy for the last couple of weeks and did not follow the discussion.

ymanton commented 5 years ago

I recall that part of the reason this was abandoned was that, relative to the performance it gave, it was a very difficult opt to debug. When it went wrong, it caused bugs to appear long after the method had returned, with the evidence on the stack obliterated.

0xdaryl commented 5 years ago

I had a talk with @gita-omr and I'm willing to hold off on any decision for 3 weeks to give her time to analyze whether it will provide any substantial benefit to Power. I'd appreciate any insight into the potential impact of shrink wrapping on performance issues on Power.

andrewcraik commented 5 years ago

@mstoodle I'm fine with your modification to my approach. If we are going to purge APIs that were done the 'wrong way', I'd like that to be done in a separate commit from the general removal of shrink wrapping (should it go ahead), so that people trying to find the old API (to know what not to do with the new API) can find it without wading through shrink wrapping details.

0xdaryl commented 5 years ago

Have you had the opportunity to reach any conclusions on this @gita-omr?

gita-omr commented 5 years ago

> Have you had the opportunity to reach any conclusions on this @gita-omr?

I looked at the code and it's quite non-trivial. I suspect that comes from the fact that it's trying to deal with control flow that is not expressed in the CFG (e.g., snippets). I can also imagine that it may introduce a lot of bugs that are hard to debug.

On the other hand, I don't know of any better solution and can't yet see how we would do it better in case such an optimization is needed again. I think it would be a pity to discard all this experience. At the very least, we should remove it in such a way that it's very easy to recover.

fjeremic commented 5 years ago

@0xdaryl I've contained the entire removal to 4 pieces so as to avoid build breaks. The PRs linked to this issue describe the order in which they are to be merged. None of the PRs have dependencies, and they can be merged one after another without breaking builds. I have marked the ones which must wait with WIP status to prevent them from being merged accidentally.

If both the OMR and the OpenJ9 PRs are squashed together and the changes reversed, the result can serve as a reference if someone in the future wants to see how something like ShrinkWrapping would be implemented on the JIT side.

fjeremic commented 5 years ago

@0xdaryl I believe we have the last two PRs. I can't find any reference to shrink wrapping in the codebase following these changes. We can close this issue once the above two PRs are merged.

0xdaryl commented 5 years ago

I believe the vast majority of shrink wrapping was removed across a small number of commits in two repos in the following order:

  1. eclipse/openj9#3166
  2. #3036
  3. #3037
  4. eclipse/openj9#3167
  5. #3068
  6. eclipse/openj9#3291

Those wishing to experiment with this feature in the future will need to revert those PRs in reverse order.

Thanks @fjeremic for contributing the work. Closing this now.