beeware / voc

A transpiler that converts Python code into Java bytecode
http://beeware.org/voc
BSD 3-Clause "New" or "Revised" License
869 stars 518 forks source link

[GSoC] Optimizing the VOC Compiler #772

Open patiences opened 6 years ago

patiences commented 6 years ago

This proposal intends to improve the performance of VOC-generated bytecode by adding piecemeal optimizations.

Table of Contents

  1. Motivation
  2. Goals
  3. Overview of Changes
  4. Considerations
  5. Proposed Timeline
  6. Risk Analysis

Motivation

The current implementation of Python AST to Java bytecode transpilation in many cases takes a naive approach, resulting in redundant bytecode instructions being produced and class files being longer than necessary. Not only does this make the code run slower than it should, this causes problems in some cases because the JVM enforces a size limit on class files, in particular on method sizes: each method must be less than 64KB.

This proposal explores optimizations that cut down the number of generated bytecode instructions.

Goals

This is a list of criteria to use as an evaluation of the success of this project.

Overview of Changes

Reducing Instances of Variable Creation

VOC-generated bytecode contains a lot of Object creation and initialization because everything in Python is an Object. Being frugal when creating string and integer objects, in particular, will result in huge savings.

Primitive Types

The Python Object abstraction is preserved in BeeWare's java implementation of the built-in types. However, this means that every time one of these objects is used instructions for Object creation must be generated. Working directly with the java primitives rather than wrapping them in Object would give a significant performance boost. The standard python types int, float, complex, Str, bool and None are considered here as the first candidates for refactoring.

Integers

CPython keeps an array of integer objects for all integers between -5 and 256. When an integer is created in that range a reference to the existing object is returned. It should be possible for VOC to do the same, that is, instead of generating bytecode that creates a new Integer object, any integer in [-5, 256] will refer to the preallocated integer.

Improving Variable Lookup

When a variable is read, instructions for loading global or local variables are generated. However, if the variable is looked up again, VOC has to generate the same loading instructions again. This results in a lot of duplicated bytecode. VOC should cache values after the first lookup of global variables, modules, and functions whenever possible.

Considerations

Proposed Timeline

Preparation

Integer Pre-allocation

Variable Lookup Optimization

Primitive Types Optimization

Note: The official coding period is May 14 - August 14, which is a 13 week period. I've accounted for 12 weeks of work here.

Risk Analysis

There are many gaps in my technical knowledge:

I will try to mitigate this by keeping my mentor updated of my progress, updating this timeline as needed, and asking for help early.

freakboy3742 commented 6 years ago

@patiences This is a really solid first draft - great work!

A couple of suggestions:

Of course, another approach would be to modify VOC to use Java strings in place of a separate string class - which require much less messing around with type conversion, but would mess up the internals a lot. However... I suspect it would also be a huge performance bump. Given that you've found conflicting opinions about whether StackMapFrames are worth the effort, our time might be better served looking to see if modifying VOC internals to use Java primitives rather than classes for Python primitives might be worthwhile. Even a speculative experiment that doesn't end up with production code might be a better use of GSoC time than implementing a bytecode structure that doesn't actually benefit performance.

patiences commented 6 years ago

The constant table in a class file needs to be Java primitives and references; Python strings are instances of a Python string class. So, it won't be possible to intern the Python string instances in the constant table; you'll need a different approach.

Ah, right!

another approach would be to modify VOC to use Java strings in place of a separate string class

Do you mean that python methods that operate on strings (Dict#getitem, for example?) should take java.lang.String strings instead?

Given that you've found conflicting opinions about whether StackMapFrames are worth the effort

Yeah, unfortunately I've seen multiple times that the performance gain was more of a theoretical advantage and has not been realized. :-( But wouldn't implementing StackMapFrames be a good idea since they are no longer optional in Java 8 -- is that a good enough reason?

freakboy3742 commented 6 years ago

Regarding strings: I was thinking that the potential optimisation is to completely remove any concept of an org.python.types.String, and replace them with java.lang.String (and similar with Integer, Boolean, and so on). Essentially, we'd be making a special case of Python primitive types, making them fall back to Java primitives.

The catch is that we'd need to provide all the utility methods that Python provides on it's primitive types (e.g., str.format) as a static method somewhere, and if you ever try to invoke a method on a primitive, it would be redirected to the static method.

It wouldn't be a trivial change, though - at the moment, we can assume every object is an org.python.Object; if we start using Java primitives, that distinction would be lost.

As for whether SMF's are worth adding... that's a good question. It's possibly a good move to still add them; at the moment, we can only generate Java 6 bytecode, and while Java VMs are completely backwards compatible, and J6 bytecode will continue to run, it means that if Java ever adds anything interesting (and they already have in J7 and J8), we're might start to hit problems if our Java source code (for the Python standard library) starts to use those features.

patiences commented 6 years ago

Does this mean that users that write Java extensions for their Python code also have to use Java 6 (I don't know how often this happens)? That seems a bit outdated, I personally can't imagine writing java code without J8 lambdas. If this is the case StackMapFrames would probably be a very good idea.

On the other hand, I think the special-casing Python primitive types would be more interesting work for me. :-)

freakboy3742 commented 6 years ago

Combining Java6 and Java8 classes in a single executable shouldn't be a problem... but those are always famous last words :-) And, we don't know what is going to be added in Java9 that might cause compatibility problems with Java6. Pinning ourselves to a known-deprecated format isn't a good idea in general.

Honestly, either StackMapFrames or trying to make better use of primitives would be a viable project; StackMapFrames is better defined, but more detailed; primitive use is a lot more exploratory.

patiences commented 6 years ago

@freakboy3742 Thanks for all the insights! :-)

I've updated my proposal to address your comment about getting the benchmarking down in the first week. I've also replaced the work on implementing StackMapFrames with the primitives types optimizations we've chatted about in the comments here, as it is more interesting work for me and will likely have a huge performance impact. I've also added an extra week to the variable lookup work period -- I've been thinking about how to update the jump addresses and I'm still not sure so I've factored in some more time for that.

Let me know what you think!

freakboy3742 commented 6 years ago

This is looking pretty good - certainly good enough to submit as a final proposal.

freakboy3742 commented 6 years ago

This project was selected for the 2018 GSoC.

patiences commented 6 years ago

At the end of week 1, here's what's been happening...

  1. Started working on small integer preallocation. After meeting with Russ on Wednesday afternoon and getting answers to my questions, had enough information to come up with a WIP commit (https://github.com/patiences/voc/commit/22f506e1be073bdb00948a734364c66c71f34f69) but unit tests currently with a cryptic Java bytecode-related error.

  2. Started working on writing an initial performance test suite, on which I can expand throughout GSoC. Looked at PyBench, PyStone and the performance and perf (https://perf.readthedocs.io/en/latest/) modules, which I'd like to try to use. Would love some feedback generally on whether or not it's a good idea to use the perf library, or to build an internal benchmarking/timing mechanism. WIP commit here: https://github.com/patiences/voc/commit/e385557954dc1381ae83b514022cc911e31b28c1. Weirdly enough running this test also gives me a stack map related error, which I need to investigate.

  3. Having trouble running ASM, which I need to resolve ASAP to move forward. :) Edit: Thanks Russ!

patiences commented 6 years ago

Week 2 finished! I've been working on:

  1. Small integer and boolean pre-allocation. Code here and here. I am currently investigating the effect on in-place operations, once that is fixed, should be ready for PR.

  2. Performance benchmarking test suite, code here. Should be ready for PR soon as well.

No blockers, and plenty of work to do coming into week 3! :-)

patiences commented 6 years ago

End of week 3 update:

  1. Made progress on small integer and boolean preallocation by further restricting Integer and Boolean construction in the codebase on the Java side rather than just the transpilation side. This should greatly improve performance of Integers and Booleans that are created as a result of executing the code (range, for example), and speaking of which:

  2. I've been working a lot on how to demonstrate the performance gain of small integer and boolean preallocation. I realised that I wasn't properly measuring the execution time which was causing a lot of instability in the tests but I now have a system. :-) Will have results to share early next week.

Hoping to wrap up these 3 PRs this coming week, and starting work optimizing global loads!

patiences commented 6 years ago

End of week 4 update:

  1. Boolean optimization is merged and small integer optimization and the benchmarking test are close. Hooray! These optimizations actually have created considerable performance improvement as can be seen by the performance tests results which is super exciting.

  2. At the end of the week I started working on optimizing global loads (https://github.com/pybee/voc/pull/833), and will be continuing to work on that this week.

Thanks @freakboy3742 for spending extra time with me this week, even finding time to pair program! :-)

patiences commented 6 years ago

End of week 5 update (sorry late!):

  1. Small integer optimization and the benchmarking test are merged! Yay! The integer optimization definitely gave us a performance boost (though not as drastic as the boolean optimization), and the test suite provides a way to detect changes that affect performance.

  2. This week has been all about working on optimizing global loads. I opened a number of pull requests experimenting with different types of caching, and after meeting with Russ, decided to focus on trying to cache individual modules. Unfortunately I didn't manage to get pystone working (trying to run it through voc results in a org.objectweb.asm.tree.analysis.AnalyzerException: Error at instruction 3882: Incompatible stack heights), I may look at that in a future week. This week my goal is to get one optimization working.

patiences commented 6 years ago

End of week 6!

  1. I've spent the past week on https://github.com/pybee/voc/pull/839, an optimization for global loads that caches Module objects since those get looked up a lot. There are some pretty neat performance improvements (in the PR) already. The last piece is getting it working for generator functions, and I'm hoping to close this next week.
  2. I've also started working on https://github.com/pybee/voc/pull/845, which is an optimization for function calls that take no arguments. The code is all there, but I'm having trouble writing a performance test for this one, so I'll be working on this early next week as well.

Next week I'll look at getting pystone working as well, as it would be super beneficial, and perhaps doing some profiling. On another note, I've written a blog post about my first month working on VOC, check it out if interested :-)

patiences commented 6 years ago

End of week 7 update:

  1. This week I spent a lot of time on https://github.com/pybee/voc/pull/839, trying to optimize GeneratorFunctions and I think I've found out why it's not working/why it's not a good idea. I'm going to spend just a little more time this week wrapping this up, and then it'll be ready for review.

  2. I've been working on writing a performance test for https://github.com/pybee/voc/pull/845 to show the performance improvement, but I haven't been able to write such a test. I think this is because while there are savings any time we can create fewer objects, the extensive null checks I added to support passing null as an argument on the java side perhaps cancels out any improvement. So it might be the case that there is no actual performance improvement. I'll check with Russ next week to see what we want to do with this PR.

  3. This week I was also able to get pystone running which revealed a bug (fix here). These PRs should be merged soon, and I'll be taking a closer look at the pystone tests to see if we can identify some more candidates for performance optimization there.

patiences commented 6 years ago

End of week 8 update (late because I was away Jul 5-13):

  1. Hooked up pystone PR! The difference on my machine between the beginning of May (before I started working on my GSoC project) and on master now is about 2.8x faster (from ~500 pystones/sec to ~1400 pystones/sec) which is crazy!

  2. Merged https://github.com/pybee/voc/pull/852 -- a bug fix (mentioned above).

  3. Merged https://github.com/pybee/voc/pull/869 -- an optimization that comes from not creating __code__ attributes on functions. An issue to track __code__ inspection in the future is here.

  4. Almost there with https://github.com/pybee/voc/pull/864 -- this one has been dragging because I've had so much trouble showing performance improvement by benchmarking. It turns out that this is because setting up for a function invocation (the site of this optimization) is nothing compared to how much time the function invocation actually takes. I've verified that the generated bytecode is smaller, so there is a little more cleaning up to do and then it will be ready for review.

I've written another blog post about the module optimization from this past month (https://github.com/pybee/voc/pull/839), if anyone wants to know more about that. :-)

patiences commented 6 years ago

End of week 9:

  1. Merged https://github.com/pybee/voc/pull/875. This is an optimization that targets comparison operations (like >, ==, etc) due to org.python.types.Methods (unneccessarily) being created for each operation. I'm happy to say that it's resulted in an huge performance improvement on pystone!

  2. There's a bit more work to do on https://github.com/pybee/voc/pull/877, a few small bugs that came out of working on #875 above. I'll wrap this up early next week.

Next week, I'll continue working through some potential optimizations based on results I've gotten from JProfile, so there's plenty more work to do!

patiences commented 6 years ago

End of week 10 update:

  1. Merged https://github.com/pybee/voc/pull/881. This is an optimization targeting nested loops, by reusing the same instance of StopIterations. This does however introduce some behavioral differences between VOC and CPython, documented here: https://voc.readthedocs.io/en/latest/reference/differences.html

  2. This past week I've been profiling small pieces of code that have exposed a few small bugs, including https://github.com/pybee/voc/issues/882, https://github.com/pybee/voc/issues/885, https://github.com/pybee/voc/issues/886, https://github.com/pybee/voc/issues/888, https://github.com/pybee/voc/issues/890.

  3. Open PRs include 1) https://github.com/pybee/voc/pull/877, which is some bug fixes and refactoring around __contains__/__not_contains__. 2) An optimization targeting __dict__ creation which needs a few kinks worked out, and 3) the beginning of some work on reducing the number of org/python/types/Str objects created (https://github.com/pybee/voc/pull/892).

patiences commented 6 years ago

End of week 11 update:

  1. Merged https://github.com/pybee/voc/pull/893. This was supposed to be just a cleanup of org/python/types/Str, but it turned out to be a decent optimization especially for the print function. (See the PR for benchmarking results)

  2. Merged https://github.com/pybee/voc/pull/899. This is an optimization for org/python/types/Dict operations, which are ubiquitous in Python and also in the Pystone benchmark (we saw an amazing performance improvement on Pystone with this one, details in the PR).

  3. Decided not to continue pursuing https://github.com/pybee/voc/pull/880, because it wasn't showing performance improvement and I was a bit mistaken about the way that Java initializes HashMaps.

  4. Logged a few more bugs (https://github.com/pybee/voc/pull/889, https://github.com/pybee/voc/pull/904) and a quick refactoring (https://github.com/pybee/voc/pull/894).

This coming Tuesday is the official cutoff for GSoC, but I will continue working for about a week after. I'm currently working on a blog post for the website, and wrapping up a few more PRs this coming week.

patiences commented 6 years ago

End of week 12 update:

  1. Merged https://github.com/pybee/voc/pull/910. This is the same optimization as https://github.com/pybee/voc/pull/899, for Dict.__setitem__().

  2. Still working on https://github.com/pybee/voc/pull/902, which still needs some cleaning up. The optimization is about creating standard org/python/Object functions at demand, not at module initialization time, which speeds up our initialization time.

  3. Started working on https://github.com/pybee/voc/pull/909 , which is about method caching. Method calls are currently very expensive in VOC because each time we call a method, we create it anew. This optimization yields a significant performance boost, but we are still figuring out how to handle __dict__ inspection if method objects are cached there.

  4. Started working on https://github.com/pybee/voc/pull/905 which is about adding a python base object test suite. I'm having a lot of trouble with getting the tests passing, it looks like there are a lot of bugs around this (some of which, if not too hairy, I'm also addressing in the PR). Still more work to be done for this.

  5. https://github.com/pybee/voc/pull/911 is ready for review -- it's a clean up of (what seems to be) legacy code that creates unnecessary objects.

  6. Started working on https://github.com/pybee/voc/pull/913, which is part of a bug fix for https://github.com/pybee/voc/issues/907 and https://github.com/pybee/voc/issues/912.

I'm also currently working on hooking up a few more benchmarks from http://speed.pypy.org/comparison/ (which is where I'm finding all these new bugs!). I will try to wrap these up early this week if I can as I am planning on working as part of GSoC until Wednesday (but will be around to finish up anything that needs a little more time of course).