faster-cpython / ideas


Alternative syntax to make optimization and experimentation easier? #453

Closed matthiasgoergens closed 6 months ago

matthiasgoergens commented 2 years ago

In the spirit of bringing up not fully formed ideas for discussion:

Python the language is defined via its syntax. But as far as I know, CPython translates everything to bytecode fairly early on, and then doesn't care too much about the surface-level syntax. (Apart from error messages etc.)

I also know that from __future__ import <something> can change that syntax for specific modules.

Would it be possible to open up that mechanism more, so that it's available for experiments?

Just for fun, as a really simple and naive idea, it would be interesting to try writing Python with S-Expression syntax. (Like Scheme. This wouldn't help with performance at all. Just a fun example.)

More seriously, it would be interesting to introduce a syntactic form that allows the compiler to, e.g., re-order statements within a specifically marked block.

Or you might want to forbid certain constructs in certain places, to enable certain optimizations to be done safely.

In any case, my question is: is my understanding right, that just swapping out how bytecode gets produced for specific files would be enough to enable this? And how general is the existing from __future__ import machinery?
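(As a rough sketch of what "swapping out how bytecode gets produced" can already look like today: the `ast` module plus `compile()` let you rewrite a module's syntax tree before bytecode generation. The names `double` and `twice` below are made up purely for illustration.)

```python
import ast

# Toy AST transform: rewrite every reference to the name `double`
# into `twice` before compiling. This only illustrates that the
# parse -> transform -> compile pipeline is already pluggable.
source = "result = double(21)"

class Rename(ast.NodeTransformer):
    def visit_Name(self, node):
        if node.id == "double":
            node.id = "twice"
        return node

tree = ast.parse(source)
tree = ast.fix_missing_locations(Rename().visit(tree))
code = compile(tree, "<example>", "exec")

namespace = {"twice": lambda n: 2 * n}
exec(code, namespace)
print(namespace["result"])  # 42
```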

markshannon commented 2 years ago

We aren't interested in changing the language, except for very minor clarifications, and fixing loopholes.

How would using S-expressions help performance?

matthiasgoergens commented 2 years ago

Sorry, forget about S-Expressions, they were just a silly example.

What I had in mind was something closer to:

from __syntax__ import strict_checking, idempotent, pure, reorder

@strict_checking
def h():
  some_code()
  some_more_code()

def f():
  # Allow the compiler to re-order stuff in this block.
  with reorder:
    x = foo()
    y = bar()
  # But no re-ordering afterwards:
  return fnord(x, y)

class SomeDataStructure:
  @idempotent
  def set_and_get(self, i, x):
    """Compiler is allowed to cache this function, or to execute it again multiple times, if that's cheaper."""
    self.foo[i] = x
    return self.some_other_expensive_computation()

@pure
def g():
  """Compiler is allowed (but not required) to defer computation of y.
  Or ever skipping computation of g() completely, if its return value is never used.
  Alternatively, the compiler is also allowed to call g() multiple times, if that's cheaper than caching.
  """
  x = some_expensive_computation()
  y = some_other_expensive_computation()
  if x > 3:
    return y
  else:
    return None

In a vanilla implementation, reorder, idempotent, strict_checking and pure etc. are allowed to do nothing. An optimizing implementation can make extra assumptions or make use of the declared properties (but it has to behave as if it were the vanilla implementation).
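A conforming vanilla implementation of these markers could be as small as the following (all names are hypothetical, taken from the sketch above; none of this exists in CPython):

```python
import contextlib

# Hypothetical no-op versions of the proposed markers: the vanilla
# implementation just hands the function back unchanged, while an
# optimizing implementation may exploit the declared property.
def strict_checking(func):
    return func

def idempotent(func):
    return func

def pure(func):
    return func

# `with reorder:` needs a context manager; a no-op one suffices.
reorder = contextlib.nullcontext()

@pure
def g():
    return 40 + 2

with reorder:
    x = g()
print(x)  # 42
```

Since the no-op versions preserve semantics exactly, code written against them keeps working on interpreters that ignore the hints entirely.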

You can get some of what idempotent would be doing via the existing lru_cache. But not everything.
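For instance, the existing `functools.lru_cache` already covers the caching half of what `idempotent` would promise, but, unlike the proposal, the runtime is never free to re-execute or drop calls on its own:

```python
from functools import lru_cache

calls = 0  # count how often the body actually runs

@lru_cache(maxsize=None)
def expensive(i):
    global calls
    calls += 1
    return i * i

assert expensive(3) == 9
assert expensive(3) == 9  # second call is served from the cache
print(calls)  # 1
```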

I mostly had this kind of thing in mind for experiments, but also to be able to write more of the standard library in a restricted subset of vanilla Python that we can hopefully run much faster.

(If that restricted subset never gets exposed to users, then it's purely a performance improvement. But I can imagine some adventurous users might be keen.)

Also a possibility, similar to inline assembly:

from __syntax__ import bytecode

@bytecode
def myfunc(alist):
  2           0 LOAD_GLOBAL              0 (len)
              2 LOAD_FAST                0 (alist)
              4 CALL_FUNCTION            1
              6 RETURN_VALUE
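(The listing above is the sort of thing the standard `dis` module prints; for comparison, here is how to inspect the bytecode CPython already generates for the equivalent function. Exact opcode names vary across versions, e.g. `CALL_FUNCTION` became `CALL` in 3.11, but `RETURN_VALUE` is stable.)

```python
import dis
import io

def myfunc(alist):
    return len(alist)

# Capture the disassembly as a string instead of printing to stdout.
buf = io.StringIO()
dis.dis(myfunc, file=buf)
listing = buf.getvalue()
print("RETURN_VALUE" in listing)  # True
```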

Or you can add an unsafe mode that disables, e.g., bounds checking for a specific block, in order to speed things up.

I am not sure whether any of these are good ideas, nor what the syntax should look like.

czardoz commented 2 years ago

This proposal sounds quite similar to the approach Cinder uses for Static Python. For example, this is a faster way of looking up the length of a list:

import __static__
from typing import List

def test(l: List[int]) -> int:
  return len(l)

In this case, test contains Cinder's FAST_LEN bytecode, which simply looks up ob_size in the interpreter (quite a lot faster than going through builtins.len()). Explorer Link

The downside of this approach is that it requires someone to knowingly write code to take advantage of these extension features (i.e., it doesn't speed up existing code).

matthiasgoergens commented 2 years ago

Thanks for pointing to the prior work! I'll investigate.

Yes, being able to mix and match something like Static Python is exactly one of the use-cases I had in mind for this kind of flexibility.

> The downside of this approach is that it requires someone to knowingly write code to take advantage of these extension features (i.e., it doesn't speed up existing code).

I was thinking of using this approach in libraries (including the standard library). That would speed up existing code that makes use of these libraries.

brandtbucher commented 2 years ago

> Also a possibility, similar to inline assembly:

You might be interested in a small project of mine that allows you to do this: https://github.com/brandtbucher/hax.

3.11 support isn’t complete yet, but other versions work fine.

matthiasgoergens commented 2 years ago

https://news.ycombinator.com/item?id=32785012 has some interesting applications for something like this.

And by 'interesting' I mean horrifying.