apache / buildstream

BuildStream, the software integration tool
https://buildstream.build/
Apache License 2.0
83 stars 28 forks source link

Potentially remove all those pesky .pyi files #1412

Open BuildStream-Migration-Bot opened 3 years ago

BuildStream-Migration-Bot commented 3 years ago

See original issue on GitLab In GitLab by [Gitlab user @tristanvb] on Dec 2, 2020, 10:07

Apparently, it is possible to encode pep484 type information directly into cython modules

It could be nice to remove all the redundant forward declarations...

Needs investigation, possibly we need to use a mypyc extension for validation in CI.

BuildStream-Migration-Bot commented 3 years ago

In GitLab by [Gitlab user @tristanvb] on Dec 2, 2020, 10:11

changed the description

abderrahim commented 2 years ago

@gtristan I tried to look into this: https://github.com/apache/buildstream/pull/1547 ports buildstream._types and buildstream._utils to use the pure-python cython syntax (The former is already in python, the latter needed some porting).

Regarding mypyc, it seems to be an alternative to cython based on mypy rather than an extension for mypy to support checking cython code. We could probably try to use it instead of cython at some point but the drawback is that it can't do things you can't do in pure python (In particular, it seems buildstream._utils is trying to call into cpython internals to raise an exception in a thread).

gtristan commented 2 years ago

@gtristan I tried to look into this: #1547 ports buildstream._types and buildstream._utils to use the pure-python cython syntax (The former is already in python, the latter needed some porting).

Regarding mypyc, it seems to be an alternative to cython based on mypy rather than an extension for mypy to support checking cython code. We could probably try to use it instead of cython at some point but the drawback is that it can't do things you can't do in pure python (In particular, it seems buildstream._utils is trying to call into cpython internals to raise an exception in a thread).

Thanks for taking a look and clarifying. This issue was mostly a hopeful thought as I came across the referenced URLs and wanted to note these down.

abderrahim commented 2 years ago

I've dug a little bit deeper on this trying to port a bigger module (_node.pyx). Newer Cython looks promising, but there are still some blockers like https://github.com/cython/cython/issues/3957 and https://github.com/cython/cython/issues/4252

gtristan commented 2 years ago

Sorry for lack of feedback here I haven't had time to keep up.

Now that I look closely it looks like there is something called pure python mode in cython. Reading that page, I doubt we want to get rid of the .pyx files in some of the hotter code paths (e.g. _node.pyx and _variables.pyx), as these should get better performance by leveraging cython constructs.

abderrahim commented 2 years ago

There are actually three "pure python" modes in cython:

  1. good old "pure python" code
  2. good old "pure python" + hot code paths rewritten in cython syntax in a separate .pxd file
  3. pure python with special cython decorators

All three can be executed as pure python (3. needs to have cython installed), and can be compiled for performance gain (1. offers modest gains, while 2. and 3. can leverage the full potential of cython). 2. has the obvious downside of having to keep two implementations in sync for the hot paths.

What I was investigating is 3. which should allow us to get rid of the cython syntax without losing the other advantages of cython. Currentyl the only missing feature I noticed is cdef enum (https://github.com/cython/cython/issues/4252). Then there is the huge bug that shows that nobody is currently using this in production (https://github.com/cython/cython/issues/3957).

What I'm currently investigating is 4. mypyc. mypyc is an alternative to cython. It is used by two tools we depend on (mypy and black) so it's used in production (even though it is still considered alpha software). mypyc compiles (a growing subset of) real pure python with type annotations. It is supposed to get a modest speedup when compiling code without type annotations, and a good speedup when adding type annotations.

The upside is that we'd end up with real pure python that can be executed without a compilation step during e.g. debug-edit cycle, and potentially compile the whole tool (with heavy type annotations on the hot paths) on release.

Currently, trying to port _node.pyx, I'm struggling mainly with the fact that mypy takes None-ability very seriously. So if something can be None, it should be checked before being used.

I think there is a buildstream benchmark somewhere. It would be nice to use it to test this.