Cython for optimizing certain computations

eliotwrobson commented 1 year ago

Creating this issue because I've been looking at Cython a bit, and wanted to record my thoughts on how it might enhance this project. As a starting point, vanilla Python code can be compiled with Cython and should run faster than just using Python, and this optimization can be further enhanced without having to dive too deep into non-Pythonic constructs. These optimizations should be especially useful for work-intensive loops (such as the refine method on the PartitionRefinement data structure). So adding Cython should substantially benefit this library without having to rewrite large amounts of code. Based on some previous profiling done in #95, there are a few intensive computations we could target and extract into a Cython file, leaving everything else as Python.

The main obstacle is that Cython needs to be complied, and this requires adding some configuration to the build step (and may also require reconfiguring the way tests get run). I found an example here and here. We should be able to keep everything really simple to start (we only really need a single file that will contain helper functions for expensive computations used by the DFA class).

I think this is probably too much for the v8 release (as that's just waiting on finishing up testing and implementation of one more feature), but it would be cool to target for v9. Aside from setting up the project properly, I don't think there should be too much work involved, as this would just be converting code that we have already with some optimizations.

Any thoughts? @caleb531 (cc @Tagl, as you did the original profiling)

Tagl commented 1 year ago

I've never used Cython myself so I'm unaware how much of a difference it makes. I fully support optimizing however we can and from profiling I could see the refine method being the main bottleneck in my use case.

eliotwrobson commented 1 year ago

Got a very minimal version of this working with light optimizations (based on length of translated code). Haven't benchmarked but seems to be faster: https://github.com/eliotwrobson/automata/tree/cython

It should be very straightforward to compile and play around with (the new .pyx file has compilation instructions).

caleb531 commented 1 year ago

@eliotwrobson I took a peek at the code, and it looks pretty cool! I was always under the impression that C extensions were fully written in C 😅, so it's cool to see that most of the code is syntactically Python.

From a maintenance perspective on our end, I personally don't have a problem with a build step. I would only hope that the building of the project would happen automatically when users install the library via pip (seems like your setup.py would take care of this).

My few questions are:

Why is the file called dfa_worker? Is the idea that other pyx-appropriate data structures could be placed here? I ask this because I don't think dfa_worker belongs in the base directory, otherwise the file should be renamed to something agnostic of any particular automaton type
In your experience, can you simply build this file once (python setup.py build_ext --inplace) before running nose2? And you only need to rebuild when you make changes to the file? Or is there anything that requires you rebuild every time? (I'm guessing the former, but just want to check)

eliotwrobson commented 1 year ago

@caleb531

From a maintenance perspective on our end, I personally don't have a problem with a build step. I would only hope that the building of the project would happen automatically when users install the library via pip (seems like your setup.py would take care of this).

So the current setup.py is suitable mainly as a proof of concept. The build step actually requires installing Cython and a C/C++ compiler, but this is ok because since this can be done before the package is put on PyPI, and then end users can just download what's there. However, this increases the maintenance slightly on our end, as we have to run the build step before running tests, but this isn't so bad. I think we can just run the build command first and everything should work. Importantly, once this is set up, I don't think there's a huge ongoing maintenance burden (we don't even need to write separate tests). There's detailed discussion here: https://levelup.gitconnected.com/how-to-deploy-a-cython-package-to-pypi-8217a6581f09

To your questions:

The name was arbitrary, I was mainly concerned with getting everything runable. However, I think that the format for now will be to write isolated helper functions in the compiled code, rather than full classes, so it makes sense to use a class-agnostic name of some type (though the main uses will be in the DFA class).
From what I know, it only requires rebuilding when changes are made to the complied file. The files output by compilation can be directly called into by Python.

caleb531 commented 1 year ago

@eliotwrobson I myself already have written an rt ("run tests") Bash function which automatically runs nose2 for Python projects. I am curious if a task runner like pypyr would be useful to automatically build before the command is run.

As to your answers to my questions, both of those sound good to me!

eliotwrobson commented 1 year ago

After more reading, it looks like I was mistaken and this actually would require cython to be installed for compilation by users (which is apparently problematic on windows). Not necessarily a deal breaker but probably warrants some more thought than just including right away. Going to leave this issue open but not sure if it should be a v9 target.

Maybe there's a clean way of doing this with optional dependencies? That would be cool.

EDIT: Should probably use this to avoid having to write two separate files:

It will be necessary to set up tox as well: https://stackoverflow.com/questions/74558388/cythonize-package-tox-testing

Also, after glancing at the above posts, apparently you don't even need Cython if you include the generated .c files in the repo, then users installing the package can just compile with whatever compiler they have. Either way, setting up Cython / C extensions as an optional dependency is a substantial amount of work.

Vipul-Cariappa commented 11 months ago

I am going to throw some ideas here:

There are 2 alternatives for Cython, where you need not rewrite the code and can just use the existing code as it is. i.e. need not maintain/learn different syntax.

Nuitka: It is a transpiler which converts python code to c code. Adds some optimizations. I tried it out. Running the tests locally took ~1.8 seconds. With Nuitka it took ~2.0 seconds. So, no gain there.
mypyc: I guess it is also a transpiler. It is in the beta stage. It uses the type hints provided to optimize the code (And we already use type hints in our code base). When I tried it out, it compiled each file into a separate extension and did not combine them into one single extension. Also, the test suit did not run. i.e. 22 tests ran and all 22 tests failed. I did not spend much time debugging.

I can investigate these options if they look viable.

eliotwrobson commented 11 months ago

@Vipul-Cariappa thanks for taking a quick look. I didn't mention it in this issue, but after some offline discussion, we decided it doesn't really make sense to pursue this too hard until someone with a more specialized use case comes along. The reason is related to what you're seeing, there won't be any real performance improvements on the test suite because the test cases are small unit tests to check for correctness, not larger inputs that would be suitable for benchmarking. Since I don't have these use cases myself, and I'm not sure if anyone else does, it becomes hard to drive development of this (especially when there are more important open issues to address).

There are 2 alternatives for Cython, where you need not rewrite the code and can just use the existing code as it is. i.e. need not maintain/learn different syntax.

There actually is a Cython syntax that allows for having files that are both interpreted and optionally compiled: https://cython.readthedocs.io/en/latest/src/tutorial/pure.html

I haven't investigated this heavily for the reasons previously stated, but this seems like a more straightforward direction than using a transpiler. You're welcome to keep exploring this and other options, but our biggest priority for this project right now is actually the documentation rework #160. Because of the scale of the changes required with the Cython extensions, I don't think we'd want to merge a PR for that until #160 is done and we have this project reviewed by PyOpenSci.

caleb531 commented 11 months ago

I would also add that with the recent introduction of Mojo (a superset of Python with C-like performance), a future may be coming where achieving low-level performance may be abstracted away to become more of an implementation detail than one that requires specials syntax on the developer's part (as projects like pypy have also shown). And I think this is the better approach—if people need more performance, then they should use a Python implementation optimized for such.

Therefore, I motion that we close this issue as Not Planned (click the dropdown arrow next to "Close with comment"), if that sounds good to you @eliotwrobson.

eliotwrobson commented 11 months ago

@caleb531 that sounds reasonable. I think you're right that a lot of performance optimization can be handled on the client side. I'm not even opposed to doing some performance optimization here if someone with these use cases is willing to help with development, I just don't have these needs myself. And either way, if we decide to go in that direction, we can just reopen this issue.

With that then, I will go ahead and close this as Not Planned.

caleb531 / automata

Cython for optimizing certain computations #159