frederickjansen / polyline

A Python implementation of Google's Encoded Polyline Algorithm Format.
http://goo.gl/PvXf8Y
MIT License
114 stars 19 forks source link

Up to 70x faster computation #14

Closed BuonOmo closed 9 months ago

BuonOmo commented 1 year ago

Hi,

I'm the author of fast-polylines gem in ruby and in crystal. I've recently created an equivalent project in python, that takes advantage of C bindings to compute way faster.

I still have some work to do on it, mainly tests and documentation. But rather than publishing a new library, I was thinking that I could merge this into your project. No need for two competing libraries. Great for me: you already have documentation and tests (although I'd add a few tests relative to how it is implemented in C). Great for you: way better perfs.

Here's a quick benchmark I made to showcase efficiency, you can run it locally as well:

git clone git@github.com:buonomo/fast-polyline.py
cd fast-polyline.py
make install
make bench

encode

fast_polyline: 0.583ms
polyline: 39.7ms
fast_polyline is 68.1 times faster.

decode

fast_polyline: 1.53ms
polyline: 25.9ms
fast_polyline is 17.0 times faster.

Note that performances would be improved a bit more before publication, especially on decode.

Also, Since it would change in lot in the codebase, I'd be happy to help maintaining the project afterward and having my email mentionned in the pyproject.toml.

collinr3 commented 9 months ago

@BuonOmo Have you considered creating a pull request that incorporates your c library? It would seem a logical next step that doesn't place the burden on @frederickjansen

Also, consider adding a runtime flag that specifies which code/decode implementation to use ie the original python implementation or the c implementation.

BuonOmo commented 9 months ago

@collinr3 since the author is not responding here I'd rather create a separate lib. Also i feel like have either a c or python lib IS better than a flag for codebase footprint :)

Although if you feel like they might answer better to a pr than an issue, feel free to open it !

frederickjansen commented 9 months ago

I'm not against including a faster C-based implementation, but I am slightly cautious since this library is included in projects like Apache Superset and I want to make sure I don't introduce any dependencies I don't feel comfortable managing myself. Beyond just adding the code it would require automated build and testing jobs for macOS, Windows, and Linux, for various Python versions, x86, x86-64, and ARM targets. I can work on this but it'll take some time. Looks like hosted ARM64 runners are now available on GitHub Actions, but they're not free unfortunately.

collinr3 commented 9 months ago

Thank you for the response @frederickjansen and your caution is prudent. My own instinct is to retain the current implementation as the default behaviour and make the c library an optional 'runtime' choice. That may also help sidestep the ARM64 dependency for now. However, I do defer to you for such decisions.

frederickjansen commented 9 months ago

My preference would be to add pre-built wheels for all platforms, without any hassle for the end user. I would also aim for the same experience for everyone. Those on Linux or macOS might have an easy time building from source, but that's typically not the case for those on Windows. I haven't kept up with the Python landscape for publishing wheels with compiled C code in a while, and if the recommended tools today are cffi or ctypes, Meson or CMake, ... I'll take a look though and figure out how I want to move forward.

BuonOmo commented 9 months ago

Hi @frederickjansen, thank you for the consideration.

As I see it, I’d rather publish a separated lib, at least for now.

My main language not being python, I would not want to dig the rabbit hole of prebuilt wheels for all platform, as I already now it is a pain in ruby. And I wouldn’t either guarantee the tests to work on windows, as I am not owner of a windows machine and I don’t want to waste so much time on this OS.

I also am not in favor of a runtime choice @collinr3. For me if we want to have two implementation, it is better to have them clearly separated, as I don’t see how could a user benefit having both on their laptop/server. Best case scenario, this would confuse people. Worst case, the C installation would bother devs that do not want it.

I am still up to working with this lib in the future, I just feel like it is a good step to publish mine separated for now, see how it is accepted. And once it is used and you are up to handling the prebuilt, we can work on it.

What do you think? (you can close after replying if on board with this plan!)

frederickjansen commented 9 months ago

@BuonOmo I think that's reasonable. At some point it would be nice to merge the two, but it sounds like now is not the time to do so.

eli-schwartz commented 8 months ago

I haven't kept up with the Python landscape for publishing wheels with compiled C code in a while, and if the recommended tools today are cffi or ctypes, Meson or CMake, ... I'll take a look though and figure out how I want to move forward.

setuptools remains an adequate tool for publishing wheels with compiled C code. Both Meson and CMake try to solve scalability and robustness, parallel building of many C files, handling external dependencies on system libraries, support for languages such as Fortran... the proposed single-C-file enhancement has none of these issues.

I wouldn't sweat it.

Beyond just adding the code it would require automated build and testing jobs for macOS, Windows, and Linux, for various Python versions, x86, x86-64, and ARM targets. I can work on this but it'll take some time. Looks like hosted ARM64 runners are now available on GitHub Actions, but they're not free unfortunately.

I think you just want https://cibuildwheel.readthedocs.io/ which by the way also includes a chart of CI providers and architecture support. Note that you only need an actual arm64 runner for Windows (Linux can be handled with emulation, and macOS with universal2 support).