CloudCannon / pagefind

Static low-bandwidth search at scale
https://pagefind.app
MIT License
3.22k stars 97 forks source link

feature request: Python wrapper package, Python API #634

Open SKalt opened 3 weeks ago

SKalt commented 3 weeks ago

I'm building a documentation site using mkdocs, a python-based static site generator. Adding a python wrapper package would let me install the pagefind CLI without a node.js toolchain or a script to download and validate the pre-compiled binary. A Python API would also be useful for writing plugins for mkdocs and other existing python static site generators.

I'd be happy to contribute the python wrapper package and API bindings to this repo. Alternately, I'd also be happy to create a separate repo for managing the python wrapper package if you don't want to increase the maintenance burden here.

bglw commented 3 weeks ago

πŸ‘‹

There's precedence for both approaches. The Node wrapper is maintained in this repo, while there exists Nix and Juliet wrappers maintained externally.

I'm not a heavy Python user myself, so I'm slightly hesitant to take on maintenance β€”Β however I think maintaining it in this repo would be the best result for users, so we should go with that. I've been wanting to offer a distribution channel for users without Node, and Python seems like a good avenue (plus I'd love to have it available in mkdocs, which I'm a big fan of).

One thing I would like to avoid is having the wrapper package download binaries from GitHub actions, since that has caused issues/slowdowns in the past. Is it possible to bundle a platform-specific binary into a python wrapper in a similar manner to the current npm optionalDependencies?

If you're happy contributing the code, and helping me get automated publishing set up, then it would be much appreciated! πŸ™‚ Let me know what I can do to help you.

SKalt commented 3 weeks ago

Looks like I should be able to crib off of https://github.com/astral-sh/ruff/blob/main/.github/workflows/release.yaml, which uses maturin (part of the Py03 project) to shove all the relevant binaries and files into a single python package.

This is a long shot, but would you be open to using goreleaser for non-node packaging? On one hand, with a few lines of hacks GoReleaser can invoke cargo to build the binary and then repackage the binary for many non-node distribution formats including deb, rpm, HomeBrew, Scoop, etc. On the other hand, GoReleaser probably won't have first-class support for Rust for the foreseeable future. There's also a rust-releaser project that seems interesting, but it's not as mature as GoReleaser.

SKalt commented 2 weeks ago

Update: my OSS time at work has dried up for the moment; I'm not sure when I'll get back to this.

bglw commented 2 weeks ago

No worries! Some notes for when you do, or for anyone else who comes across this issue and wants to take it on:

I don't mind what tool is used to actually release the Python package, but it doesn't need to worry about actually building Pagefind, so it doesn't need any level or support for (or knowledge of) Rust.

The publishing will slot into the existing action alongside the publish-binary-npm-packages step, which get given all built assets to download:

https://github.com/CloudCannon/pagefind/blob/38f90542921b405f5cadabd0110a483371581934/.github/workflows/release.yml#L177-L180

So the starting point will have the built executables for every platform, and the checksums for each executable :) All it needs to do it package the files up and publish it.

SKalt commented 1 week ago

Hey, I had some time and took a look at the node.js wrapper. Would you mind passing pages/other content by-referneve over FFI rather than the service rpc to avoid serialization/deserialization overhead?

bglw commented 1 week ago

I'd hesitate to move things to FFI; I don't want it to be a requirement for creating a wrapper package, as the current "external" communication means you can wrap it with anything that can read/write to stdio, or indeed it could talk over a network.

If the overhead does become a performance bottleneck though, that should be solved.

I based a lot of the concept on esbuild's Go service API, which avoids FFI. Notably, over a certain message size they pass content via temporary files which is quicker and has a lower de-serialization overhead compared to the stdio messaging. Pagefind doesn't do this yet, so trying out a similar idea would be the first port of call if we need to improve things. I haven't seen any discussion around their wrapper performance being a problem so I have some confidence this will be good enough for us too.

But, if there was still too much overhead with that approach and it was impacting an actual use-case, I'd be open to looking into FFI πŸ™‚ (I would want to keep the current workflow though, so it would mean maintaining two service APIs)

SKalt commented 1 week ago

That makes sense, and avoiding FFI will certainly speed up implementation.

On a different topic, have you considered putting together an ARCHITECTURE.md for this repo? I'd like to hear more about the trade-offs/decisions that got pagefind to its current state.

bglw commented 1 week ago

It's a good idea! (and I do swoon over matklads posts πŸ˜„)

If I can find the perfect task to procrastinate from by writing one, it just might happen πŸ˜‰