CloudCannon / pagefind

Static low-bandwidth search at scale
https://pagefind.app
MIT License

Support Search API in Node.js #519

Open Snugug opened 7 months ago

Snugug commented 7 months ago

First, thanks for the great library!

I'd like to be able to use search through the Node.js API. We've got two use cases for this: the first is to allow search through a serverless function instead of requiring it to be done client-side. The second is to enable search-powered site generation features, like related articles.

Thanks!

bglw commented 7 months ago

This would be great! And fundamentally shouldn't be too hard to get off the ground. If anyone coming across this wants to jump ahead of me on an implementation feel free to do so — just jump in this thread.

Some thoughts for now (for myself, or anyone else!):

The pagefind_web crate in this folder does the heavy lifting of the search, and is currently structured to compile to a WASM target. The lib.rs file exports the interface that coupled_search.ts in the pagefind_web_js folder talks to.

Tackling this, I think the best approach will be:

The main new code to be written will be Rust implementations of the file loading logic currently in the coupled_search.ts file, as this file actually fetches (and gunzips) the resources. Instead, we'll need a native version that can load these files from a target folder on disk. Thankfully the logic for which files to load is already in the Rust space.
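To make the file loading step concrete, here is a minimal sketch of the disk-based equivalent, written in TypeScript for illustration even though the real version would live in Rust. The helper name `loadIndexFile`, the file name `example.bin`, and the choice to detect gzip by its magic bytes are all assumptions, not the actual Pagefind bundle layout or loading logic.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { gunzipSync, gzipSync } from "node:zlib";

// Hypothetical helper: read one file from a bundle folder on disk and
// gunzip it when it starts with the gzip magic bytes (0x1f 0x8b).
// This replaces the fetch-then-gunzip step that a browser would perform.
function loadIndexFile(bundleDir: string, relativePath: string): Uint8Array {
  const raw = readFileSync(join(bundleDir, relativePath));
  const isGzip = raw.length >= 2 && raw[0] === 0x1f && raw[1] === 0x8b;
  return isGzip ? gunzipSync(raw) : raw;
}

// Usage: write a gzipped file into a temporary folder and read it back.
const demoDir = mkdtempSync(join(tmpdir(), "bundle-"));
writeFileSync(join(demoDir, "example.bin"), gzipSync(Buffer.from("hello index")));
const loaded = loadIndexFile(demoDir, "example.bin");
console.log(Buffer.from(loaded).toString("utf8")); // prints "hello index"
```

The only real difference from the browser path is the source of the bytes: a folder on disk rather than an HTTP response.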

Since this will be a new crate, it will be a new package rather than piling on to the pagefind package. We have the @pagefind NPM scope too, so perhaps just @pagefind/search, or @pagefind/local-search to be more verbose about the intended target.

Finally, there is some scoping work to do. It makes sense to reuse the configuration setups from the main pagefind crate — so configuration files, environment variables, and a CLI interface. Less clear is how the CLI interface should behave — I'd love thoughts and opinions from anyone who wants this! e.g. a structured (JSON?) output over stdio? Write matching results to a data file? Keen to scope this out further.
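As one possible answer to the structured output question, a CLI could print JSON Lines to stdout, one result per line, so other tools can consume results as a stream. The sketch below is only an illustration of that idea: the `CliResult` shape and its field names are hypothetical, and nothing here reflects a decided format.

```typescript
// Hypothetical result shape; the field names are illustrative only.
interface CliResult {
  url: string;
  score: number;
}

// Serialize results as JSON Lines: one compact JSON object per line.
function toJsonLines(results: CliResult[]): string {
  return results.map((r) => JSON.stringify(r)).join("\n");
}

const sample: CliResult[] = [
  { url: "/docs/install/", score: 2.5 },
  { url: "/docs/api/", score: 1.25 },
];

// A CLI would write this string to stdout.
console.log(toJsonLines(sample));
```

A single JSON array would work too. The line-per-result form is just easier to pipe into tools that read incrementally.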

For the Node.js API, we can copy across the service module from the main crate, and we'll need to make a new package within wrappers for the search package. The design of this API should be a lot more straightforward.
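To give a feel for how such an API might serve the "related articles" use case from the original request, here is a rough sketch. Everything in it is hypothetical: `SearchHit`, `SearchFn`, `makeStubSearch`, and `relatedArticles` are invented for illustration, and the search itself is a naive in-memory stand-in so the example runs without a real index.

```typescript
// Hypothetical types: nothing here is an existing Pagefind API.
interface SearchHit {
  url: string;
  score: number;
}
type SearchFn = (term: string) => Promise<SearchHit[]>;

// Stand-in search over an in-memory page list. It scores each page by
// how many times the term occurs, purely so the sketch is runnable.
function makeStubSearch(pages: { url: string; text: string }[]): SearchFn {
  return async (term) => {
    const needle = term.toLowerCase();
    return pages
      .map((p) => ({ url: p.url, score: p.text.toLowerCase().split(needle).length - 1 }))
      .filter((hit) => hit.score > 0)
      .sort((a, b) => b.score - a.score);
  };
}

// Build-time "related articles": search for a page's topic, then drop
// the page itself and keep the top few URLs.
async function relatedArticles(
  search: SearchFn,
  currentUrl: string,
  topic: string,
  limit = 3,
): Promise<string[]> {
  const hits = await search(topic);
  return hits.filter((h) => h.url !== currentUrl).slice(0, limit).map((h) => h.url);
}

const stub = makeStubSearch([
  { url: "/a/", text: "search tips for static search" },
  { url: "/b/", text: "search at scale" },
  { url: "/c/", text: "unrelated notes" },
]);
relatedArticles(stub, "/b/", "search").then((urls) => console.log(urls));
```

The same `SearchFn` shape would fit the serverless use case as well, with the handler calling `search` per request instead of at build time.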


Writing that out actually pads out to a bit more than I expected, but it's mostly plumbing work rather than novel code. I don't imagine getting to this within a matter of weeks, but I also don't plan to not do it 😅