Closed by Keats 4 years ago
might look at https://github.com/jameslittle230/stork
i only recommend it because it looks cool and #rust
but it also seems promising!!
It is closer to Apache Lucene than to Elasticsearch or Apache Solr in the sense it is not an off-the-shelf search engine server, but rather a crate that can be used to build such a search engine.
— Tantivy
Stork is built with Rust, and the Javascript library uses WebAssembly behind the scenes. It’s built with content creators in mind, in that it requires little-to-no code to get started and can be extended deeply. It’s perfect for JAMstack sites and personal blogs, but can be used wherever you need a search interface.
— Stork
Based on the goals of the projects, I feel like Stork might be more user-friendly for static sites, though Tantivy might offer a better opportunity to integrate with Zola + Tera templates?
Idk what people really need in terms of customization though.
I've received an email from someone who has a fork using Tantivy for search with Zola, but I don't have their GH handle, if they have one :(
To decide which search engine/lib to use, the main thing to test is how good the results are. I don't really care whether it's in Rust or in JS like it is right now, as long as the results are decent and you can build a usable search input with a little bit of JS.
Does there have to be only one option? Could a user choose which one through a flag, config.toml, or even a build feature?
Tantivy and stork look like they have two different but very valid use cases for a workflow that includes zola.
Tantivy requires a backend, so the output becomes something different from a static site. Zola is an SSG, so it won't get an official option that is not completely static.
Could Zola support generic index building? With Tantivy you can load a schema from JSON, so it would be super helpful if Zola could take a config file or option in which the user describes what information should be indexed, and then output a JSON file with that information. Similarly, Stork asks you to store that information in a TOML file and builds an index from it. It would also be nice if you could optionally ask Zola to run some command after it indexes on build.
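To make the request concrete, here is a minimal sketch of what a generic export could look like, written in Python for brevity. The page structure, the field names, and the function name are all hypothetical; nothing here is an existing Zola option or output format.

```python
import json

def export_index_documents(pages, fields):
    """Keep only the user-selected fields of each page and serialize them to JSON."""
    documents = []
    for page in pages:
        # Fields a page does not have are skipped rather than emitted as null.
        documents.append({name: page[name] for name in fields if name in page})
    return json.dumps(documents, ensure_ascii=False, indent=2)

# Hypothetical pages, as a site generator might hold them in memory.
pages = [
    {"title": "Hello", "permalink": "/hello/", "body": "First post", "draft": False},
    {"title": "About", "permalink": "/about/", "body": "Who I am"},
]

# The user chooses which fields end up in the exported file.
print(export_index_documents(pages, ["title", "permalink", "body"]))
```

Any external indexer that accepts JSON could then consume that file, which is the flexibility being asked for.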
Stork asks that you run
stork --build index.toml
to build the index. Obviously, since Stork is written in Rust, you could pull in some of its source code to build the index as part of Zola, but I think the ability to run a single shell command would be more flexible for a wider array of search engines.
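A rough sketch of such a post-build hook, in Python for illustration only. The function name is hypothetical, and Zola has no such hook that I know of; the Stork command is the one quoted above.

```python
import subprocess
import sys

def run_post_build_hook(args, cwd="."):
    """Run a user-configured command after the build; raises if it exits non-zero."""
    result = subprocess.run(args, cwd=cwd, capture_output=True, text=True, check=True)
    return result.stdout

# With Stork installed, the hook would be the command quoted above:
#   run_post_build_hook(["stork", "--build", "index.toml"])
# For a demo that runs anywhere, call the current Python interpreter instead.
print(run_post_build_hook([sys.executable, "-c", "print('index built')"]))
```

Passing the command as an argument list avoids shell quoting problems, and `check=True` makes a failed indexer fail the whole build instead of silently producing a site without search.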
Another request: https://github.com/getzola/zola/issues/975
I made a fork based off of zola 10.1 with the changes made by Jonathon Strong here, which allow zola to build a tantivy index.
It adds a new subcommand, index, that takes an index type, -t, which can be either elasticlunr or tantivy, and optionally an output directory for the index (defaults to public).
Here is the output of zola index -h
Create a search index as a stand-alone task, and with additional options
USAGE:
zola-tantivy index [FLAGS] [OPTIONS] --index-type <index_type>
FLAGS:
--drafts Include drafts when loading the site
-h, --help Prints help information
-V, --version Prints version information
OPTIONS:
-t, --index-type <index_type> what kind of search index to build [possible values: elasticlunr, tantivy]
-o, --output-dir <output_dir> Outputs the generated search index files into the provided dir. Note: Tantivy indexing produces a directory instead of a file, which will be located at output-dir/tantivy-index [default: public]
I know @Keats said
Tantivy requires a backend, so the output becomes something different from a static site. Zola is an SSG, so it won't get an official option that is not completely static.
So I was unsure whether this pull request would be wanted, but I have been using Jonathon Strong's fork for a while and it feels rude not to send a pull request.
Please let me know what you think.
As a side note I have also made a server, you can check it out here, that can
It uses Actix-web for the web server and Tera for templates, so it's pretty easy to make templates based off your Zola templates.
It is meant for anyone who self-hosts their Zola website and wants to use Tantivy for search.
It can be mentioned in the docs but won't be accepted as a PR
Lunr (https://lunrjs.com/) has worked well for me in the past. It still seems to be maintained, but it's hard to tell.
Zola uses elasticlunr which is based on lunr
Any chance this could be used to build a search system for zola?
https://phiresky.github.io/blog/2021/hosting-sqlite-databases-on-github-pages/
I am not sure, but the premise of that article seems to be that you either maintain a server or download the WHOLE dataset.
In abridge I implemented:
elasticlunr(zola default): https://abridge.netlify.app/
stork: https://jieiku.github.io/abridge-stork/
tinysearch: https://jieiku.github.io/abridge-tinysearch/
Which was discussed here: https://github.com/Jieiku/abridge/issues/41
One user pointed out pagefind. I found the way it works interesting because it does NOT download the entire dataset; rather, it downloads a chunk relevant to your search term, which of course could save bandwidth on an enormous dataset.
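To illustrate the general idea of a chunked index, here is a toy sketch in Python. It is not Pagefind's actual format or algorithm; grouping terms by their first two letters is only an assumption to show why a search needs to touch a single small chunk.

```python
from collections import defaultdict

def build_chunks(inverted_index, prefix_len=2):
    """Group index terms into chunks keyed by the first letters of each term."""
    chunks = defaultdict(dict)
    for term, page_ids in inverted_index.items():
        chunks[term[:prefix_len]][term] = page_ids
    return dict(chunks)

def lookup(term, chunks, prefix_len=2):
    """Consult only the one chunk that could contain the term."""
    # In a browser this would be a single small download instead of the full index.
    chunk = chunks.get(term[:prefix_len], {})
    return chunk.get(term, [])

# Toy inverted index: term -> ids of the pages that contain it.
index = {"rust": [1, 3], "ruby": [2], "search": [1], "static": [3]}
chunks = build_chunks(index)
print(sorted(chunks))
print(lookup("rust", chunks))
```

Each chunk would be written to its own file at build time, so the cost of a query grows with the size of one chunk rather than with the size of the whole site.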
I bookmarked the sqlite database article you linked because it is interesting, but my blog's dataset is nowhere near large enough to worry about it yet.
I almost forgot: I also made a JS facade for the search. It stops the index from being downloaded until a user clicks into the search box. It was discussed here: https://github.com/Jieiku/abridge/issues/81
I am not sure, but the premise of that article seems to be that you either maintain a server or download the WHOLE dataset.
That's what they are trying to avoid in the article. I believe the method they use involves querying a SQLite database hosted on a static host directly from the browser. They use a WASM implementation of SQLite with some kind of chunked data access over HTTP to do it. They have live examples in that article where you can query a ~600 MB database with just kilobytes of data transfer.
To quote the article:
The total amount of data in the indicator_search FTS table is around 8 MByte. The above query should only fetch around 70 KiB. You can see how it is constructed here.
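For anyone wondering what chunked access over HTTP can look like, here is a small Python sketch of the arithmetic behind HTTP Range requests. The 4096-byte page size and the function names are assumptions for illustration, not details taken from the article's implementation.

```python
def pages_for_read(offset, length, page_size=4096):
    """Return the indices of the fixed-size pages that cover a byte range."""
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return list(range(first, last + 1))

def range_header(page_index, page_size=4096):
    """Build the HTTP Range header value for one page (the end byte is inclusive)."""
    start = page_index * page_size
    return f"bytes={start}-{start + page_size - 1}"

# Reading 4000 bytes at offset 5000 touches two pages, so two small requests
# are enough no matter how large the database file is.
for page in pages_for_read(offset=5000, length=4000):
    print(range_header(page))
```

A static host that honors Range requests returns only those bytes, which is how a query can stay in the kilobyte range against a very large file.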
Pagefind also does some kind of chunked data access.
https://pagefind.app/ demo: https://xkcd.pagefind.app/
edit, abridge pagefind integration demo: https://abridge-pagefind.pages.dev/
I have no idea whether Pagefind or the sqlite method you linked makes more efficient use of bandwidth. If somebody implemented both and compared them, I would find that a very interesting read.
SurrealDB could be a good option because it can run as wasm in the browser or on a server.
https://github.com/surrealdb/surrealdb.wasm
It has full text search built in. It's a fork of Tantivy I believe. https://docs.surrealdb.com/docs/reference-guide/full-text-search/
@gedw99 that is seriously cool!
@gedw99 that is seriously cool!
Thanks. It's not bad, is it? It's got lots more too: AI vector engine, real-time sync, RPC, etc.
See https://zola.discourse.group/t/search-improvement/344/2
We still need to discuss what improvements make sense, but imagine you have a site with thousands of pages: adding all of that to the search index will result in a huge JS file that is not usable. Being able to select which fields get indexed would help, for example.