getzola / zola

A fast static site generator in a single binary with everything built-in. https://www.getzola.org
MIT License
14.22k stars, 970 forks

Alternative Search Engines, elasticlunr last updated 2019 #1849

Open Jieiku opened 2 years ago

Jieiku commented 2 years ago

I just finished implementing the elasticlunr search for the abridge theme.

So far I like elasticlunr, but it was last updated 2019.

It would be really cool if Zola supported multiple search engines.

I am going to attempt implementing stork and tinysearch

Stork seems to provide VERY relevant results and shows suggestions based on partial words which is nice!

Keats commented 2 years ago

stork and tinysearch look very nice! Thanks for finding those. Looking at those 2 though, they only support western languages and not CJK? I don't see specific stemmers/stopword list for various languages so they might only work well with English?

It's not a problem if the elasticlunr hasn't been updated since 2019 if it works well.

Jieiku commented 2 years ago

@Keats If I can add stork and tinysearch index generation to zola, is that something you would accept pull requests for so long as it works without causing any issues for the existing code base?

I understand these two don't have as many features as elasticlunr for CJK. However, if Zola supported index generation for them, then other developers more familiar with those languages (CJK) would probably submit improvements to those projects (stork and tinysearch).

Keats commented 2 years ago

Both stork and tinysearch only support English, so I would say their scope is too limited for Zola. IMO if they don't support CJK (fair enough), they should at least support most languages that use spaces, like Italian/German/Dutch/French/Spanish/Portuguese and the Scandinavian/Slavic languages. It's pretty much just a matter of adding stopwords for each, which are easy to find, so there is no reason not to support them.
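To illustrate the stopword point, here is a minimal sketch of per-language stopword filtering. It is not taken from stork, tinysearch, or elasticlunr; the names `STOPWORDS` and `tokenize` are hypothetical, and the word lists are tiny hand-picked samples rather than complete lists.

```typescript
// Illustrative samples only: real stopword lists are much longer.
const STOPWORDS: Record<string, Set<string>> = {
  en: new Set(["the", "a", "an", "and", "of", "to", "is"]),
  fr: new Set(["le", "la", "les", "et", "de", "un", "une"]),
  de: new Set(["der", "die", "das", "und", "ein", "eine", "ist"]),
};

// Lowercase the text, split it on whitespace, strip leading/trailing
// ASCII punctuation, and drop empty tokens and stopwords for `lang`.
// Languages without a list (hypothetical fallback) keep every token.
function tokenize(text: string, lang: string): string[] {
  const stop = STOPWORDS[lang] ?? new Set<string>();
  return text
    .toLowerCase()
    .split(/\s+/)
    .map((t) => t.replace(/^[.,;:!?"'()]+|[.,;:!?"'()]+$/g, ""))
    .filter((t) => t.length > 0 && !stop.has(t));
}
```

Splitting on whitespace is exactly the step that does not carry over to CJK text, which is why those languages need more than an extra word list.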

As for adding alternatives to Zola, before doing so it would need a few things:

Alternatives will be added if they give compelling results on the above. elasticlunr.js might not be maintained anymore, but it still works perfectly fine and is not going to suddenly decay.

Jieiku commented 2 years ago

Ok, very cool! I am not sure when I will have time to work on this, but when I do I will be sure to make some benchmarks and put some samples and test sites available online.

Jieiku commented 2 years ago

I have been working on this, but I still need to find some example content to use for the sample/demo sites.

Keats commented 1 year ago

The ability to add multiple formats has been added in the next branch.

andreacfromtheapp commented 1 year ago

hi @Jieiku and @Keats :)

I'm using Zola for all my personal sites and I love it. The one I'm building now needs search and I'm not a dev. I searched the web far and wide, stumbled upon the abridge theme, and noticed Stork. I searched the issues to suggest it, but @Jieiku already did :)

Stork looks like a match made in heaven for Zola. I hope they implement more languages, because it would be an awesome addition to Zola!

Jieiku commented 1 year ago

I have all three implemented in the Zola theme Abridge, and all three work. I am planning to switch them all to the JSON format soon, though, because it's a cleaner implementation. (Currently the stork and tinysearch indexes get included in the sitemap: https://github.com/tinysearch/tinysearch/issues/166)

elasticlunr demo: https://abridge.netlify.app/

tinysearch demo: https://jieiku.github.io/abridge-tinysearch/

stork demo: https://jieiku.github.io/abridge-stork/

If you take the time to compare these demos you will see that:

(I need to test with a larger example dataset, as discussed previously in this thread)

Search features:

tinysearch discussion: https://github.com/tinysearch/tinysearch/issues/157

stork discussion: https://github.com/jameslittle230/stork/discussions/294

abridge search discussion: https://github.com/Jieiku/abridge/issues/41 (here you will see pagefind mentioned, I planned to also implement this one https://github.com/cloudcannon/pagefind)

I was able to manipulate the output format of tinysearch much more easily than Stork's (within the JavaScript). That is why the results look very similar in Abridge when comparing elasticlunr and tinysearch, while the Stork results are formatted a bit differently. Once I saw that Stork was much heavier than elasticlunr, I no longer considered using it for any of my own sites. tinysearch, however, is much lighter than elasticlunr, and on very large sites pagefind could end up being the lightest of all.

Side note: in Abridge I implemented a search facade. All the search-related JavaScript is bundled into a single file, and none of it gets loaded until the user clicks into the search box. This saves bandwidth for users who find your article through Google, since not everyone who visits your site will use the search box. However, the search facade is not friendly to non-developers: I outlined it all in the README, but it is a bit technical to set up.
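For readers unfamiliar with the facade idea, here is a rough sketch of the load-on-first-use part. This is not Abridge's actual code; `loadOnce` is a hypothetical helper, and the element id and bundle path in the comment are placeholders.

```typescript
// Wraps an async loader so the expensive work runs on the first call only.
// Every later call gets the same promise back instead of loading again.
function loadOnce<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      pending = load();
    }
    return pending;
  };
}

// In the browser this could be wired up roughly as follows
// (the element id and bundle path are placeholders):
//
//   const loadSearch = loadOnce(() => import("/js/search_bundle.js"));
//   document.getElementById("search-input")
//     ?.addEventListener("focus", () => void loadSearch(), { once: true });
```

Visitors who never focus the search box never download the bundle, and repeated focus events cannot trigger a second download.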

Zola does not have any built in javascript bundling capabilities, here is what I do for the search facade:

(I use netlify, and I am able to automate all of this through the netlify.toml file)

andreacfromtheapp commented 1 year ago

Hi @Jieiku :)

thanks for your kind and detailed reply! Perhaps I was too implicit: your Abridge theme is how I found out about Stork in the first place!

I also tried copypasta/replicating your code to integrate Stork on my site ( https://gentlewashrecords.com/ ) to no avail. Search is the only thing missing and I suuuuuuuck at js. I do know enough Elm and find FP feels more natural to me. So I set my mind that I will implement the search with https://github.com/rluiten/elm-text-search -- if I manage to. I miss coding in Elm and this would be nice too. Thinking of doing the same in js gives me shivers.

EtoDemerzel0427 commented 1 year ago

> stork and tinysearch look very nice! Thanks for finding those. Looking at those 2 though, they only support western languages and not CJK? I don't see specific stemmers/stopword list for various languages so they might only work well with English?
>
> It's not a problem if the elasticlunr hasn't been updated since 2019 if it works well.

@Keats But does elasticlunr really support CJK? For example, I noticed that lunr.zh.js was first introduced to lunr-languages in version 1.5.0, in June 2021, and by then it was already incompatible with elasticlunr. I suggest switching to another default search library that is still maintained.