cobalt-org / cobalt.rs

Static site generator written in Rust
cobalt-org.github.io/
Apache License 2.0

[RFC] Support for search engines #610

Open Geobert opened 5 years ago

Geobert commented 5 years ago

Goal

Any search engine (lunr, elasticsearch, lucene) needs an index to search in. After trying to apply https://booyaa.wtf/2017/adding-search-to-your-cobalt-site-part-one/ to my own blog, I discovered that there is an offline indexing step that uses node to run a JS script.

It would be nicer not to need this dependency to get search functionality, hence this RFC: generate a search index that can be serialized into the formats of various search engines.
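To make "an index to search in" concrete, here is a minimal sketch of an inverted index built with only the Rust standard library. The `Page` struct, its field names, and the tokenizer are assumptions for illustration; this is not lunr's index format and not anything Cobalt currently does.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// A page to index. `url` and `body` are hypothetical field names.
pub struct Page {
    pub url: String,
    pub body: String,
}

/// Split text into lowercase alphanumeric tokens.
pub fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Map each token to the set of page URLs whose body contains it.
pub fn build_index(pages: &[Page]) -> BTreeMap<String, BTreeSet<String>> {
    let mut index: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
    for page in pages {
        for token in tokenize(&page.body) {
            index.entry(token).or_default().insert(page.url.clone());
        }
    }
    index
}
```

A real engine would add stemming, stop words, and scoring on top of this; the sketch only shows the token-to-documents mapping that such an index is built around.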

Options

JS

Rust

The Rust options don't seem mature enough. lunr.rs could be a base for generating a lunr index, but it seems abandoned (few features, last commit Oct 2017). elasticlunr-rs is more active (last commit Nov 2018) and has a pipeline implemented.

For Toshi and Tantivy, I don't know how to use them in a browser.

Activation

In _cobalt.yml:

site:
  search_engine: lunr # example; can be any other search engine that we support. Default value: ~ (deactivated)
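As a rough sketch of how such a setting might be interpreted, the snippet below maps the string value to an enum. The `SearchEngine` enum, its variants, and `parse_search_engine` are hypothetical names; only `lunr` appears in the example above, and `None` stands in for the YAML null (`~`) default.

```rust
use std::str::FromStr;

/// Hypothetical set of supported engines; only `lunr` is named in the example.
#[derive(Debug, PartialEq)]
pub enum SearchEngine {
    Lunr,
    ElasticLunr,
}

impl FromStr for SearchEngine {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_lowercase().as_str() {
            "lunr" => Ok(SearchEngine::Lunr),
            "elasticlunr" => Ok(SearchEngine::ElasticLunr),
            other => Err(format!("unsupported search engine: {}", other)),
        }
    }
}

/// `None` models the `~` default: search index generation is off.
pub fn parse_search_engine(value: Option<&str>) -> Result<Option<SearchEngine>, String> {
    value.map(|s| s.parse::<SearchEngine>()).transpose()
}
```

Rejecting unknown names with an error, rather than silently disabling search, would let a typo in `_cobalt.yml` surface at build time.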

Internal index format

Serialization to targeted search engine

Original message

I don't have a clue how to do that except by using an external search engine (duckduckgo, qwant…) once it has crawled the generated web site.

epage commented 5 years ago

@booyaa did it manually, combined with lunr: https://booyaa.wtf/2017/adding-search-to-your-cobalt-site-part-one/

You can also see the ways it's possible in Hugo: https://gohugo.io/tools/search/

Geobert commented 5 years ago

Thanks for the pointers. So, if this were to be done by Cobalt, it would only need to generate a document index for whatever search engine the user wants? Am I right?

epage commented 5 years ago

Correct.

Geobert commented 5 years ago

Is it worth it, given booyaa's solution?

epage commented 5 years ago

Unsure. I've not looked enough into this space to have any kind of opinion. For example, if we support it, what all index formats would effectively be required?

A short-term solution for this might be to reach out to @booyaa to see if he wants his blog post featured in cobalt's blog (now that we have one) and in what way he would prefer. My preference would be re-posting the content completely.

booyaa commented 5 years ago

Yeah sure, let’s talk on Gitter to sort out the details! 😁

Ages ago, I wanted to integrate the index building by using a javascript parser crate. I never found anything that "Just Worked" ™️.

A better coder would just look at the spec for creating an index that lunr could ingest and create it natively in Rust.

Geobert commented 5 years ago

After trying to apply booyaa's method, I discovered that it's better for performance to have a pre-built index for lunr. But the method given by lunr uses node to run the JavaScript that generates the pre-built index.

@booyaa : what is the lunr_docs for? Your search.liquid has a comment saying it's to enrich the search results, but how so?

I think this issue should be about how we create an index that can be serialized into different formats for different search engines.
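One possible shape for this, as a sketch only: an engine-agnostic document list plus a trait with one implementation per target engine. `Document`, `IndexSerializer`, and `JsonDocs` are hypothetical names, and the JSON emitted here is a plain array of documents, not lunr's actual pre-built index format.

```rust
/// Engine-agnostic record; the fields are hypothetical.
pub struct Document {
    pub url: String,
    pub title: String,
    pub body: String,
}

/// One implementation per target search engine.
pub trait IndexSerializer {
    fn serialize(&self, docs: &[Document]) -> String;
}

/// Escape a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Emits a plain JSON array of documents that a client-side engine
/// could load and index in the browser.
pub struct JsonDocs;

impl IndexSerializer for JsonDocs {
    fn serialize(&self, docs: &[Document]) -> String {
        let items: Vec<String> = docs
            .iter()
            .map(|d| {
                format!(
                    "{{\"url\":\"{}\",\"title\":\"{}\",\"body\":\"{}\"}}",
                    json_escape(&d.url),
                    json_escape(&d.title),
                    json_escape(&d.body)
                )
            })
            .collect();
        format!("[{}]", items.join(","))
    }
}
```

With this split, supporting another engine would mean adding another `IndexSerializer` implementation while the document collection step stays the same.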

I'm changing this issue to an RFC

Geobert commented 5 years ago

Dropping this here for bookmarking: https://github.com/valeriansaliou/sonic