Open damithc opened 6 years ago
Raising priority as full-text search can greatly enhance the usefulness of a content-heavy website.
I don't mind if full-text search is a separate page altogether and takes some time to load (i.e., if the full search index needs to be downloaded to the browser first).
Are we open to integrating existing solutions for full-text search?
Docsearch (free, open-source):
DocSearch will crawl your documentation website, push its content to an Algolia index, and allow you to add a dropdown search menu for your users to find relevant content in no time.
Ideally, we should have a decent built-in solution and the ability to integrate other third-party solutions.
As discussed with @marvinchin today, Marvin is planning to explore using the Lunrjs library to implement a built-in full text search. This library is also used by MkDocs.
Are we still looking to have built-in full text search for V2? 😅 I'm not sure that I can finish it by the end of the semester.
Good to have, but not necessary. Same for the FOUC problem. Both have a good-enough workaround but not a full-fledged solution.
I've just published an almost year-long project originally motivated by this issue:
It consists of a CLI file indexer (integratable by copying the binary, similar to what we do for plantuml.jar), a search library powered by wasm (Rust), and a search UI (TypeScript).
It deals with the issue in 2 aspects:
Scalability
> I don't mind if full-text search is a separate page altogether and takes some time to load (i.e., if the full search index needs to be downloaded to the browser first)
I found this issue to be common to many static site generators using lunrjs or some other client-side search solution. This was my primary motivation in creating this project (see https://github.com/olivernn/lunr.js/issues/222 and the discussion at https://github.com/rust-lang/mdBook/issues/51 for example), although it turned out to be a secondary plus in the end.
The primary approach (and difference) here is therefore fragmenting the index into many separate files. At search time, only the files needed for what is being searched are retrieved.
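To illustrate the fragmented-index idea, here is a minimal TypeScript sketch. It is not the library's actual design: sharding by the first character of each term, and the names `buildShards`, `createSearcher` and `loadShard`, are assumptions made for illustration. In a browser, `loadShard` would be a fetch of one small index file; here it reads from an in-memory map so the example is self-contained.

```typescript
// Hypothetical sketch of a fragmented (sharded) inverted index.
type Postings = Map<string, string[]>; // term -> URLs of pages containing it

function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Assumption: one shard per first character of a term.
function shardKey(term: string): string {
  return term.charAt(0);
}

// Index time: split the postings into many small shards.
function buildShards(pages: Map<string, string>): Map<string, Postings> {
  const shards = new Map<string, Postings>();
  pages.forEach((text, url) => {
    for (const term of tokenize(text)) {
      const key = shardKey(term);
      let shard = shards.get(key);
      if (shard === undefined) {
        shard = new Map<string, string[]>();
        shards.set(key, shard);
      }
      const urls = shard.get(term) ?? [];
      if (urls.indexOf(url) === -1) urls.push(url);
      shard.set(term, urls);
    }
  });
  return shards;
}

// Search time: load only the shards that the query terms map to.
function createSearcher(loadShard: (key: string) => Postings | undefined) {
  const cache = new Map<string, Postings>();
  const loaded: string[] = []; // records which shards were requested

  function search(query: string): string[] {
    let result: string[] | undefined;
    for (const term of tokenize(query)) {
      const key = shardKey(term);
      let shard = cache.get(key);
      if (shard === undefined) {
        shard = loadShard(key) ?? new Map<string, string[]>();
        cache.set(key, shard);
        loaded.push(key);
      }
      const urls = shard.get(term) ?? [];
      // Keep only pages that contain every query term.
      result = result === undefined
        ? [...urls]
        : result.filter((u) => urls.indexOf(u) !== -1);
    }
    return result ?? [];
  }

  return { search, loaded };
}

const pages = new Map<string, string>([
  ["/userGuide/panels.html", "Panels can be expanded or collapsed"],
  ["/userGuide/search.html", "Search looks up pages by keyword"],
]);
const shards = buildShards(pages);
const searcher = createSearcher((key) => shards.get(key));
const hits = searcher.search("collapsed panels");
console.log(hits, searcher.loaded, shards.size);
```

In this toy site the query touches only the shards for "c" and "p"; the remaining shards are never requested, which is the property that keeps the download small as the site grows.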
The indexer is also written in Rust for this reason (:star: it indexes the entire 2103 site in 0.5s!), as is the search library (Rust compiled to wasm).
Alternative JS-based implementations were also trialed and tuned for both; the performance differences are significant.
This does mean a relatively larger binary / bundle size (334KB gzipped wasm file), something I'm still working to improve. The silver lining is that search hopefully isn't the first thing users activate (i.e., within 1-2s of page load).
A complete e2e search solution
Due to minor implications of scalability in the internal design, I also ended up creating an entire search user interface library. To my knowledge there aren't many "complete" (indexer -> search library -> ui) solutions around (barring algolia docsearch which is an entirely different beast).
Haven't really marketed it as I'm still tying up some things (e2e tests, getting Windows Defender to stop flagging the executables as viruses, some more bugs), but could look into integrating it here sometime 😃.
Nice work @ang-zeyu. Let's aim to integrate it into MarkBind in due course.
I'm increasing the priority because Algolia DocSearch is undergoing a major revamp and they haven't been able to provide the search support for our module websites this semester so far. The sooner we reduce reliance on third-party search the better.
If anyone would like to take up this issue, please feel free, I think this would be a rather fun thing to do. The library I mentioned above is more or less ready for use. I am currently just doing a fun infinite loop of "making it better and more marketable" but not actually doing any marketing 🤔😅
In the course of doing this, I also came across several related alternatives that you can consider. All of them follow a CLI + wasm frontend architecture:
Please don't let my selling here stop you from exercising your own judgement. Feel free to come to your own reasoning and choice, and post back here. I would love to hear your thoughts.
Some non-exhaustive guidelines for implementation:
Hello, I've been looking at this issue, and one problem I've encountered is that content in components hidden from the user during the initial render (e.g. Panels) is not included in the search results. This is because libraries like Pagefind index the content only after the HTML files have been built. This rendering problem is also faced by other plugins like dataTable (@Tim-Siu) and Mermaid (@yiwen101 @LamJiuFong).
This behaviour is also similar to the Algolia DocSearch setup we use now, where algolia-no-index is automatically added to content hidden by MarkBind's Vue components, causing content hidden in panels to similarly not show up in search results.
With this in mind, I'd like to confirm the intended behaviour of the full-text search we want to implement: should the results include content inside panels, or is it OK for that content not to show up in the search results?
@jingting1412 I think it is fine (even necessary) to omit content from collapsed panels. But we can index content from expanded-by-default panels, right?
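A rough TypeScript sketch of that rule, for discussion only. The `Block` model below is hypothetical and is not MarkBind's internal representation; it only shows the filtering decision (skip collapsed panels, keep expanded-by-default ones). With an indexer that runs on built HTML, such as Pagefind, the same decision could presumably be expressed by marking collapsed panels with its `data-pagefind-ignore` attribute, but that wiring is not shown here.

```typescript
// Hypothetical page model: a page is a flat list of blocks.
interface Block {
  kind: "text" | "panel";
  text: string;
  expanded?: boolean; // only meaningful for panels; absent means collapsed
}

// Collect the text that should be handed to the indexer:
// plain text always, panel text only when the panel is expanded by default.
function indexableText(blocks: Block[]): string {
  return blocks
    .filter((b) => b.kind === "text" || b.expanded === true)
    .map((b) => b.text)
    .join(" ");
}

const samplePage: Block[] = [
  { kind: "text", text: "Intro" },
  { kind: "panel", text: "Visible details", expanded: true },
  { kind: "panel", text: "Hidden details" },
];
const toIndex = indexableText(samplePage);
console.log(toIndex);
```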
Current: only page titles and specified keywords in the frontmatter appear in search results.
Suggested: also include other content in pages for search results
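The difference between the current and suggested behaviour can be sketched as follows. This is an illustrative TypeScript toy, not MarkBind's search code: the `Page` shape, `searchPages`, and the `includeBody` flag are all assumptions made for the example.

```typescript
// Hypothetical page record; field names are assumptions for illustration.
interface Page {
  url: string;
  title: string;
  keywords: string[];
  body: string;
}

function words(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// includeBody = false mimics the current behaviour (title + keywords only);
// includeBody = true mimics the suggested behaviour (page content as well).
function searchPages(pages: Page[], query: string, includeBody: boolean): string[] {
  const terms = words(query);
  if (terms.length === 0) return [];
  return pages
    .filter((page) => {
      const fields = [page.title, ...page.keywords];
      if (includeBody) fields.push(page.body);
      const haystack = new Set(words(fields.join(" ")));
      return terms.every((t) => haystack.has(t));
    })
    .map((page) => page.url);
}

const site: Page[] = [
  {
    url: "/panels.html",
    title: "Panels",
    keywords: ["accordion"],
    body: "Use the expanded attribute to open a panel by default",
  },
  {
    url: "/search.html",
    title: "Search",
    keywords: [],
    body: "Type a keyword in the search bar",
  },
];

const currentHits = searchPages(site, "expanded attribute", false);
const suggestedHits = searchPages(site, "expanded attribute", true);
console.log(currentHits, suggestedHits);
```

Here the query matches only words in the page body, so it finds nothing under the current behaviour and finds the Panels page under the suggested one.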