Open Timosav opened 2 years ago
Hi @Timosav!
> At the moment, I'm using pagefind. This library emits a .js file that is quite simple compared to the search-ui bundle, so I managed to simply overwrite part of it.
Thank you for the feedback and for letting me know about Pagefind! I was slowly becoming unsure whether splitting index chunks was something that would be broadly useful, and was pivoting to focus on more typical use cases, so it's really great to see a validation of this idea.
> How difficult and useful do you think this kind of feature could be?
Imo there are 2 aspects, please let me know your thoughts!
listItemRender to set up their own event handlers. These handlers would, for example, trigger some global event bus that frontend components (React/Vue/etc.) would subscribe to, to programmatically trigger the client-side navigation (e.g. history.push(..)). Not the most ergonomic thing, as you mentioned, since it requires overriding that lengthy API to achieve this simple task.
// App.tsx or similar
useEffect(() => {
  initMorsels({
    uiOptions: {
      onOptionSelect: (anchorEl: HTMLElement, ev?: MouseEvent) => { /* ... do your navigation here ... */ }
    }
  })
}, [])
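The event bus approach mentioned above could look roughly like the sketch below. This is only an illustration: `NavigationBus`, `subscribe` and `publish` are hypothetical names and not part of Morsels, and the subscriber just records links where a real app would call `history.push(..)` or its router.

```typescript
// Hypothetical names throughout; nothing here is part of the Morsels API.
type Listener = (href: string) => void;

class NavigationBus {
  private listeners: Listener[] = [];

  // Returns an unsubscribe function, convenient as a useEffect cleanup.
  subscribe(listener: Listener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  // Notifies every current subscriber of the requested navigation target.
  publish(href: string): void {
    this.listeners.slice().forEach((listener) => listener(href));
  }
}

const navigationBus = new NavigationBus();

// Framework side: a real component would call history.push(href) or similar.
const visited: string[] = [];
const unsubscribe = navigationBus.subscribe((href) => {
  visited.push(href);
});

// Search UI side: a click handler set up in the overridden renderer would
// call preventDefault() and then publish the link of the selected result.
navigationBus.publish('/docs/getting-started');

unsubscribe();
navigationBus.publish('/docs/ignored'); // no subscribers left, nothing recorded
```

The bus keeps the search bundle and the framework decoupled: neither side needs to import the other, only the shared bus instance.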
free(search) in terms of pagefind's docs) due to the use of WebWorkers.
To give a fuller idea of what Morsels' API would look like:
// searcherOptions is equivalent to the one provided to initMorsels
const searcher = new Searcher(searcherOptions);
// Setup is async and mostly proceeds in the WebWorker,
// You could use this to show initialising states in the UI for example
searcher.setupPromise.then(() => {});
// Equivalent of Pagefind's const search = await pagefind.search("static");
const query = await searcher.getQuery(queryString);
// Equivalent of Pagefind's await search.results[0].data();
const results = await query.getNextN(resultsToLoad);
// An array of [fieldName, fieldText] pairs as indexed in the document order
// This is unlike pagefind's singular 'content' field.
// This "positioned" model enables the content hierarchy you see in the UI currently:
// Title > heading > text under heading
const documentText: [string, string][] = results[0].getFields();
// Memory management is required due to WebWorkers
query.free(); // once done with this query
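Since the query has to be freed manually, one option is to wrap the calls above in a small helper that always calls free(), even when processing throws. The sketch below is an assumption-laden illustration: `withQuery`, `SearcherLike` and `QueryLike` are hypothetical names that only mirror the calls shown above, and a stand-in searcher is used so the example runs without the real library.

```typescript
// Minimal structural types mirroring only the calls shown in this thread.
interface QueryLike<R> {
  getNextN(n: number): Promise<R[]>;
  free(): void;
}

interface SearcherLike<R> {
  setupPromise: Promise<unknown>;
  getQuery(queryString: string): Promise<QueryLike<R>>;
}

// Hypothetical helper: runs `use` with a query and frees it afterwards.
async function withQuery<R, T>(
  searcher: SearcherLike<R>,
  queryString: string,
  use: (query: QueryLike<R>) => Promise<T>,
): Promise<T> {
  await searcher.setupPromise; // wait for the WebWorker setup to finish
  const query = await searcher.getQuery(queryString);
  try {
    return await use(query);
  } finally {
    query.free(); // runs on success and on error
  }
}

// Stand-in searcher (not the real library) so the sketch is runnable.
let freedCount = 0;
const fakeSearcher: SearcherLike<string> = {
  setupPromise: Promise.resolve(),
  getQuery: async (q) => ({
    getNextN: async (n) => [q + ' #1', q + ' #2', q + ' #3'].slice(0, n),
    free: () => {
      freedCount += 1;
    },
  }),
};

const firstTwo: Promise<string[]> = withQuery(fakeSearcher, 'static', (query) =>
  query.getNextN(2),
);
```

Doing all result processing inside the callback avoids having to know whether results stay valid after the query is freed.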
Hi @ang-zeyu
Thank you for the long answer!
I've been digging a lot for this kind of library ^^' If pagefind can be of any inspiration to you, this is great!
Of course, I also had a look at lunr.js, but managing the index, and the storage of the documents themselves to retrieve the text surrounding the matched text was a no-go.
Regarding your propositions:
initMorsels. However, not the most ergonomic, as you say. Regarding Morsels' API: this looks absolutely great! Having full access to the searcher in such an easy way would be great!
As you already did all the Searcher's logic in TypeScript, it seems to me that it would be the easiest to implement?
If you also believe that this is a good choice, how could I help you?
As a side note, to give you a better idea of how I would plan to use this:
out dir, so that I can publish the static website directly.
Thank you for letting me know your thoughts and use case! I suppose both options are OK from a usability standpoint, and it wouldn't be too much to expose both options on my end as well.
> Of course, I also had a look at lunr.js, but managing the index, and the storage of the documents themselves to retrieve the text surrounding the matched text was a no-go.
Regarding the Search API though, the .getFields() call above does not return "closeby" text or do text highlighting, unlike Pagefind's API; the UI handles this.
Are you looking for such in-built handling as well?
To give you a fuller idea, this is what the .getFields() output might look like with the default field mappings. After retrieving this, there is some amount of work involved to extract headingLink-heading-body triplets, and the "best" triplets:
[
['title', 'README'],
['headingLink', 'description'],
['heading', 'Description'],
['body', 'Morsels is a client-side search solution made for static sites, depending on a pre-built index generated by a CLI tool. Some similar tools in this space are Stork and TinySearch. Morsels does the same, with a focus on providing a more feature rich and relevant search experience, while remaining easy to get started with for common use cases (e.g. single domain static sites).'],
... more headingLink, heading, body fields ...
]
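As a rough sketch of the triplet extraction mentioned above, the pairs could be grouped into sections in a single pass. This assumes the fields arrive in document order as headingLink, heading, then body; `groupFields`, `Section` and `DocumentOutline` are hypothetical names, and picking the "best" triplets is not covered here.

```typescript
type Field = [string, string];

interface Section {
  headingLink?: string;
  heading?: string;
  body: string[];
}

interface DocumentOutline {
  title?: string;
  sections: Section[];
}

// Groups [fieldName, fieldText] pairs into headingLink-heading-body sections.
function groupFields(fields: Field[]): DocumentOutline {
  const outline: DocumentOutline = { sections: [] };
  let current: Section | undefined;

  for (const [name, text] of fields) {
    if (name === 'title') {
      outline.title = text;
    } else if (name === 'headingLink') {
      // A heading link always starts a new section.
      current = { headingLink: text, body: [] };
      outline.sections.push(current);
    } else if (name === 'heading') {
      // A heading starts a new section unless it follows its own link.
      if (!current || current.heading !== undefined || current.body.length > 0) {
        current = { body: [] };
        outline.sections.push(current);
      }
      current.heading = text;
    } else if (name === 'body') {
      // Body text before any heading goes into an untitled section.
      if (!current) {
        current = { body: [] };
        outline.sections.push(current);
      }
      current.body.push(text);
    }
  }
  return outline;
}

const exampleFields: Field[] = [
  ['title', 'README'],
  ['headingLink', 'description'],
  ['heading', 'Description'],
  ['body', 'Morsels is a client-side search solution made for static sites.'],
];
const exampleOutline = groupFields(exampleFields);
```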
I'm thinking that the API for this might be something like this, which should provide a reasonable tradeoff between UI flexibility and API usability:
interface MatchResult {
  type: 'heading' | 'body' | 'heading-and-body',
  // These 2 properties are optional, and will only be present if type === 'heading-and-body'.
  headingMatch?: MatchResult,
  headingLink?: string,
  // How many terms were matched in this string?
  numTerms: number,
  // These 2 properties are for doing manual highlighting if preferred.
  bestMatches: { pos: number, len: number }[],
  text: string,
  // This would be the equivalent of Pagefind's 'excerpt'.
  // I'm not sure if Pagefind does it, but ellipses will be added where appropriate as well.
  // Example return value: '... a b c<mark>d</mark> e f g ... h i j k <mark>l</mark> m n o ...'
  getHighlightedExcerpt: () => string
}
const excerpts: MatchResult[] = results[0].getExcerpts();
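For the manual highlighting route, a minimal sketch using the `bestMatches` and `text` properties might look like this. It assumes `pos` and `len` are character offsets into `text` and that matches do not overlap; neither is confirmed here, and `highlight` and `escapeHtml` are hypothetical helpers.

```typescript
interface Match {
  pos: number;
  len: number;
}

// Escapes the characters that matter when inserting text into HTML content.
function escapeHtml(s: string): string {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Wraps each matched range in <mark>, assuming pos/len index into `text`.
function highlight(text: string, matches: Match[]): string {
  const sorted = matches.slice().sort((a, b) => a.pos - b.pos);
  let out = '';
  let cursor = 0;
  for (const { pos, len } of sorted) {
    if (pos < cursor) continue; // skip a match overlapping the previous one
    out += escapeHtml(text.slice(cursor, pos));
    out += '<mark>' + escapeHtml(text.slice(pos, pos + len)) + '</mark>';
    cursor = pos + len;
  }
  return out + escapeHtml(text.slice(cursor));
}

const highlighted = highlight('a b c d e', [
  { pos: 6, len: 1 },
  { pos: 0, len: 1 },
]);
// highlighted === '<mark>a</mark> b c <mark>d</mark> e'
```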
You would have to make some simple UI decisions such as:
- Which MatchResult types you want to prioritise
- How many MatchResults you want to display per document
In your opinion, is this an acceptable level of complexity?
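To illustrate how small those two decisions can be in practice, here is a sketch that ranks excerpts by type and then by number of matched terms, and keeps the first few. The priority order and the `pickExcerpts` helper are assumptions for illustration, not something Morsels prescribes.

```typescript
type MatchType = 'heading' | 'body' | 'heading-and-body';

// Only the properties this sketch needs from the proposed MatchResult.
interface ExcerptLike {
  type: MatchType;
  numTerms: number;
  text: string;
}

// Lower number = shown first. This particular order is an arbitrary choice.
const typePriority: Record<MatchType, number> = {
  'heading-and-body': 0,
  heading: 1,
  body: 2,
};

// Sorts by type priority, then by matched terms (descending), keeps `max`.
function pickExcerpts<T extends ExcerptLike>(excerpts: T[], max: number): T[] {
  return excerpts
    .slice()
    .sort(
      (a, b) =>
        typePriority[a.type] - typePriority[b.type] || b.numTerms - a.numTerms,
    )
    .slice(0, max);
}

const sampleExcerpts: ExcerptLike[] = [
  { type: 'body', numTerms: 3, text: 'x' },
  { type: 'heading', numTerms: 1, text: 'y' },
  { type: 'heading-and-body', numTerms: 2, text: 'z' },
  { type: 'body', numTerms: 5, text: 'w' },
];
const topThree = pickExcerpts(sampleExcerpts, 3);
```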
Alternatively, I could also support a function that treats all text in the document as 'body' text and returns a singular excerpt field like Pagefind, but you would not be able to present the title > heading > text content hierarchy that Morsels' UI currently has.
typed TypeScript
This would definitely be nice! I think it would come in a @types/Morsels package of sorts, but I think it is not a big initial priority.
Exposing the implementation source itself as a regular npm package that you can import statically with your bundler is unfortunately not easy though, as there is wasm involved (build toolchains, different bundlers handle wasm differently, etc. etc.).
It would have to come in a separate <script> like the current setup, or as ESM (cmiiw) like Pagefind does it.
Great that I mentioned lunr.js!
I guess that exposing an API such as the one you mention is good. Users that do not require an excerpt would not need to use it, but those requiring it can access it easily.
In my opinion, this is an acceptable level of complexity, given the information that your search provides. This solution would also fit more use cases.
The only question is: would it make sense for you to attach this logic to the Searcher? If not, we could still provide some utilities or an example of how to display basic excerpts.
Regarding the typed package, I clearly understand that the additional dependencies are not attractive. I will try to work on that part if that's ok for you. I have never worked on bundling packages that use wasm, so I'm curious to learn about it.
For the moment, having the Searcher available will be good enough! I will probably use a library such as zod to cast the types of whatever the Searcher is returning.
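As a dependency-free stand-in for the zod idea, a hand-written type guard can check the shape at runtime before narrowing the type. The sketch below validates the [fieldName, fieldText] pairs described earlier in the thread; `isField` and `parseFields` are hypothetical names, and a zod schema would replace them in a real setup.

```typescript
type Field = [string, string];

// Runtime check that a value is a [string, string] pair.
function isField(value: unknown): value is Field {
  return (
    Array.isArray(value) &&
    value.length === 2 &&
    typeof value[0] === 'string' &&
    typeof value[1] === 'string'
  );
}

// Validates untyped data coming from the bundle instead of casting it.
function parseFields(value: unknown): Field[] {
  if (!Array.isArray(value) || !value.every(isField)) {
    throw new TypeError('Expected an array of [fieldName, fieldText] pairs');
  }
  return value;
}

const parsedFields = parseFields([
  ['title', 'README'],
  ['heading', 'Description'],
]);
```

Unlike an `as` cast, this fails loudly at the boundary if the shape of the returned data ever changes.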
Awesome! I'm halfway through decoupling the excerpt highlighting functions from the UI and making them available under results[0].xxx. I think this should be done today in a few hours or tomorrow. (This is released!)
> I will try to work on that part if that's ok for you.
That's fine! I revisited this question a few times before but never got to a solution that didn't require bundler-specific setup (e.g. CopyWebpackPlugin for the wasm binaries). The closest I think would be ESM, but there is still the issue of ensuring the wasm binary is co-located with the bundle properly.
It should also be possible to inline the binary as base64 into the bundle directly, but it would be 20-25% larger, and requires WebAssembly.instantiate vs instantiateStreaming. That said, I've never benchmarked these penalties in Morsels; it could also be a perfectly acceptable tradeoff if bundler ergonomics / development experience is more critical.
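To make the two loading paths concrete, here is a small sketch. It is not how Morsels loads its binary: the base64 string below is just the smallest valid wasm module (magic number and version only), and `decodeBase64`, `instantiateInlined` and `instantiateFromUrl` are hypothetical helpers.

```typescript
// Smallest valid wasm module: "\0asm" magic number plus version 1, no sections.
const EMPTY_MODULE_BASE64 = 'AGFzbQEAAAA=';

// Decodes a base64 string into the raw bytes of the binary.
function decodeBase64(base64: string): Uint8Array {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Inlined binary: the whole buffer is decoded first, then compiled.
async function instantiateInlined(
  base64: string,
  imports: WebAssembly.Imports = {},
): Promise<WebAssembly.Instance> {
  const { instance } = await WebAssembly.instantiate(decodeBase64(base64), imports);
  return instance;
}

// Separate .wasm file: compilation can start while the file is downloading.
async function instantiateFromUrl(
  url: string,
  imports: WebAssembly.Imports = {},
): Promise<WebAssembly.Instance> {
  const { instance } = await WebAssembly.instantiateStreaming(fetch(url), imports);
  return instance;
}

const moduleBytes = decodeBase64(EMPTY_MODULE_BASE64);
```

The inlined variant removes the need to co-locate a .wasm file with the bundle, at the cost of the larger bundle and the extra decode step described above.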
After some investigation into the packaging, I admit it will take quite a long time to have a clean infrastructure.
My current solution for using your library with Next.js is to create a component that will initiate the searcher and share it in a global atom (thanks nanostores).
/**
 * Component that will load the infisearch searcher.
 */
const LoadSearcher = () => {
  const searcher = useStore(searcherAtom);
  if (!searcher) {
    return (
      <Script
        src={getHref('/infisearch/assets/search-ui.ascii.bundle.js')}
        strategy="afterInteractive"
        onLoad={async () => {
          // eslint-disable-next-line @typescript-eslint/ban-ts-comment
          // @ts-ignore
          const newSearcher = new infisearch.Searcher({
            url: getHref('/infisearch/'),
          }) as Searcher;
          await newSearcher.setupPromise;
          searcherAtom.set(newSearcher);
        }}
      />
    );
  }
  return null;
};
Although I hate type casting, the solution is more than sufficient!
Hello!
First of all, thank you for this great work. I'm interested in client-side search solutions, and this is one of the few that also keeps track of the latest id in the page.
One of the difficulties I have encountered with this library is that I can only initiate the UI with initiateMorsels, which is exported in the bundle file (search-ui.*.bundle.js). As I am currently using Next.js (with TypeScript) to build the website, it would be very convenient to have access to a TypeScript client, installable from npm or yarn.
Currently, the only way I see to update the UI would be to redefine the types mentioned on this page. This still makes custom event handling (e.g. client-side routing) quite complex.
At the moment, I'm using pagefind. This library emits a .js file that is quite simple compared to the search-ui bundle, so I managed to simply overwrite part of it.
However, I'd still be very interested in avoiding any overwriting of .js files and directly using a TypeScript client, even though I believe that shipping such a library would make the handling of a WebWorker more difficult for the user.
How difficult and useful do you think this kind of feature could be?