ang-zeyu / infisearch

Easy and flexible client-side search for static sites
https://infi-search.com
MIT License
41 stars, 1 fork

TypeScript client #3

Open Timosav opened 2 years ago

Timosav commented 2 years ago

Hello!

First of all, thank you for this great work. I'm interested in client-side search solutions, and this is one of the few that also keeps track of the latest id in the page.

One of the difficulties I've encountered with this library is that I can only initialise the UI through initiateMorsels, which is exported by the bundle file (search-ui.*.bundle.js). As I am currently using Next.js (with TypeScript) to build the website, it would be very convenient to have access to a TypeScript client, installable via npm or yarn.

Currently, the only way I see to update the UI would be to redefine the types mentioned on this page. Even then, custom event handling (e.g. client-side routing) remains quite complex.

At the moment, I'm using pagefind. This library emits a .js file that is quite simple compared to the search-ui bundle, so I managed to simply overwrite part of it.

However, I'd still much prefer to avoid overwriting any .js files and use a TypeScript client directly, even though I believe that shipping such a library would make handling the WebWorker more difficult for the user.

How difficult and useful do you think this kind of feature could be?

ang-zeyu commented 2 years ago

Hi @Timosav!

> At the moment, I'm using pagefind. This library emits a .js file that is quite simple compared to the search-ui bundle, so I managed to simply overwrite part of it.

Thank you for the feedback and for letting me know about Pagefind! I was slowly getting unsure whether splitting index chunks was something that would be broadly useful 😂 and was pivoting to focus on more typical use cases, so it's really great to see a validation of this idea. 🎉

> How difficult and useful do you think this kind of feature could be?

Imo there are 2 aspects, please let me know your thoughts!

  1. Client side routing - this is definitely worthwhile to support on its own.
    • Currently, I expect that one would override listItemRender to set up their own event handlers. These handlers would, for example, trigger some global event bus that frontend components (React/Vue/etc.) subscribe to in order to programmatically trigger the client-side navigation (e.g. history.push(..)). As you mentioned, this is not the most ergonomic approach, since it requires overriding that lengthy API to achieve a simple task.
    • I wonder if supporting a simple option to override the event handler alone would be enough:
      // App.tsx or similar
      useEffect(() => {
        initMorsels({
          uiOptions: {
            onOptionSelect: (anchorEl: HTMLElement, ev?: MouseEvent) => {
              // ... do your navigation here ...
            },
          },
        });
      }, []);
  2. Search API - just needs to be documented and exposed. It is similar to Pagefind's whereby there would be a "retrieve results" step and "populate data" step. In addition, there would be some memory management required (e.g. calling free(search) in terms of pagefind's docs) due to the use of WebWorkers.
    • Is memory management an acceptable compromise to you? I've looked through pagefind's issues and there isn't much on this, I assume the core idea of 2-steps retrieving/loading is somewhat fine.
  3. Let me know if you are looking for something else completely 🙂
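
The event-bus idea from point 1 could be sketched roughly as follows. Everything here is hypothetical: `NavigationBus`, `handleOptionSelect`, and the assumption that the handler receives an element carrying an `href` attribute. `onOptionSelect` itself is only a proposal in this thread, not a documented option.

```typescript
// Hypothetical sketch: a tiny event bus that decouples the search UI's
// click handler from the framework router. None of these names belong
// to the library.
type Listener = (href: string) => void;

class NavigationBus {
  private listeners: Listener[] = [];

  // Returns an unsubscribe function, convenient as a useEffect cleanup.
  subscribe(listener: Listener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  publish(href: string): void {
    this.listeners.forEach((listener) => listener(href));
  }
}

const navigationBus = new NavigationBus();

// Minimal structural types so the sketch does not depend on DOM typings.
interface AnchorLike {
  getAttribute(name: string): string | null;
}
interface EventLike {
  preventDefault(): void;
}

// What one might pass as the proposed `onOptionSelect` option, or attach
// manually inside a custom `listItemRender`.
function handleOptionSelect(anchorEl: AnchorLike, ev?: EventLike): void {
  const href = anchorEl.getAttribute('href');
  if (href === null) return; // nothing to navigate to
  ev?.preventDefault(); // suppress the full page load
  navigationBus.publish(href);
}
```

In a Next.js app, a top-level component could call `navigationBus.subscribe((href) => router.push(href))` inside `useEffect` and return the unsubscribe function as the cleanup.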

ang-zeyu commented 2 years ago

To give a fuller idea of what Morsels' API would look like:

// searcherOptions is equivalent to the one provided to initMorsels
const searcher = new Searcher(searcherOptions);

// Setup is async and mostly proceeds in the WebWorker,
// You could use this to show initialising states in the UI for example
searcher.setupPromise.then(() => {});

// Equivalent of Pagefind's const search = await pagefind.search("static");
const query = await searcher.getQuery(queryString);

// Equivalent of Pagefind's await search.results[0].data();
const results = await query.getNextN(resultsToLoad);

// An array of [fieldName, fieldText] pairs as indexed in the document order
// This is unlike pagefind's singular 'content' field.
// This "positioned" model enables the content hierarchy you see in the UI currently:
// Title > heading > text under heading
const documentText: [string, string][] = results[0].getFields();

// Memory management is required due to WebWorkers
query.free(); // once done with this query
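
Since each query must be released with `free()`, one option is to wrap the lifecycle in a helper that always frees the query, even when processing fails. The interfaces below mirror only the calls shown in the sketch above. `SearcherLike`, `QueryLike`, `ResultLike`, `withQuery` and `topFields` are hypothetical names, not part of the library.

```typescript
// Hypothetical structural types mirroring only the calls sketched above.
type FieldPair = [string, string];

interface ResultLike {
  getFields(): FieldPair[];
}
interface QueryLike {
  getNextN(n: number): Promise<ResultLike[]>;
  free(): void;
}
interface SearcherLike {
  setupPromise: Promise<unknown>;
  getQuery(queryString: string): Promise<QueryLike>;
}

// Runs `use` against a fresh query and frees it afterwards,
// even if `use` throws.
async function withQuery<T>(
  searcher: SearcherLike,
  queryString: string,
  use: (query: QueryLike) => Promise<T>,
): Promise<T> {
  await searcher.setupPromise; // wait for the WebWorker-side setup
  const query = await searcher.getQuery(queryString);
  try {
    return await use(query);
  } finally {
    query.free();
  }
}

// Example: read the fields of the first `n` results before the query is freed.
async function topFields(
  searcher: SearcherLike,
  queryString: string,
  n: number,
): Promise<FieldPair[][]> {
  return withQuery(searcher, queryString, async (query) => {
    const results = await query.getNextN(n);
    return results.map((result) => result.getFields());
  });
}
```

The fields are copied out inside the callback, so nothing touches the query after it has been freed.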

Timosav commented 2 years ago

Hi @ang-zeyu

Thank you for the long answer!

I've been digging a lot for this kind of library ^^' If pagefind can be of any inspiration to you, this is great 😄

Of course, I also had a look at lunr.js, but managing the index, and the storage of the documents themselves to retrieve the text surrounding the matched text was a no-go.

Regarding your propositions:

  1. I guess that adding such an option would clearly be enough for my use case. I would only update the CSS and this onOptionSelect. Easy access to a typed initMorsels would still improve the experience a lot. It is not the most ergonomic, though, as you say 🙈
  2. This seems to be exactly what I am looking for! From what I understand of Pagefind, they currently don't use a WebWorker (see comment), hence they should have no memory-management issues. On my side, memory management, if well documented, seems like a more than acceptable trade-off.

Regarding Morsels' API: this looks absolutely great! 🥇 Having full access to the searcher in such an easy way would be great!

As you already did all the Searcher's logic in TypeScript, it seems to me that it would be the easiest to implement?

If you also believe that this is a good choice, how could I help you?

As a side note, to give you a better idea of how I would plan to use this:

  1. Next.js website: initialise the Searcher and reuse a SearchBar component across the application that handles all the Morsels API logic.
  2. Export to a static website. (I'm working with an empty or not final index in dev.)
  3. Create the index from the exported website using the CLI.
  4. I create the index in the out dir, so that I can publish the static website directly.

ang-zeyu commented 2 years ago

Thank you for letting me know your thoughts and use case! I suppose both options are OK from a usability standpoint, and it wouldn't be too much to expose both on my end as well. 😄

> Of course, I also had a look at lunr.js, but managing the index, and the storage of the documents themselves to retrieve the text surrounding the matched text was a no-go.

Regarding the Search API though: unlike Pagefind's API, the .getFields() call above does not return "closeby" text or do text highlighting; the UI handles this.

Are you looking for such in-built handling as well?

To give you a fuller idea, this is what the .getFields() output might look like with the default field mappings. After retrieving this, there is some amount of work involved to extract headingLink-heading-body triplets, and the "best" triplets:

[
  ['title', 'README'],
  ['headingLink', 'description'],
  ['heading', 'Description'],
  ['body', 'Morsels is a client-side search solution made for static sites, depending on a pre-built index generated by a CLI tool. Some similar tools in this space are Stork and TinySearch. Morsels does the same, with a focus on providing a more feature rich and relevant search experience, while remaining easy to get started with for common use cases (e.g. single domain static sites).'],
  // ... more headingLink, heading, body fields ...
]
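
One way to picture the triplet extraction is to walk the pairs in document order and open a new group whenever a heading starts. This is only a sketch under the default field names shown above, and it assumes a `headingLink` precedes its `heading` as in the sample output. `Section`, `GroupedDocument` and `groupFields` are hypothetical names, and choosing the "best" triplets is left out.

```typescript
type FieldPair = [string, string];

interface Section {
  headingLink?: string;
  heading?: string;
  body: string[];
}

interface GroupedDocument {
  title?: string;
  sections: Section[];
}

function groupFields(fields: FieldPair[]): GroupedDocument {
  const doc: GroupedDocument = { sections: [] };
  let current: Section | undefined;

  // Returns the open section, starting a new one if none is open
  // or the open one is already "complete" for the incoming field.
  const open = (isComplete: (s: Section) => boolean): Section => {
    if (current === undefined || isComplete(current)) {
      current = { body: [] };
      doc.sections.push(current);
    }
    return current;
  };

  for (const [name, text] of fields) {
    switch (name) {
      case 'title':
        doc.title = text;
        break;
      case 'headingLink':
        // A link always begins a new heading group.
        open(() => true).headingLink = text;
        break;
      case 'heading':
        // Reuse the group opened by a preceding headingLink, if any.
        open((s) => s.heading !== undefined || s.body.length > 0).heading = text;
        break;
      case 'body':
        open(() => false).body.push(text);
        break;
      default:
        break; // ignore fields outside the default mapping
    }
  }
  return doc;
}
```

Body text that appears before any heading ends up in a section with no `heading`, which a UI could render directly under the title.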

I'm thinking that the API for this might be something like this, which should provide a reasonable tradeoff between UI flexibility and API usability:

interface MatchResult {
  type: 'heading' | 'body' | 'heading-and-body',

  // These 2 properties are nullable, will only be present if type === 'heading-and-body'. 
  headingMatch?: MatchResult,
  headingLink?: string,

  // How many terms were matched in this string?
  numTerms: number,
  // These 2 properties are for doing manual highlighting if preferred. 
  bestMatches: { pos: number, len: number }[],
  text: string,

  // This would be the equivalent of Pagefind's 'excerpt'.
  // I'm not sure if Pagefind does it, but ellipses will be added where appropriate as well.
  // Returns e.g. '... a b c<mark>d</mark> e f g ... h i j k <mark>l</mark> m n o ...'
  getHighlightedExcerpt: () => string
}

const excerpts: MatchResult[] = results[0].getExcerpts();
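
For the manual highlighting path, `bestMatches` and `text` could be combined along these lines. This is a sketch, not the library's implementation: it assumes `pos` is a character offset into `text` and that overlapping ranges can simply be skipped. `MatchRange`, `highlight` and `escapeHtml` are hypothetical names.

```typescript
interface MatchRange {
  pos: number;
  len: number;
}

// Escape text before it is placed into HTML next to the <mark> tags.
function escapeHtml(raw: string): string {
  return raw
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

function highlight(text: string, matches: MatchRange[]): string {
  // Work on a sorted copy so the caller's array is left untouched.
  const sorted = matches.slice().sort((a, b) => a.pos - b.pos);
  let html = '';
  let cursor = 0;
  for (const { pos, len } of sorted) {
    if (pos < cursor) continue; // skip ranges overlapping the previous one
    html += escapeHtml(text.slice(cursor, pos));
    html += '<mark>' + escapeHtml(text.slice(pos, pos + len)) + '</mark>';
    cursor = pos + len;
  }
  return html + escapeHtml(text.slice(cursor));
}
```

Escaping each slice separately keeps the offsets valid, since they refer to the unescaped text.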

You would have to make some simple UI decisions such as:

In your opinion, is this an acceptable level of complexity?

Alternatively, I could also support a function that treats all text in the document as 'body' text and returns a single excerpt field like Pagefind does, but you would then not be able to present the title > heading > text content hierarchy that Morsels' UI currently shows.

ang-zeyu commented 2 years ago

> typed TypeScript

This would definitely be nice! I think it would come as a @types/Morsels package of sorts, though it is not a big initial priority.

Exposing the implementation source itself as a regular npm package that you can import statically with your bundler is unfortunately not easy though, as there is wasm involved (build toolchains, different bundlers handle wasm differently, etc. etc. 😖).

It would have to come in a separate <script> like the current setup or ESM (cmiiw) like Pagefind does it.

Timosav commented 2 years ago

Great that I mentioned lunr.js 😄

I guess that exposing an API such as the one you mention is good. Users that do not require an excerpt would not need to use it, but those requiring it can access it easily.

In my opinion, this is an acceptable level of complexity, given the information that your search provides. This solution would also fit more use cases.

The only question is: would it make sense for you to attach this logic to the Searcher? If not, we could still provide some utilities or an example of how to display basic excerpts.

Regarding the typed package, I clearly understand that the additional dependencies are not attractive. I will try to work on that part if that's OK with you. I have never worked on bundling packages that use wasm, so I'm curious to learn about it.

For the moment, having the Searcher available will be good enough! I will probably use a library such as zod to cast the types of whatever the Searcher is returning.
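
Without pulling in a schema library, a similar runtime check can be approximated with a hand-written type guard. The sketch below validates the `[fieldName, fieldText][]` shape described earlier in the thread. `isFieldPair`, `isFieldPairs` and `parseFields` are hypothetical names, and a zod schema would serve the same purpose.

```typescript
type FieldPair = [string, string];

// True only for a two-element array of strings.
function isFieldPair(value: unknown): value is FieldPair {
  return (
    Array.isArray(value) &&
    value.length === 2 &&
    typeof value[0] === 'string' &&
    typeof value[1] === 'string'
  );
}

function isFieldPairs(value: unknown): value is FieldPair[] {
  return Array.isArray(value) && value.every(isFieldPair);
}

// Narrows an untyped value (e.g. from an untyped global Searcher)
// or throws if the shape is not what the caller expects.
function parseFields(value: unknown): FieldPair[] {
  if (!isFieldPairs(value)) {
    throw new TypeError('Expected an array of [fieldName, fieldText] pairs');
  }
  return value;
}
```

Unlike a plain `as` cast, this fails loudly at the boundary if the returned shape ever differs from what the component assumes.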

ang-zeyu commented 2 years ago

Awesome! I'm halfway through decoupling the excerpt highlighting functions from the UI and making them available under results[0].xxx. I think this should be done today in a few hours or tomorrow. (This is released!)

> I will try to work on that part if that's ok for you.

That's fine! I revisited this question a few times before but never got to a solution that didn't require bundler-specific setup (e.g. CopyWebpackPlugin for the wasm binaries). The closest I think would be ESM, but there is still the issue of ensuring the wasm binary is co-located with the bundle properly.

It should also be possible to inline the binary as base64 into the bundle directly, but it would be 20-25% larger, and requires WebAssembly.instantiate vs instantiateStreaming. That said I've never benchmarked these penalties in Morsels, it could also be a perfectly acceptable tradeoff if bundler ergonomics / development experience is more critical.

Timosav commented 1 year ago

After some investigation into the packaging, I admit it will take quite a long time to have a clean infrastructure.

My current solution to use your library with Next.js is to create a component that will initiate the searcher and share it in a global atom (thanks nanostores).

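// Assumed imports: Script ships with Next.js, useStore with @nanostores/react.
// searcherAtom, getHref and the Searcher type are project-local and not shown in this thread.
import Script from 'next/script';
import { useStore } from '@nanostores/react';

/**
 * Component that will load the infisearch searcher.
 */
const LoadSearcher = () => {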
  const searcher = useStore(searcherAtom);

  if (!searcher) {
    return (
      <Script
        src={getHref('/infisearch/assets/search-ui.ascii.bundle.js')}
        strategy="afterInteractive"
        onLoad={async () => {
          // eslint-disable-next-line @typescript-eslint/ban-ts-comment
          // @ts-ignore
          const newSearcher = new infisearch.Searcher({
            url: getHref('/infisearch/'),
          }) as Searcher;

          await newSearcher.setupPromise;
          searcherAtom.set(newSearcher);
        }}
      />
    );
  }

  return null;
};

Although I hate type casting, the solution is more than sufficient!