CloudCannon / pagefind

Static low-bandwidth search at scale
https://pagefind.app
MIT License

Request: Expose index filters to node wrapper #715

Open marcuswhybrow opened 2 months ago

marcuswhybrow commented 2 months ago

I'm building a web UI using Pagefind and it's going great. I'm using the node wrapper API's addHTMLFile function to populate the index.

I'm currently displaying the entire list of Pagefind filters to the user on page load by awaiting pagefind.filters(). This leads to the filters visually popping in once the promise resolves.

What I'd prefer to do is ask the node index to report the aggregated filters at compile time, which I could then bake into my HTML to prevent content popping.
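
To make the "bake into my HTML" idea concrete, here is a rough sketch of what the build-time rendering step could look like. It assumes a filters object in the `{ name: { value: count } }` shape that `pagefind.filters()` resolves to. The helper names `escapeHtml` and `renderFilterFieldsets` are made up for illustration and are not part of Pagefind.

```js
// Hypothetical helper (not part of Pagefind): escape text for use in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: render one <fieldset> of checkboxes per filter name.
// `filters` is assumed to look like { tag: { Docs: 3, Blog: 1 } }.
function renderFilterFieldsets(filters) {
  return Object.entries(filters)
    .map(([name, values]) => {
      const options = Object.entries(values)
        .map(([value, count]) =>
          `<label><input type="checkbox" name="${escapeHtml(name)}" value="${escapeHtml(value)}"> ${escapeHtml(value)} (${count})</label>`)
        .join("");
      return `<fieldset><legend>${escapeHtml(name)}</legend>${options}</fieldset>`;
    })
    .join("\n");
}
```

The resulting string can be dropped into a page template at build time, so the filter controls are already present in the first paint and the client-side call is only needed for live counts.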

marcuswhybrow commented 2 months ago

#371 proposes a solution, but in the meantime here's my workaround. I just parse the HTML myself! 😄

import { parse } from "node-html-parser"; // JSDOM was running out of memory for me, and was slower.

/** 
 * Parses data-pagefind-filter attributes to extract all Pagefind filters.
 *
 * Pagefind's node wrapper lib doesn't say which filters it discovered, forcing
 * filter lookup in the client lib, slowing time to first render of filters.
 * This function gets around that by parsing the HTML again ourselves, looking
 * for the same data-pagefind-filter attributes that Pagefind itself reads, to
 * reconstruct the same data that [pagefind.filters()] would return.
 *
 * This is an upcoming feature of Pagefind, so this approach will soon be
 * obsolete (see the issues referenced below). This implementation is a
 * best-effort guess following the Pagefind docs, and it may not perfectly
 * match edge cases in filter names or values.
 *
 * # Reference
 * - https://pagefind.app/docs/filtering/
 * - https://github.com/CloudCannon/pagefind/issues/715
 * - https://github.com/CloudCannon/pagefind/issues/371
 *
 * # Example
 * ```js 
 * import assert from "assert";
 * assert.deepEqual(extractPagefindFilters(`
 *   <span data-pagefind-filter="singleName:inlineContent"></span>
 *   <span data-pagefind-filter="singleName">valueContent</span>
 *   <span data-pagefind-filter="name1, name2:inlineContent">valueContent</span>
 *   <span data-pagefind-filter="name1, name2[data-name], name3:inlineContent" data-name="attrValue">valueContent</span>
 * `), {
 *   singleName: { inlineContent: 1, valueContent: 1 },
 *   name1: { valueContent: 2 },
 *   name2: { inlineContent: 1, attrValue: 1 },
 *   name3: { inlineContent: 1 }
 * });
 * ```
 *
 * @param {string} html 
 * @returns {object} For brevity "object" is substituted for a proper PagefindFilters type.
 */
export function extractPagefindFilters(html) {
  const pagefindFilters = {};

  parse(html).querySelectorAll("[data-pagefind-filter]").forEach(element => {
    let signature = element.getAttribute("data-pagefind-filter");
    if (!signature) return;

    let filters = [];

    let chars = signature.split("");
    let name = "";
    let mod = ""; // the "[attr-name]" or ":inline content" after the filter name, I'm calling a modifier

    chars.forEach(char => {
      switch (char) {
        case ',':
          if (mod[0] === ":") mod += char;
          else {
            filters.push([name, mod]);
            name = ""; mod = "";
          }
          break;
        case '[':
        case ':':
          mod += char;
          break;
        case ']':
        default:
          if (mod) mod += char;
          else name += char;
      }
    });

    if (name || mod) filters.push([name, mod]);

    filters = filters.map(([name, mod]) => {
      name = name.trim();
      mod = mod.trim();
      if (mod[0] === ":") {
        return [name, mod.substring(1).trim()];
      } else if (mod[0] === "[") {
        return [name, element.getAttribute(mod.substring(1, mod.length - 1))?.trim() || ""];
      } else {
        return [name, element.textContent?.trim() || ""];
      }
    });

    filters.forEach(([name, value]) => {
      // Object.hasOwn avoids breaking if a filter is itself named "hasOwnProperty".
      if (!Object.hasOwn(pagefindFilters, name)) pagefindFilters[name] = {};
      if (!Object.hasOwn(pagefindFilters[name], value))
        pagefindFilters[name][value] = 1;
      else pagefindFilters[name][value]++;
    });
  });

  return pagefindFilters;
}
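
Since the function above works on one HTML string at a time, a site with many pages needs the per-file results combined into a single object. Below is a minimal sketch of that merge step. `mergeFilterCounts` is a hypothetical helper name, and it simply sums the counts it is given, so the totals are element occurrences and may not match the counts Pagefind itself reports.

```js
// Hypothetical helper: combine several { name: { value: count } } objects,
// e.g. one per HTML file, into a single object by summing the counts.
function mergeFilterCounts(perFileFilters) {
  const merged = {};
  for (const filters of perFileFilters) {
    for (const [name, values] of Object.entries(filters)) {
      // Object.hasOwn keeps inherited keys like "constructor" from being
      // mistaken for existing filter names or values.
      if (!Object.hasOwn(merged, name)) merged[name] = {};
      const bucket = merged[name];
      for (const [value, count] of Object.entries(values)) {
        bucket[value] = (Object.hasOwn(bucket, value) ? bucket[value] : 0) + count;
      }
    }
  }
  return merged;
}
```

In a build script this could be called as `mergeFilterCounts(htmlStrings.map(extractPagefindFilters))`, where `htmlStrings` is assumed to be an array of the same HTML contents that were passed to addHTMLFile.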