CloudCannon / pagefind

Static low-bandwidth search at scale
https://pagefind.app
MIT License

Allow pagefind to run locally #241

Open Takashiidobe opened 1 year ago

Takashiidobe commented 1 year ago

I've been trying to figure out how to run pagefind locally (using the file: protocol) and it works almost correctly.

There's this issue (https://github.com/CloudCannon/pagefind/issues/202) that seems to indicate this is impossible, but it works on my machine (TM):

[Screenshot from 2023-03-05 11-56-08]

(It requires writing some JavaScript to detect whether the page is open in a file: or http: context and to pick the script path accordingly, and also disabling CORB in your browser, but that's all that's required to make it work.)

As an example, I have this code in my local files to detect whether it's running on a server or locally:

        // Load the script from an absolute file path when the page is opened
        // via file:, and from the server path otherwise.
        function appendJs(filepath, serverpath) {
          const script = document.createElement("script");
          script.src = window.location.protocol === "file:" ? filepath : serverpath;
          script.type = "text/javascript";
          document.body.appendChild(script);
        }
        appendJs("/home/takashiidobe/monorepo/notes/site/_pagefind/pagefind-ui.js", "/_pagefind/pagefind-ui.js");
        // DOMContentLoaded doesn't work, so I replaced it with 'load'
        window.addEventListener("load", () => {
          new PagefindUI({ element: "#search" });
        });

However, the link that pagefind generates is incorrect in a local setting.

Each link is prefixed with /file:, like so: /file:/home/takashiidobe/monorepo/notes/site/books/system-design-interview-an-insiders-guide-volume-2/google-maps.html.

The link just needs to be either the plain file path (/home/takashiidobe/monorepo/notes/site/books/system-design-interview-an-insiders-guide-volume-2/google-maps.html) or the file: URL (file:/home/takashiidobe/monorepo/notes/site/books/system-design-interview-an-insiders-guide-volume-2/google-maps.html). Either would work.

However, because of the leading slash, the link doesn't currently work (I checked in both Chrome and Firefox).
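
The difference between the three forms can be checked with the standard URL constructor, which follows the same resolution rules browsers apply to href values. This is only a minimal sketch: the base location and page paths below are placeholders, not the paths from this issue.

```javascript
// Hypothetical page location; any file: URL would behave the same way.
const base = "file:///home/user/site/index.html";

// Resolve an href against the page location, as a browser does for a link.
function resolve(href) {
  return new URL(href, base).href;
}

// The form pagefind currently produces: "file:" ends up as a path segment.
const broken = resolve("/file:/home/user/site/books/page.html");
// The two forms suggested above: both resolve to the real file.
const plainPath = resolve("/home/user/site/books/page.html");
const fileUrl = resolve("file:/home/user/site/books/page.html");

console.log(broken);    // file:///file:/home/user/site/books/page.html
console.log(plainPath); // file:///home/user/site/books/page.html
console.log(fileUrl);   // file:///home/user/site/books/page.html
```

With the leading slash, the whole string is treated as a path on the local filesystem, so the browser looks for a top-level directory literally named "file:".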

Maybe I'm missing a configuration setting, but I'm not sure.

I've decided to patch it at runtime for now to prove that it's possible, adding a MutationObserver that watches for the links pagefind generates and removes the leading slash from them:

        // Watch the whole document for nodes that pagefind adds or removes.
        const targetNode = document.body;
        const config = { attributes: false, childList: true, subtree: true };
        const callback = () => {
          const links = document.querySelectorAll('.pagefind-ui__result-link');
          for (const link of links) {
            // Drop the slash in front of "file:" so the browser can open the link.
            link.href = link.href.replace('/file:', 'file:');
          }
        };
        const observer = new MutationObserver(callback);
        observer.observe(targetNode, config);

However, since it's possible to figure out the protocol (the code seems to know when it's run on localhost or http), would it be possible to patch pagefind to figure out when it's running locally and start the link off with file:/ instead of /file:/?
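
To make the request concrete, here is a rough sketch of the kind of check I mean. fixLocalHref is a hypothetical helper written for this comment, not anything that exists in pagefind, and I'm assuming the only thing wrong with the generated link is the leading slash.

```javascript
// Hypothetical helper, not part of pagefind: when the page runs under the
// file: protocol, strip the stray leading slash from hrefs that start with
// "/file:". Every other href is returned unchanged.
function fixLocalHref(href, protocol) {
  if (protocol !== "file:") {
    return href;
  }
  return href.replace(/^\/file:/, "file:");
}

console.log(fixLocalHref("/file:/home/user/site/page.html", "file:"));
console.log(fixLocalHref("/books/page.html", "https:"));
```

In a browser the second argument would be window.location.protocol. Anchoring the pattern with ^ means only a slash at the very start of the href is touched.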

Thanks for all the work on pagefind, it's worked like a charm and with this fix would replace all of my local search usage.

bglw commented 1 year ago

Hi @Takashiidobe 👋

Thanks for this! It definitely looks like I was too hasty in dismissing this as unachievable — I didn't think it would work quite so smoothly, but I stand corrected 🙂

I'll look into supporting the URLs as described. It won't be a challenge, but I'd want to try to get support into my test suite to run file: tests properly, otherwise I'd be worried about a regression. So I'll aim to get that implemented in the test suite first 🙂

Thanks again! Will keep you posted


If you want a different quick fix without a MutationObserver, you could edit your generated pagefind.js file. If you search for /^(\/|https?:\/\/)/ and replace it with /^(\/|(https?|file):\/\/)/ then I think that would be the only patch needed — let me know if you do give that a go, and whether it works out
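
As a quick sanity check of that replacement, the sketch below compares the two patterns on a few sample URLs and shows the search-and-replace as a plain string operation. The sample URLs and the patchSource helper are illustrative assumptions, and the exact text of the pattern inside a given generated pagefind.js may differ from what is written here.

```javascript
// The pattern currently in pagefind.js and the suggested replacement.
const original = /^(\/|https?:\/\/)/;
const patched = /^(\/|(https?|file):\/\/)/;

// Illustrative URLs only; none of these come from a real index.
const samples = [
  "/books/page.html",
  "https://example.com/page.html",
  "file:///home/user/site/page.html",
];
for (const url of samples) {
  console.log(url, original.test(url), patched.test(url));
}

// Hypothetical helper: apply the same edit to the file contents as text.
// String.raw keeps the backslashes exactly as they appear in the source.
const OLD_SOURCE = String.raw`/^(\/|https?:\/\/)/`;
const NEW_SOURCE = String.raw`/^(\/|(https?|file):\/\/)/`;
function patchSource(source) {
  return source.split(OLD_SOURCE).join(NEW_SOURCE);
}
```

Only the file:// sample changes result between the two patterns, so links on a served site should be unaffected.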

Takashiidobe commented 1 year ago

Hey @bglw,

The regex replacement you suggested works perfectly.

I used to use mdbook and sphinx for indexing content offline, but they weren't quite right. I love how the index is split into different files, because that makes it work as well offline as it does when served. I've indexed my books, notes, documentation, papers and more with pagefind, and search queries work effortlessly. Other solutions would take down my computer because they'd read the index into memory, and the file I/O alone would take up 100% of CPU time and start freezing up my computer. Not pagefind though.

Thanks for all the hard work on pagefind. Solid as a rock and there's nothing else like it that I've found 👍

demosthenez commented 8 months ago

Great. Thank you for sharing. I'm looking to accomplish something similar.