CloudCannon / pagefind

Static low-bandwidth search at scale
https://pagefind.app
MIT License
3.22k stars 97 forks source link

Be more careful about special-casing `index.html` #604

Closed dscho closed 1 month ago

dscho commented 2 months ago

Pagefind has code to strip a trailing index.html from a URL. The intention is obviously to avoid showing /folder/index.html when /folder/ would already do.

However, the code is a bit lax about checking that index.html is the full file name: It would also strip that suffix for, say, /folder/my_index.html. That, however, leads to a broken link...

I noticed this because I want to convert https://git-scm.com/ to use Pagefind, but any search hitting /docs/git-checkout-index.html would mistakenly link to /docs/git-checkout-.

dscho commented 2 months ago

Hrm. The tests pass for me on Windows... These failures all seem to be variations of this symptom:

      Step failed: \\?\D:\a\pagefind\pagefind\pagefind\features\anchors.feature:58:9
      Captured output: 
      Step panicked. Captured output: called `Result::unwrap()` on an `Err` value: Timeout

However, the output does not suggest that anything timed out?

      Civilization {
          tmp_dir: Some(
              TempDir {
                  path: "C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\.tmp6ItJKt",
              },
          ),
          last_command_output: Some(
              CommandOutput {
                  stdout: "\nRunning Pagefind v0.0.0 (Extended)\nRunning from: \"C:\\\\Users\\\\RUNNER~1\\\\AppData\\\\Local\\\\Temp\\\\.tmp6ItJKt\"\nSource:       \"public\"\nOutput:       \"public\\\\pagefind\"\n\n[Walking source directory]\nFound 4 files matching **/*.{html}\n\n[Parsing files]\nFound a data-pagefind-body element on the site.\n↳ Ignoring pages without this tag.\n\n[Reading languages]\nDiscovered 1 language: en\n\n[Building search indexes]\nTotal: \n  Indexed 1 language\n  Indexed 3 pages\n  Indexed 49 words\n  Indexed 0 filters\n  Indexed 0 sorts\n\nFinished in 0.028 seconds\n",
                  stderr: "",
              },
          ),
          browser: None,
          assigned_server_port: Some(
              18145,
          ),
          threads: [
              JoinHandle {
                  id: Id(
                      5,
                  ),
              },
          ],
          handles: [
              ServerHandle {
                  cmd_tx: UnboundedSender {
                      chan: Tx {
                          inner: Chan {
                              tx: Tx {
                                  block_tail: 0x000001a38f51fa10,
                                  tail_position: 1,
                              },
                              semaphore: Semaphore(
                                  1,
                              ),
                              rx_waker: AtomicWaker,
                              tx_count: 1,
                              rx_fields: "...",
                          },
                      },
                  },
              },
          ],
          env_vars: {
              "PAGEFIND_SITE": "public",
          },
      }

I also notice that main fails in the same way. @bglw any ideas?

bglw commented 2 months ago

Yeah that won’t be you — the test suite is a little flaky right now, especially on the windows runners lately. I have a new testing framework on the way which should resolve that.

I’ll give the tests some pokes before I merge, which will be next week as I’m out for the weekend.

(Change looks good by the way, thank you! 🙏)

dscho commented 2 months ago

I’m out for the weekend.

On a Wednesday? CloudCannon must be a nice place to work at, do you have openings? 😁

Yeah that won’t be you — the test suite is a little flaky right now, especially on the windows runners lately. I have a new testing framework on the way which should resolve that.

Great!

bglw commented 2 months ago

Sorry for the delay! My test suite has caught up with me (and/or GitHub is only giving me the runt-of-the-litter machines 😓)

I'll run the tests locally on my mac and merge tomorrow.

bglw commented 1 month ago

@dscho this is now released on 1.1.1-alpha.1 if you're fine using the prerelease for a while :)