bglw opened this issue 1 year ago
You may not need to implement `exclude_glob`, because you can already exclude matches with a glob pattern, although it's not as simple as listing the folders to ignore.
I was having the same issue as @ndeville in https://github.com/CloudCannon/pagefind/discussions/127, where my Jekyll `_site/tags` folder was causing duplicate results to appear.
I excluded it with this glob pattern: `*[!s]/**/*.{html}`, which matches any `.html` file under a top-level folder whose name doesn't end in `s`, so `tags` is excluded. But be careful that you don't have other folders ending in `s`, or these will be ignored too. I use it like this in `pagefind.yml`:
```yaml
glob: "*[!s]/**/*.{html}"
```
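To see which pages the pattern actually picks up, a rough check is to expand it with bash itself. This is a sketch under assumptions: bash's `globstar` matching stands in for Pagefind's glob engine (close for this pattern, but not guaranteed identical), `preview_glob` is a hypothetical helper name, and because bash doesn't expand a single-item brace like `{html}`, the pattern is written with `*.html` instead.

```shell
#!/usr/bin/env bash
# Preview which HTML files "*[!s]/**/*.{html}" would select under a site folder.
# Assumption: bash globbing approximates Pagefind's glob engine here.
# bash does not expand the single-item brace "{html}", so "*.html" is used.
shopt -s globstar nullglob

preview_glob() {
  local site_dir=$1 f
  (
    cd "$site_dir" || exit 1
    # *[!s]/  : a top-level folder whose name does not end in "s"
    # **/     : zero or more nested folders below it
    # *.html  : any HTML file
    for f in *[!s]/**/*.html; do
      printf '%s\n' "$f"
    done
  )
}
```

Running `preview_glob _site` should list files such as `about/index.html` but nothing under `tags/`. It also suggests that a page at the site root, such as `_site/index.html`, is not matched, since the pattern requires at least one folder; that is worth confirming against Pagefind's own page count.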
Before this I was seeing:

```
Total:
Indexed 1 language
Indexed 126 pages
```

and now I get:

```
Total:
Indexed 1 language
Indexed 116 pages
```

and my duplicate results have disappeared.
You can check which folders would be indexed by passing the first part of the glob to `ls`, like this:

```
ls -d _site/*[!s]
```

and confirm that there's no `tags` folder in the output.
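Since the pattern silently drops every top-level folder ending in `s`, it can help to list exactly those folders before relying on it. A minimal sketch, where `excluded_dirs` is a hypothetical helper and `find`'s `-mindepth`/`-maxdepth` options are assumed to be available (they are in GNU and BSD `find`):

```shell
#!/usr/bin/env bash
# List the top-level folders of a site directory whose names end in "s".
# These are the folders that "*[!s]/**/*.{html}" skips, so any name other
# than "tags" in the output would be excluded by accident.
excluded_dirs() {
  # -type d ignores plain files such as a top-level "news" file.
  find "$1" -mindepth 1 -maxdepth 1 -type d -name '*s' -exec basename {} \;
}
```

`excluded_dirs _site` should print only `tags`; any other name it prints is a folder the glob would skip as well.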
Discussed in #127

1. Provide functionality for the `glob` option to take a list of globs.
2. Provide an inverse `exclude_glob` option to help setups that need to exclude only a few files.
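To make the proposed include-then-exclude behaviour concrete, here is a rough bash emulation. It is not Pagefind code, and `exclude_glob` is only a proposal in this issue; `select_pages` and its arguments are hypothetical. One difference from real path globs: in bash's `[[ == ]]` matching, `*` also matches `/`, so an exclude pattern like `tags/*` drops everything below `tags/`.

```shell
#!/usr/bin/env bash
# Hypothetical emulation of "glob" plus the proposed "exclude_glob":
# list files matching an include glob, minus any matching an exclude pattern.
# This is not Pagefind's implementation.
shopt -s globstar nullglob

select_pages() {
  local site_dir=$1 include=$2 f ex skip
  shift 2 # the remaining arguments are exclude patterns
  (
    cd "$site_dir" || exit 1
    # $include is deliberately unquoted so bash expands it as a glob.
    for f in $include; do
      skip=0
      for ex in "$@"; do
        # Unquoted $ex on the right of == is matched as a pattern;
        # here "*" also matches "/".
        if [[ $f == $ex ]]; then
          skip=1
          break
        fi
      done
      (( skip )) || printf '%s\n' "$f"
    done
  )
}
```

For example, `select_pages _site '**/*.html' 'tags/*'` would list every HTML page except those under `tags/`, without the side effect of also skipping other folders ending in `s`.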
I would certainly appreciate an `exclude_glob` option so that I can block specified folders and files from being indexed and returned by Pagefind. I tried a couple of the ideas Jaygooby suggested, but none of them worked in my sandbox.