Open nhoizey opened 8 months ago
You can for example search for “animal” on https://nicolas-hoizey.photo/search/ and see multiple identical results.
For example, the photo “A storm is coming” is available in 3 different galleries:
They all have the same canonical URL: https://nicolas-hoizey.photo/photos/a-storm-is-coming/ (which I configured for Pagefind, but maybe I shouldn't until it's possible to deduplicate results).
Interesting! I think it would be fine for Pagefind to deduplicate these by default based on their url.
What would you expect regarding the content for these? If you tag three pages with the same url, but they have different content, what should be shown in titles and excerpts? (and what should be indexed for search?) 🤔
@bglw in my specific case, title and content are the same anyway, which feels right because they share the canonical URL.
The only differences are:
So the first item with the URL can be used.
But there might be other use cases where the choice would be different, so maybe this could be a set of values:
`deduplicate_contents`:

- `false` (default): no deduplication
- `keep_first_indexed`: the easiest
- `keep_earliest`: only possible if contents have dates
- `keep_latest`: same
- `concatenate`: concatenate all content sharing the same URL

There might be other values in the future, which an enumeration easily allows.
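To make the proposed values concrete, here is a minimal sketch of how such an enumeration could behave. Everything in it is hypothetical: `DedupeStrategy`, `IndexedPage` and `deduplicate` are names invented for illustration, not existing Pagefind options or APIs, and dates are assumed to be ISO 8601 strings so that plain string comparison orders them.

```typescript
// Hypothetical sketch: none of these names exist in Pagefind today.
type DedupeStrategy =
  | false
  | "keep_first_indexed"
  | "keep_earliest"
  | "keep_latest"
  | "concatenate";

interface IndexedPage {
  url: string;
  title: string;
  content: string;
  date?: string; // assumed ISO 8601, only used by keep_earliest / keep_latest
}

function deduplicate(pages: IndexedPage[], strategy: DedupeStrategy): IndexedPage[] {
  if (strategy === false) return [...pages];

  // A Map keeps the position of the first page seen for each URL,
  // even when the stored value is replaced later.
  const byUrl = new Map<string, IndexedPage>();
  for (const page of pages) {
    const kept = byUrl.get(page.url);
    if (kept === undefined) {
      byUrl.set(page.url, { ...page }); // copy so inputs are never mutated
      continue;
    }
    switch (strategy) {
      case "keep_first_indexed":
        break; // the first page already stored wins
      case "keep_earliest":
        if (page.date !== undefined && (kept.date === undefined || page.date < kept.date)) {
          byUrl.set(page.url, { ...page });
        }
        break;
      case "keep_latest":
        if (page.date !== undefined && (kept.date === undefined || page.date > kept.date)) {
          byUrl.set(page.url, { ...page });
        }
        break;
      case "concatenate":
        kept.content = `${kept.content}\n${page.content}`;
        break;
    }
  }
  return [...byUrl.values()];
}
```

Because the strategy is a union of literal values, adding another one later only needs a new member and a new `case`.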
Came here searching for this. Unless I'm missing something, the search doesn't seem usable without deduplication. Even if it's possible for the same article to show up once for each tag, it then also shows up a further 3 times under 'tags'? 🤔
tbh I would just expect a list of de-duped matching articles with the tags added as labels on them.
@brokenalarms your screenshot is a different case, as those results aren't direct duplicates, it's just finding a match for your search term in the text of the page that lists everything from a given tag — Pagefind doesn't know that the text happens to point to a different indexed page.
The fix there is to use the `data-pagefind-body` attribute to include/exclude the pages that get indexed (documentation link). By placing that attribute only on your articles, the tag listing pages won't be included in the index.
Long time coming, but thank you for this, @bglw! Your help led me to discover that it was the limited functionality of astro-pagefind that was blocking my development, since it passes through `uiOptions` as a string and so loses the various transformation functions.
I set up Pagefind myself in my project and now I can include the advanced features and these tags. Thanks!
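For anyone hitting the same wall, here is a small sketch of why function-valued options disappear once options are turned into a string. It assumes the string is produced with something like `JSON.stringify`; how astro-pagefind actually serializes `uiOptions` is not confirmed here. The `UiOptions` interface is a simplified stand-in, with `showImages` and `processResult` used as example option names.

```typescript
// Simplified stand-in for the UI options object (assumption, not the full type).
interface UiOptions {
  showImages: boolean;
  processResult?: (result: { url: string }) => { url: string };
}

const uiOptions: UiOptions = {
  showImages: false,
  // A transformation function: strips a trailing slash from each result URL.
  processResult: (result) => ({ ...result, url: result.url.replace(/\/$/, "") }),
};

// JSON has no representation for functions, so JSON.stringify silently
// omits function-valued properties.
const serialized: string = JSON.stringify(uiOptions);
const roundTripped = JSON.parse(serialized) as UiOptions;

const functionSurvived: boolean = typeof roundTripped.processResult === "function";
```

After the round trip only `showImages` is left, which matches the symptom described above: plain values get through, transformation functions do not.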
It would be great if we could deduplicate results, for sites where the same content can be present on different pages.
This is already something that requires a canonical for SEO (which is allowed with `data-pagefind-meta="url[href]"`), so maybe having a boolean option to use the result URL as a deduplication key could be enough.
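Until such an option exists, the same idea can be approximated on the client by using the result URL as the key after the results are loaded. The sketch below is only an illustration: `uniqueByUrl` and `loadUniqueResults` are hypothetical helpers, and the `SearchResult` / `ResultData` interfaces are a reduced, assumed shape of search results whose data is loaded lazily through an async `data()` call.

```typescript
// Assumed, reduced shapes for illustration only.
interface ResultData {
  url: string;
  excerpt: string;
}

interface SearchResult {
  id: string;
  data: () => Promise<ResultData>;
}

// Keep the first item seen for each URL, preserving the original order.
function uniqueByUrl<T extends { url: string }>(items: T[]): T[] {
  const seen = new Set<string>();
  const unique: T[] = [];
  for (const item of items) {
    if (seen.has(item.url)) continue;
    seen.add(item.url);
    unique.push(item);
  }
  return unique;
}

// Load the first `limit` results, then drop duplicates by URL.
// Note: this can return fewer than `limit` items when duplicates are removed.
async function loadUniqueResults(results: SearchResult[], limit: number): Promise<ResultData[]> {
  const loaded = await Promise.all(results.slice(0, limit).map((r) => r.data()));
  return uniqueByUrl(loaded);
}
```

With three results all pointing to `https://nicolas-hoizey.photo/photos/a-storm-is-coming/`, `uniqueByUrl` would keep only the first one, which is the "boolean option" behaviour described above.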