adobe / helix-shared

Shared libraries for Project Helix.
Apache License 2.0

Support for string concatenation by the indexer #1006

Open buuhuu opened 1 month ago

buuhuu commented 1 month ago

Is your feature request related to a problem? Please describe.

When customers decide to keep .html extensions for their site when moving to Edge Delivery, they can currently add the extension to the sitemaps: https://www.aem.live/developer/sitemap#adding-an-extension-to-all-locations-in-the-sitemap

It is not possible, however, to add the extension to the canonical link. We also learned that appending the extension client-side may cause indexing issues, because the crawl budget of a site determines when and how often it is crawled with JavaScript executed. In the worst case the canonical is not stable: it is sometimes seen with the extension and sometimes without it.

Also, the canonical link is considered a stronger canonicalization signal than the sitemap: https://developers.google.com/search/docs/crawling-indexing/consolidate-duplicate-urls

One workaround is to use an index as an additional metadata sheet that sets the canonical metadata for each page, appending the extension. This is possible with spreadsheets using formulas, but not with BYOM, where the index would only be stored as a JSON file.

Concatenation in the indexer may be useful for other use cases as well, wherever content needs to be joined.

Describe the solution you'd like

Ideally, we could use a binary operation in the value expression to concatenate two strings:

select: main
value: replaceAll(path + '.html', '/.html', '/')
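
To make the intended result concrete, here is a minimal sketch of what the proposed expression would compute. It assumes that `+` concatenates strings and that `replaceAll` replaces every literal occurrence of the search string; the helper names `replaceAllLiteral` and `canonicalPath` are hypothetical and not part of helix-shared.

```typescript
// Hypothetical stand-in for the indexer's replaceAll, assuming it
// replaces every literal occurrence of `search` in `value`.
function replaceAllLiteral(value: string, search: string, replacement: string): string {
  return value.split(search).join(replacement);
}

// Mirrors: replaceAll(path + '.html', '/.html', '/')
function canonicalPath(path: string): string {
  // Append the extension, then undo it for folder paths that end in '/'.
  return replaceAllLiteral(path + '.html', '/.html', '/');
}

console.log(canonicalPath('/blog/post')); // /blog/post.html
console.log(canonicalPath('/blog/'));     // /blog/
```

The second replacement is what keeps folder-style paths such as `/blog/` from turning into `/blog/.html`.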

Describe alternatives you've considered

Alternatively, we could also support regular expressions using jsep-plugin/regex and do:

select: main
value: replaceAll(replaceAll(path, /$/g, '.html'), /\/\.html$/g, '/')
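
The regex-based alternative can be sketched the same way. This assumes the indexer's `replaceAll` would accept regular expression literals and behave like JavaScript's `String.prototype.replace` with a global pattern; `canonicalPathRegex` is a hypothetical name used only for illustration.

```typescript
// Mirrors the nested regex expression: first append '.html' at the end
// of the path, then collapse a trailing '/.html' back to '/'.
function canonicalPathRegex(path: string): string {
  return path
    .replace(/$/g, '.html')       // '$' matches the empty position at the end
    .replace(/\/\.html$/g, '/');  // dot escaped so it only matches a literal '.'
}

console.log(canonicalPathRegex('/blog/post')); // /blog/post.html
console.log(canonicalPathRegex('/blog/'));     // /blog/
```

Both variants give the same result for these inputs; the regex form only avoids the need for a concatenation operator.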

Or we could support adding extensions to canonicals (and, by extension, to any link) in the HTML pipeline.

Additional context

https://adobe-dx-support.slack.com/archives/C06FA7MP684/p1727877335760009
https://cq-dev.slack.com/archives/C05QU7MMRNF/p1727126486278739

tripodsan commented 1 month ago

I don't think that adding this extra functionality solves the problem. The canonical will still be wrong in the metadata. It would be better to find a way to correct the metadata, e.g. by introducing a placeholder language:

canonical: {{url}}.html
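
A placeholder language like this could be resolved with a simple substitution step. The following is only a sketch under stated assumptions: the `{{name}}` syntax is taken from the example above, while the function `expandPlaceholders`, the shape of the `values` map, and the example.com URL are hypothetical and do not describe an existing pipeline API.

```typescript
// Replaces each {{name}} in the template with the matching entry from
// `values`. Unknown placeholders are left untouched.
function expandPlaceholders(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(values, name) ? values[name] : match);
}

const canonical = expandPlaceholders('{{url}}.html', {
  url: 'https://example.com/blog/post', // assumed page URL, for illustration
});
console.log(canonical); // https://example.com/blog/post.html
```

Note that this sketch does not special-case URLs ending in `/`, so folder-style URLs would still need handling comparable to the `replaceAll` step discussed earlier in the thread.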