Open Hexstream opened 2 years ago
As for the means to disable this feature, one idea would be accepting `strip-html-suffixes=false` as the very first line of the `_redirects` file. This is simple, logical, efficient and extensible. More such special directives could eventually be accepted on separate lines only at the start of the file before normal redirection directives.

(Edit, 7 April 2024: Now `wrangler.toml` should be used instead.)
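To make the `_redirects` proposal concrete, here is a minimal sketch of how leading directives could be separated from normal redirect rules. Everything in it is hypothetical: `parseRedirectsFile` is an illustrative name, the `/old-page /new-page 301` rule is made up, skipping `#` lines as comments is an assumption, and nothing here reflects how Pages actually parses the file.

```javascript
// Hypothetical parser for the proposed syntax: `name=value` directive lines
// are only recognised at the very start of the file, before any redirect rule.
function parseRedirectsFile(text) {
  const directives = {};
  const rules = [];
  let inDirectives = true;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Assumption: blank lines and lines starting with "#" carry no meaning.
    if (line === "" || line.startsWith("#")) continue;
    // A directive is a single token of the form name=value (no whitespace).
    const match = inDirectives ? /^([a-z][a-z0-9-]*)=(\S+)$/.exec(line) : null;
    if (match) {
      directives[match[1]] = match[2];
    } else {
      inDirectives = false; // the first normal rule ends the directive section
      rules.push(line);
    }
  }
  return { directives, rules };
}

const example = [
  "strip-html-suffixes=false",
  "/old-page /new-page 301", // made-up redirect rule
].join("\n");

const parsed = parseRedirectsFile(example);
// Default to true (the current behavior) when the directive is absent.
const stripHtmlSuffixes = parsed.directives["strip-html-suffixes"] !== "false";
```

Restricting directives to the top of the file keeps the rest of the format unchanged: a line that looks like a directive but appears after the first rule is simply treated as a rule.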
I was able to work around this issue with Workers, not without some pain.
This solves my issue in the short term, but a simple native solution would be strongly preferred.
Now that Cloudflare Pages supports `wrangler.toml`, this would be easier to implement.
`wrangler.toml` could just accept `strip-html-suffixes = false` to opt in to this new behavior.
`strip-html-suffixes = true` would be the default so as to retain backwards compatibility.
Could you please implement this crucial feature soon?
This is a widely requested feature in the Cloudflare community. Here are the first 10 topics I found (in descending order of views):

- Pages redirect missing .html extension file name issue
- Prevent truncating and removal of page name extensions?
- Cloudflare Pages Route matching problem
- How to enable trailing ‘.html’ from url
- Cloudflare Pages Route Matching: Disable extensionless HTML redirect
- React app deployed to clouldflare, url automatically removes .html
- Any updates on Route matching vs .html?
- Automatic redirect pages without .html
- Cloudflare Pages url redirect
- Cloudflare Pages truncates URLs by removing the “.html” extension
I realize this is on the roadmap; hopefully we can get this crucial feature in 2024?
It's a big blocker for many people, and it will be a great day when this is implemented!
@Hexstream, can you share how you worked around it? I'm having the same issue as well.
Yes, at the time I used 1. a Worker for some easy cases and 2. `_worker.js` for some more complex cases.
I am looking to migrate my workarounds to Pages Functions soon.
I am hoping Birthday Week in September will bring nice improvements to Pages, including fixing this critical issue.
Here is my (old but still used) code for 1. the `nostrip-html-suffixes` Worker:

`wrangler.toml`:

```toml
name = "nostrip-html-suffixes"
compatibility_date = "2022-07-22"
main = "./worker.js"
usage_model = "bundled"
workers_dev = false
routes = [
    { pattern = "https://modern.pokehidden.archive.hexstream.net/*", zone_name = "hexstream.net" },
    { pattern = "https://dumping-grounds.hexstream.xyz/archived/ert.hexstream.xyz/FlyHighWithYourDreams*", zone_name = "hexstream.xyz" }
]
minify = false
node_compat = false
```
`worker.js`:

```javascript
export default {
    async fetch (request) {
        // Re-issue the request against a different URL, keeping method and headers.
        function passthrough (url) {
            return fetch(new Request(url), {
                method: request.method,
                headers: request.headers
            });
        }
        // Note: request.url includes any query string, so the suffix checks
        // below only match URLs without one.
        const url = request.url;
        // /foo.html: fetch /foo instead.
        if (url.endsWith(".html"))
            return passthrough(url.slice(0, -5));
        const urlNoSlash = url.endsWith("/") ? url.slice(0, -1) : url;
        // /foo or /foo/: check whether a matching /foo.html exists.
        const htmlResponse = await passthrough(urlNoSlash + ".html");
        if (htmlResponse.ok)
            //return passthrough(urlNoSlash + ".404");
            return Response.redirect(urlNoSlash + ".html", 301);
        else
            return fetch(request);
    }
};

/*
  Unwanted effects:
  /foo.html redirects to /foo
  /foo and /foo/ serve /foo.html

  Solution:
  /foo.html serves former /foo
  /foo and /foo/ do not exist if /foo.html exists
*/
```
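One thing to be aware of in the Worker above: it tests `request.url` as a plain string, and that string includes any query string, so a request such as `/foo.html?x=1` would not hit the `.html` branch. A possible variation (not part of the original code) is to do the suffix handling on `URL.pathname` instead. The helper names `withoutHtmlSuffix` and `withHtmlSuffix` are made up for this sketch.

```javascript
// Returns the URL with a trailing ".html" removed from its path, or null if
// the path does not end in ".html". The query string is preserved.
function withoutHtmlSuffix(requestUrl) {
  const url = new URL(requestUrl);
  if (!url.pathname.endsWith(".html")) return null;
  url.pathname = url.pathname.slice(0, -".html".length);
  return url.toString();
}

// Returns the URL with ".html" appended to its path, after dropping one
// trailing slash. The query string is preserved.
function withHtmlSuffix(requestUrl) {
  const url = new URL(requestUrl);
  const path = url.pathname.endsWith("/")
    ? url.pathname.slice(0, -1)
    : url.pathname;
  url.pathname = path + ".html";
  return url.toString();
}

const stripped = withoutHtmlSuffix("https://example.com/foo.html?x=1");
const untouched = withoutHtmlSuffix("https://example.com/foo?page=a.html");
const probed = withHtmlSuffix("https://example.com/foo/?x=1");
```

In the Worker, `url.endsWith(".html")` and `url.slice(0, -5)` could then be replaced by a single call to `withoutHtmlSuffix(request.url)`, and `urlNoSlash + ".html"` by `withHtmlSuffix(request.url)`.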
And here is my (old but still used) code for 2. `_worker.js`:

```javascript
export default {
    async fetch(request, env) {
        // Fetch a different URL from the static assets, keeping method and headers.
        function passthrough (url) {
            return env.ASSETS.fetch(url.toString(), {
                method: request.method,
                headers: request.headers,
                cf: request.cf
            });
        }
        const url = new URL(request.url);
        const pathname = url.pathname;
        // Any request under /workaround/ is rewritten to /workaround/ itself.
        if (pathname.startsWith('/workaround/')) {
            url.pathname = "/workaround/";
            return passthrough(url);
        }
        // /foo.swf.html is fetched from /workaround/html/foo.swf
        if (pathname.endsWith('.swf.html')) {
            url.pathname = "/workaround/html" + pathname.slice(0, -5);
            return passthrough(url);
        }
        // /foo.swf is fetched from /workaround/swf/foo.swf
        if (pathname.endsWith('.swf')) {
            url.pathname = "/workaround/swf" + pathname;
            return passthrough(url);
        }
        // /foo.html is fetched from /foo
        if (pathname.endsWith(".html")) {
            url.pathname = pathname.slice(0, -5);
            return passthrough(url);
        }
        if (pathname.endsWith("/"))
            url.pathname = pathname.slice(0, -1);
        const urlHtml = new URL(url);
        urlHtml.pathname += ".html";
        const urlNotFound = new URL(url);
        urlNotFound.pathname += ".404";
        // /foo or /foo/: if /foo.html exists, answer with /foo.404 instead.
        const htmlResponse = await passthrough(urlHtml);
        if (htmlResponse.ok)
            return passthrough(urlNotFound);
        else
            return env.ASSETS.fetch(request);
    }
};
```
I hope this helps in any way, but I cannot guarantee the quality of the above soon-to-be-legacy code...
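For anyone adapting the `_worker.js` above, it may help to pull the path rewriting out into a pure function so that it can be checked without a Pages environment. The sketch below mirrors the branches of that code as I read them. `planRoute` and the shape of its return value are made up for illustration, and the `/workaround/html` and `/workaround/swf` prefixes are specific to that site's layout.

```javascript
// Mirrors the path handling of the _worker.js above as a pure function.
// Returns { serve: path } when an asset path can be fetched directly, or
// { probe: htmlPath, notFound: path } when the caller must first check
// whether the .html file exists (and answer with the notFound path if so).
function planRoute(pathname) {
  if (pathname.startsWith("/workaround/")) return { serve: "/workaround/" };
  if (pathname.endsWith(".swf.html"))
    return { serve: "/workaround/html" + pathname.slice(0, -5) };
  if (pathname.endsWith(".swf")) return { serve: "/workaround/swf" + pathname };
  if (pathname.endsWith(".html")) return { serve: pathname.slice(0, -5) };
  // Drop one trailing slash, but keep the root path "/" as it is.
  const base =
    pathname.length > 1 && pathname.endsWith("/")
      ? pathname.slice(0, -1)
      : pathname;
  return { probe: base + ".html", notFound: base + ".404" };
}

const embedPage = planRoute("/games/foo.swf.html");
const flashFile = planRoute("/games/foo.swf");
const plainPage = planRoute("/about.html");
const bareName = planRoute("/about/");
```

The order of the checks matters: `.swf.html` has to be tested before the generic `.html` case, otherwise the embedding page would be rewritten to the flash file's path.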
Describe the solution
When navigating to a URL such as `https://example.com/foo.html`, Pages for some reason redirects to `https://example.com/foo`.

Unfortunately, this "feature" is simply disastrous for my use-cases, and I urgently need a way to disable it, especially since there does not seem to be any workaround. (This is my only remaining blocker for finishing the long-overdue migration of my 25+ subdomains to Pages.)

First of all, I think it obviously doesn't make sense to force users to change tons of their URLs just because they're changing their underlying infrastructure. Cool URLs don't change, remember? Also, if someone uniformly had many `foo.html` URLs across all their websites, but only migrated half their websites to Pages, then this would mean that half of their websites would use `foo` and the other half would use `foo.html`, leading to gratuitous and confusing inconsistencies.

Second, I realize this may be uncommon, but this "feature" interacts catastrophically with websites that already have different pages for `foo.html` and `foo`. In fact, one of my websites does this extensively. I have this nice naming convention (that I really don't want to abandon) where I have `foo.swf` for a flash file and then I have `foo.swf.html` for the embedding page that uses Ruffle for playback. Right now, were I to migrate this website to Pages, I would have to either:

- move all `foo.swf` to new URLs, and `foo.swf` would now incorrectly serve `foo.swf.html`, frustrating users who wanted the flash file.
- move all `foo.swf.html` to new URLs, and `foo.swf.html` would now incorrectly serve `foo.swf`, frustrating users who wanted the embedding page.

In short, if this "feature" cannot be disabled easily, then this is a no-win situation for anyone who wants to migrate to Pages and has similarly conflicting URLs.

Third, this "feature" effectively requires (for best semantics and performance) changing the canonical URLs of all `foo.html` files (and any links pointing to them) to the equivalent `foo` URL, which is potentially a HUGE burden of work, depending on infrastructure. As it happens, I have been writing all my websites in raw HTML, CSS and JS for more than a decade, and updating all those damn links manually would be quite a hassle. (And I bet there are many people out there for whom the burden would be exponentially heavier.)

Fourth, I already migrated one of my websites with a few `foo.html` files to Pages, meaning I am currently exposing and supporting apparently-canonical `foo` URLs, something I never even wanted to do in the first place. Once I can go back to supporting `foo.html` only, I would either have to break all links pointing to `foo` URLs, or support redirects forever, both of which are obviously undesirable. In short, any `foo` URLs that I am temporarily forced to support are a "ticking time bomb": the longer they are exposed, the more users will suffer when I stop supporting them.

Fifth, I actually noticed just now that `foo.html` supports not only `foo.html` and `foo`, but also `foo/`! This means that any `foo.html` file will conflict with `foo/index.html`. It's just crazy to me that all `foo.html` files are automatically and forcibly accessible under 3 different URLs.

Anyway, I hope my presentation was not too annoying, but this is a really critical and aggravating issue that I urge you to fix ASAP. I don't even think this "feature" should be enabled by default, but I would be satisfied if it was very easy to disable it. Thank you.
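
The conflicts described above (`foo.swf` vs. `foo.swf.html`, and `foo.html` vs. `foo/index.html`) can be checked for before migrating a site. The sketch below is only a rough model based on the behavior reported in this issue, not on Cloudflare's documentation or implementation: it assumes every `foo.html` is reachable at `/foo.html`, `/foo` and `/foo/`, and it simplifies `index.html` files to their own path plus the directory URL. The names `urlsForFile` and `findConflicts` and the file list are made up for illustration.

```javascript
// URLs under which a file is assumed to be reachable (model of the behavior
// described in this issue; a simplification, not Cloudflare's actual rules).
function urlsForFile(filePath) {
  const own = "/" + filePath;
  if (filePath === "index.html" || filePath.endsWith("/index.html")) {
    // Simplification: an index file is served at its directory URL.
    return [own, own.slice(0, -"index.html".length)];
  }
  if (filePath.endsWith(".html")) {
    const stem = own.slice(0, -".html".length);
    return [own, stem, stem + "/"];
  }
  return [own];
}

// Returns [url, files] pairs for every URL claimed by more than one file.
function findConflicts(files) {
  const claimed = new Map();
  for (const file of files) {
    for (const url of urlsForFile(file)) {
      if (!claimed.has(url)) claimed.set(url, []);
      claimed.get(url).push(file);
    }
  }
  return [...claimed].filter(([, owners]) => owners.length > 1);
}

const conflicts = findConflicts([
  "foo.swf",
  "foo.swf.html",
  "bar.html",
  "bar/index.html",
]);
for (const [url, owners] of conflicts) {
  console.log(url + " is claimed by " + owners.join(" and "));
}
```

With this file list, the model reports `/foo.swf` (claimed by both the flash file and its embedding page) and `/bar/` (claimed by both `bar.html` and `bar/index.html`).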