cloudflare / workers-sdk

⛅️ Home to Wrangler, the CLI for Cloudflare Workers®
https://developers.cloudflare.com/workers/
Apache License 2.0

🚀 Feature Request: Urgently need a way to disable redirects from `foo.html` to `foo` with Pages #1488

Status: Open. Hexstream opened this issue 2 years ago.

Hexstream commented 2 years ago

Describe the solution

When navigating to a URL such as https://example.com/foo.html, Pages for some reason redirects to https://example.com/foo.

Unfortunately, this "feature" is simply disastrous for my use-cases, and I urgently need a way to disable it, especially since there does not seem to be any workaround. (This is my only remaining blocker for finishing the long-overdue migration of my 25+ subdomains to Pages.)

First of all, I think it obviously doesn't make sense to force users to change tons of their URLs just because they're changing their underlying infrastructure. Cool URLs don't change, remember? Also, if someone consistently used foo.html URLs across all their websites but only migrated half of them to Pages, then half of their websites would use foo and the other half would use foo.html, leading to gratuitous and confusing inconsistencies.

Second, I realize this may be uncommon, but this "feature" interacts catastrophically with websites that already have different pages for foo.html and foo. In fact, one of my websites does this extensively. I have a nice naming convention (that I really don't want to abandon) where foo.swf is a Flash file and foo.swf.html is the embedding page that uses Ruffle for playback. Right now, were I to migrate this website to Pages, I would have to either:

  1. move all foo.swf files to new URLs, in which case foo.swf would incorrectly serve foo.swf.html, frustrating users who wanted the Flash file; or

  2. move all foo.swf.html files to new URLs, in which case foo.swf.html would incorrectly serve foo.swf, frustrating users who wanted the embedding page.

In short, if this "feature" cannot be disabled easily, then this is a no-win situation for anyone who wants to migrate to Pages and has similarly conflicting URLs.
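To make the collision concrete, here is a small JavaScript sketch that models the stripping described above (a request for a path without `.html` is answered by the matching `.html` file) and lists every asset whose stripped URL is also the path of another asset. The function name and the flat list of asset paths are hypothetical; this only models the behavior described in this issue, not Cloudflare's actual routing code.

```javascript
// Hypothetical checker: list assets whose extensionless URL (".html"
// stripped away) is also the path of another deployed asset.
// Models the behavior described in this issue, not Pages' real router.
function findStrippedHtmlConflicts(assetPaths) {
  const assets = new Set(assetPaths);
  const conflicts = [];
  for (const path of assetPaths) {
    if (!path.endsWith(".html")) continue;
    // "/foo.swf.html" would be exposed at "/foo.swf"
    const stripped = path.slice(0, -".html".length);
    if (assets.has(stripped)) {
      conflicts.push({ htmlPage: path, clashesWith: stripped });
    }
  }
  return conflicts;
}

console.log(findStrippedHtmlConflicts(["/foo.swf", "/foo.swf.html", "/bar.html"]));
```

With this example list it reports a single conflict: `/foo.swf.html` clashes with `/foo.swf`, while `/bar.html` is fine because no `/bar` asset exists.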

Third, this "feature" effectively requires (for best semantics and performance) changing the canonical URLs of all foo.html files (and any links pointing to them) to the equivalent foo URL, which is potentially a HUGE burden of work, depending on infrastructure. As it happens, I have been writing all my websites in raw HTML, CSS and JS for more than a decade, and updating all those damn links manually would be quite a hassle. (And I bet there are many people out there for whom the burden would be exponentially heavier.)

Fourth, I already migrated one of my websites with a few foo.html files to Pages, meaning I am currently exposing and supporting apparently canonical foo URLs, something I never wanted to do in the first place. Once I can go back to supporting foo.html only, I will either have to break all links pointing to foo URLs or support redirects forever, both of which are obviously undesirable. In short, any foo URLs that I am temporarily forced to support are a "ticking time bomb": the longer they are exposed, the more users will suffer when I stop supporting them.

Fifth, I just noticed that foo.html is reachable not only at foo.html and foo, but also at foo/! This means that any foo.html file will conflict with foo/index.html. It's just crazy to me that every foo.html file is automatically and forcibly accessible under 3 different URLs.

Anyway, I hope my presentation was not too annoying, but this is a really critical and aggravating issue that I urge you to fix ASAP. I don't even think this "feature" should be enabled by default, but I would be satisfied if it was very easy to disable it. Thank you.

Hexstream commented 2 years ago

As for the means to disable this feature, one idea would be accepting

strip-html-suffixes=false

as the very first line of the _redirects file. This is simple, logical, efficient and extensible. More such special directives could eventually be accepted on separate lines only at the start of the file before normal redirection directives.

(Edit, 7 April 2024: wrangler.toml should now be used instead.)
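A minimal sketch of how the proposed leading directives could be separated from ordinary redirect rules when reading `_redirects`. `strip-html-suffixes=false` is only the syntax proposed in this comment, not something Pages accepts; the parser, its name and its return shape are hypothetical. Blank lines and `#` comment lines are skipped, and any `key=value` line that appears after the first normal rule is treated as a rule, matching the "only at the start of the file" idea.

```javascript
// Hypothetical parser for the proposed syntax: `key=value` directives are
// only recognized at the top of the file, before the first redirect rule.
function parseRedirectsFile(text) {
  const directives = {};
  const rules = [];
  let inHeader = true;
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // blanks and comments
    const match = /^([a-z][a-z0-9-]*)=(\S+)$/.exec(line);
    if (inHeader && match) {
      directives[match[1]] = match[2]; // values stay strings, e.g. "false"
      continue;
    }
    inHeader = false; // the first ordinary line ends the directive header
    rules.push(line.split(/\s+/)); // e.g. ["/old", "/new", "301"]
  }
  return { directives, rules };
}

const parsed = parseRedirectsFile("strip-html-suffixes=false\n/old /new 301\n");
console.log(parsed.directives, parsed.rules);
```

For the sample input, `directives` maps `strip-html-suffixes` to the string `"false"`, and `rules` holds the single (hypothetical) rule `/old /new 301`.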

Hexstream commented 2 years ago

I was able to work around this issue with Workers, not without some pain.

This solves my issue in the short term, but a simple native solution would be strongly preferred.

Hexstream commented 7 months ago

Now that Cloudflare Pages supports wrangler.toml, this would be easier to implement.

wrangler.toml could simply accept strip-html-suffixes = false to opt in to this new behavior.

strip-html-suffixes = true would be the default so as to retain backwards compatibility.

Could you please implement this crucial feature soon?
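To illustrate what the proposed setting would change, here is a sketch contrasting the two modes. The option name comes from the proposal above and defaults to `true` for backwards compatibility, as suggested; the function, its parameters and the use of a `Set` of asset paths are hypothetical. The `true` branch only models the behavior reported in this issue, and giving an exact non-HTML file precedence over `foo.html` is an assumption.

```javascript
// Hypothetical resolver contrasting the behavior reported in this issue
// (stripHtmlSuffixes = true, the default) with the proposed opt-out.
// `assets` is a Set of deployed file paths such as "/foo.html".
function resolveRequest(pathname, assets, stripHtmlSuffixes = true) {
  if (!stripHtmlSuffixes) {
    // Proposed mode: a URL only ever maps to the file with exactly that path.
    return assets.has(pathname) ? { serve: pathname } : { notFound: true };
  }
  // Reported mode: /foo.html redirects to /foo ...
  if (pathname.endsWith(".html") && assets.has(pathname)) {
    return { redirectTo: pathname.slice(0, -".html".length) };
  }
  // ... an exact non-HTML match is assumed to take precedence ...
  if (assets.has(pathname)) {
    return { serve: pathname };
  }
  // ... and /foo or /foo/ fall back to foo.html.
  const base = pathname.endsWith("/") ? pathname.slice(0, -1) : pathname;
  if (base !== "" && assets.has(base + ".html")) {
    return { serve: base + ".html" };
  }
  return { notFound: true };
}

const assets = new Set(["/foo.html", "/foo.swf", "/foo.swf.html"]);
for (const path of ["/foo.html", "/foo", "/foo/", "/foo.swf.html"]) {
  console.log(path, resolveRequest(path, assets), resolveRequest(path, assets, false));
}
```

With `stripHtmlSuffixes = false`, `/foo.html` and `/foo.swf.html` are served as-is, and `/foo` and `/foo/` are simply not found, which is the "Cool URLs" behavior asked for in this issue.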

Hexstream commented 7 months ago

This is a widely requested feature in the Cloudflare community, here are the first 10 topics I found (in descending order of views):

  1. Pages redirect missing .html extension file name issue
  2. Prevent truncating and removal of page name extensions?
  3. Cloudflare Pages Route matching problem
  4. How to enable trailing ‘.html’ from url
  5. Cloudflare Pages Route Matching: Disable extensionless HTML redirect
  6. React app deployed to clouldflare, url automatically removes .html
  7. Any updates on Route matching vs .html?
  8. Automatic redirect pages without .html
  9. Cloudflare Pages url redirect
  10. Cloudflare Pages truncates URLs by removing the “.html” extension

I realize this is on the roadmap. Hopefully we can get this crucial feature in 2024?

It's a big blocker for many people, it will be a great day when this is implemented!

dc-95 commented 5 months ago

@Hexstream, can you share how you worked around it? I'm having the same issue.

Hexstream commented 5 months ago

Yes, at the time I used 1. a Worker for some easy cases and 2. _worker.js for some more complex cases.

I am looking to migrate my workarounds to Pages Functions soon.

I am hoping Birthday Week in September will bring nice improvements to Pages, including fixing this critical issue.

Here is my (old but still used) code for 1. the nostrip-html-suffixes Worker:

wrangler.toml:

name = "nostrip-html-suffixes"
compatibility_date = "2022-07-22"

main = "./worker.js"
usage_model = "bundled"

workers_dev = false
routes = [
       { pattern = "https://modern.pokehidden.archive.hexstream.net/*", zone_name = "hexstream.net" },
       { pattern = "https://dumping-grounds.hexstream.xyz/archived/ert.hexstream.xyz/FlyHighWithYourDreams*", zone_name = "hexstream.xyz" }]

minify = false
node_compat = false

worker.js:

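export default {
    async fetch(request) {
        // Forward a request for `url` to the origin (Pages), keeping method and headers.
        function passthrough(url) {
            return fetch(url.toString(), {
                method: request.method,
                headers: request.headers
            });
        }
        // Work on the parsed pathname so query strings and the root path are handled correctly.
        const url = new URL(request.url);
        // /foo.html: ask Pages for /foo, which it serves from foo.html.
        if (url.pathname.endsWith(".html")) {
            url.pathname = url.pathname.slice(0, -5);
            return passthrough(url);
        }
        // Treat /foo/ like /foo, but leave the root path "/" alone.
        if (url.pathname.length > 1 && url.pathname.endsWith("/"))
            url.pathname = url.pathname.slice(0, -1);
        if (url.pathname !== "/") {
            const htmlUrl = new URL(url);
            htmlUrl.pathname += ".html";
            const htmlResponse = await passthrough(htmlUrl);
            // foo.html exists: redirect /foo and /foo/ to /foo.html.
            // (Alternative: serve a 404 page here instead of redirecting.)
            if (htmlResponse.ok)
                return Response.redirect(htmlUrl.toString(), 301);
        }
        // Anything else goes to the origin unchanged.
        return fetch(request);
    }
};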

/*

Unwanted effects:

/foo.html redirects to /foo
/foo and /foo/ serve /foo.html

Solution:

/foo.html serves former /foo
/foo and /foo/ redirect (301) to /foo.html if /foo.html exists

*/
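The decision `worker.js` makes for each request can also be written as a pure function, which makes the mapping in the comment above easy to check without deploying anything. This is an illustrative sketch, not the author's code: `planWorkerResponse` and the `htmlExists` callback (standing in for the `.html` subrequest to the origin) are hypothetical, and skipping the root path `/` is an assumption added here so the home page is never turned into `/.html`.

```javascript
// Hypothetical pure model of worker.js. `htmlExists(path)` stands in for the
// subrequest that checks whether `${path}.html` exists on the origin.
function planWorkerResponse(pathname, htmlExists) {
  if (pathname.endsWith(".html")) {
    // /foo.html: fetch /foo from Pages, which serves it from foo.html.
    return { fetchFromOrigin: pathname.slice(0, -".html".length) };
  }
  // Treat /foo/ like /foo, but never strip the root path "/".
  const noSlash =
    pathname.length > 1 && pathname.endsWith("/") ? pathname.slice(0, -1) : pathname;
  if (noSlash !== "/" && htmlExists(noSlash)) {
    // foo.html exists: send /foo and /foo/ to /foo.html permanently.
    return { redirectTo: noSlash + ".html", status: 301 };
  }
  // Everything else is passed through to the origin unchanged.
  return { fetchFromOrigin: pathname };
}

const existing = new Set(["/foo"]);
for (const path of ["/foo.html", "/foo", "/foo/", "/bar", "/"]) {
  console.log(path, planWorkerResponse(path, (p) => existing.has(p)));
}
```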

And here is my (old but still used) code for 2. _worker.js:

export default {
    async fetch(request, env) {
        function passthrough (url) {
            return env.ASSETS.fetch(url.toString(), {
                method: request.method,
                headers: request.headers,
                cf: request.cf
            });
        }
        const url = new URL(request.url);
        const pathname = url.pathname;
        if (pathname.startsWith('/workaround/')) {
            url.pathname = "/workaround/";
            return passthrough(url);
        }
        if (pathname.endsWith('.swf.html')) {
            url.pathname = "/workaround/html" + pathname.slice(0, -5);
            return passthrough(url);
        }
        if (pathname.endsWith('.swf')) {
            url.pathname = "/workaround/swf" + pathname;
            return passthrough(url);
        }
        if (pathname.endsWith(".html")) {
            url.pathname = pathname.slice(0, -5);
            return passthrough(url);
        }
        if (pathname.endsWith("/"))
            url.pathname = pathname.slice(0, -1);
        const urlHtml = new URL(url);
        urlHtml.pathname += ".html";
        const urlNotFound = new URL(url);
        urlNotFound.pathname += ".404";
        const htmlResponse = await passthrough(urlHtml);
        if (htmlResponse.ok)
            return passthrough(urlNotFound);
        else
            return env.ASSETS.fetch(request);
    }
};

I hope this helps in some way, but I cannot guarantee the quality of the above soon-to-be-legacy code...
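The first four rules in `_worker.js` only rewrite the pathname before asking `env.ASSETS` for a file, so they can be pulled out into a plain function and checked locally. This is an illustrative refactoring, not code from the original post: the name `rewriteAssetPath` is hypothetical, and returning `null` means "fall through to the final `.html` / `.404` check". The target directories `/workaround/html` and `/workaround/swf` are taken from the code itself; how the files are laid out inside them is not shown in the post.

```javascript
// Hypothetical extraction of the path rewrites performed by _worker.js.
// Returns the pathname to request from env.ASSETS, or null when the Worker
// would fall through to its final ".html" / ".404" check.
function rewriteAssetPath(pathname) {
  if (pathname.startsWith("/workaround/")) {
    return "/workaround/"; // direct requests into the workaround tree are collapsed
  }
  if (pathname.endsWith(".swf.html")) {
    // "/a/foo.swf.html" -> "/workaround/html/a/foo.swf"
    return "/workaround/html" + pathname.slice(0, -".html".length);
  }
  if (pathname.endsWith(".swf")) {
    // "/a/foo.swf" -> "/workaround/swf/a/foo.swf"
    return "/workaround/swf" + pathname;
  }
  if (pathname.endsWith(".html")) {
    // "/bar.html" -> "/bar", which Pages answers from bar.html
    return pathname.slice(0, -".html".length);
  }
  return null;
}

for (const path of ["/workaround/x", "/a/foo.swf.html", "/a/foo.swf", "/bar.html", "/bar"]) {
  console.log(path, rewriteAssetPath(path));
}
```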