firebase / firebase-tools

The Firebase Command Line Tools
MIT License

Dynamic robots.txt file 404's on Firebase Hosting #3734

Closed: yeldarby closed this issue 10 months ago

yeldarby commented 3 years ago

[REQUIRED] Environment info

firebase-tools: 9.17.0

Platform: macOS 11.5 (20G71)

[REQUIRED] Test case

It's not possible to use a Firebase Function to dynamically generate your robots.txt file; it always 404's. We're trying to dynamically insert the proper Sitemap: directive into our robots.txt file based on some environment variables, but the function will not run for this route.

I've created a basic repo to demonstrate the problem here.

This repo rewrites every route (except /, which uses the static index.html as expected) to a function called allRoutes that simply prints the route path. It works for every path I've tried except for robots.txt, which 404's without hitting the function.
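
In essence, the setup boils down to something like this (simplified sketch; see the repo for the full version). The function, in functions/index.js:

const functions = require("firebase-functions");

// Echo back the requested path for every rewritten route.
exports.allRoutes = functions.https.onRequest((req, res) => {
  res.json({ hello: "from index.js", path: req.path });
});

And the catch-all rewrite in firebase.json:

"rewrites": [
  {
    "source": "**",
    "function": "allRoutes"
  }
]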

[REQUIRED] Steps to reproduce

Clone the test repo and deploy to Firebase. Try to navigate to robots.txt; it will 404.

I've deployed the repo here for your convenience:

[REQUIRED] Expected behavior

The robots.txt route should run the allRoutes function and print {"hello":"from index.js","path":"/robots.txt"} as it does locally on the emulator.

[REQUIRED] Actual behavior

robots.txt 404's and the function is not run.

bkendall commented 2 years ago

Yyyyeah. I can replicate this. Huh. Let me dig into this a bit and see what I can learn.

bkendall commented 2 years ago

Okay, I've filed an internal bug for this (b/208311208) and am looking more.

Can you confirm for me: I can replicate the issue by calling the Cloud Function directly, like:

curl -i https://us-central1-<project>.cloudfunctions.net/allRoutes/robots.txt

Do you see the same behavior while calling the Cloud Function directly too?

google-oss-bot commented 2 years ago

Hey @yeldarby. We need more information to resolve this issue but there hasn't been an update in 7 weekdays. I'm marking the issue as stale and if there are no new updates in the next 3 days I will close it automatically.

If you have more information that will help us get to the bottom of this, just add a comment!

yeldarby commented 2 years ago

Confirmed

HTTP/2 404 
etag: W/"0-2jmj7l5rSw0yVb/vlWAYkK/YBwk"
function-execution-id: y6tgyxobnbsu
x-cloud-trace-context: fc5f8f389e7a020aa55227c8bb65217e;o=1
date: Wed, 08 Dec 2021 09:41:48 GMT
content-type: text/html
server: Google Frontend
content-length: 0

fr-esco commented 2 years ago

Hi, I have the same issue.

Don't know if it can help, but I'm sharing my experience.

With the following hosting configuration, the cloud function was not invoked at all:

"rewrites": [
  {
    "source": "/robots.txt",
    "function": "robots"
  }
]

Looking at the response headers, I noticed a cache hit, so I added:

"headers": [
  {
    "source": "/robots.txt",
    "headers": [
      {
        "key": "Cache-Control",
        "value": "no-cache"
      }
    ]
  }
]

This way, the function started being called, but the logs said only:

None of my custom log entries showed up, as if the code of my function wasn't actually triggered.

bkendall commented 2 years ago

Sorry for not following up, but here's what I found out: the functions framework that wraps user code in GCF purposefully stops /robots.txt and /favicon.ico from being responded to by GCF. We're working internally to see if that can be changed, but in the meantime I'd suggest either (a) creating a static file /robots.txt to serve the correct content, or (b) migrating to Cloud Run, which doesn't have the same limitation AFAIK (though I do acknowledge that that is a bit more work).
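
For anyone weighing option (b), here's a rough sketch of what the Cloud Run route could look like (the service name, region, and SITEMAP_URL variable are illustrative, not from this thread). A minimal Express app deployed to Cloud Run:

const express = require("express");
const app = express();

app.get("/robots.txt", (req, res) => {
  // Build the file dynamically, e.g. from an environment variable (assumed name).
  const sitemap = process.env.SITEMAP_URL || "https://example.com/sitemap.xml";
  res.type("text/plain").send(`User-agent: *\nAllow: /\nSitemap: ${sitemap}\n`);
});

// Cloud Run provides the PORT environment variable.
app.listen(process.env.PORT || 8080);

Then point Hosting at the service instead of a function in firebase.json:

"rewrites": [
  {
    "source": "/robots.txt",
    "run": {
      "serviceId": "robots-service",
      "region": "us-central1"
    }
  }
]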

If I get more of an update, I'll try to follow up again. Thanks for raising!

hajarNasr commented 2 years ago

Hi, @bkendall. We're building an API to generate our robots.txt file dynamically, but we're facing this same issue, so I'm wondering if there's an update on it.

bkendall commented 2 years ago

No update as of today, sorry. The really short version of the situation is that GCF didn't design their product with "serve all HTTP requests" in mind - they tend towards specific event providers, even with Firebase's frequent use of HTTP. Their framework explicitly stops robots.txt and favicon.ico from being served, and that's unlikely to change. You may actually have better luck raising an issue in the functions framework repo, since that helps show user need for it!

Since this isn't something that we can fix in the CLI though, I'm tempted to close this issue. Maybe I'll make a change in the Hosting emulator that will fail, or at least print a warning, on these paths... at least then it's not a surprise when it doesn't work in production.

galdahan commented 10 months ago

Any update? Still facing the same issue :\ What's even stranger: I'm using Next.js, and locally it works (without rewrites, using a robots.txt.ts SSR file inside the pages folder), BUT it does not work in production. Either way, rewrites are still not working.

bkendall commented 10 months ago

Unfortunately, I don't think this is going to be able to change. If you're using Cloud Functions (i.e. you're using the firebase-functions SDK, and either gen 1 or gen 2), you're going to be using the GCP framework that prevents those routes from being served.

The workaround mentioned before of using a static file still applies (since that content is resolved before rewrites), but I don't think we're going to be able to solve this problem here via the CLI.
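
To spell that out, option (a) is just a plain file in the Hosting public directory (assuming the default "public" directory name), no rewrite needed, because static content is matched before rewrites are evaluated:

public/
  index.html
  robots.txt    <- served directly for /robots.txt; the rewrite never runs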

If these two files being dynamically served via Functions is critical to your workflow, please let us know more by contacting support with a feature request.

harlan-zw commented 9 months ago

It's quite bizarre not to have this robots.txt behavior documented.

I'd propose re-opening this and having the emulator warn when a 404 is thrown for these files, specifically linking to documentation for it.

kevpie commented 7 months ago

I spent a full day trying to debug this one. This should be prioritized now with the push for Firebase web frameworks.

alon6699 commented 6 months ago

Any new progress?

service-paradis commented 5 months ago

I spent a full day trying to debug this one. This should be prioritized now with the push for Firebase web frameworks.

Totally! I'm not sure why this issue is closed? It should, at least, be documented somewhere!

nickjuntilla commented 4 months ago

I am having this same problem because I need to prevent crawling for my dev and staging environments, but not prod. I've decided to go with moving a robots.dev.txt and a robots.prod.txt file into my build directory during the build phase. Of course this meant having to split my build into 2 different builds, yarn build:dev and yarn build:prod, but that was the only way I could come up with.
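
In case it helps anyone else, this is roughly what that looks like in package.json (the base build command and the "build" output directory are placeholders for whatever your project uses):

"scripts": {
  "build": "your-existing-build-command",
  "build:dev": "yarn build && cp robots.dev.txt build/robots.txt",
  "build:prod": "yarn build && cp robots.prod.txt build/robots.txt"
}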

I hope you guys allow robots.txt to be redirected to a function in the future.

caweidmann commented 1 month ago

I spent a full day trying to debug this one. This should be prioritized now with the push for Firebase web frameworks.

Totally! I'm not sure why this issue is closed? It should, at least, be documented somewhere!

I thought I was going crazy... everything else worked except /robots.txt...