firebase / firebase-functions

Firebase SDK for Cloud Functions
https://firebase.google.com/docs/functions/
MIT License

Deployed function fails to read runtime config on Node 10 #630

Closed (danielkcz closed 4 years ago)

danielkcz commented 4 years ago

Related issues

https://github.com/firebase/firebase-functions/issues/433

[REQUIRED] Version info

node: 10.16.0

firebase-functions: 3.3.0

firebase-tools: 7.15.0

firebase-admin: 8.10.0

[REQUIRED] Test case

I tried to create a reproduction in the repo https://github.com/FredyC/firebase-functions-repro. Unfortunately, it behaves correctly, and I did not find any difference from my broken project.

[REQUIRED] Steps to reproduce

See above

[REQUIRED] Expected behavior

To have a deployed function that can read a runtime config.

[REQUIRED] Actual behavior

We have like 15 functions deployed in production and they are running smoothly for several months now. They are all Node 10 functions.

Today, I needed to tweak one function a little. I deployed it (with the --only flag) and suddenly that one function is unable to read the runtime config; it's always undefined.

In the console log, I've noticed the warning

Warning, FIREBASE_CONFIG and GCLOUD_PROJECT environment variables are missing. Initializing firebase-admin will fail

I assume it's the major culprit of this issue. However, considering that I have dependencies of the exact same versions in the project and in the reproduction repo, I don't follow what this warning depends on.

Were you able to successfully deploy your functions?

Functions do deploy, but fail in runtime when reading functions.config().

google-oss-bot commented 4 years ago

I couldn't figure out how to label this issue, so I've labeled it for a human to triage. Hang tight.

tilgovi commented 4 years ago

firebase-admin will check for GOOGLE_CLOUD_PROJECT || GCLOUD_PROJECT. I think firebase-functions should probably do the same. Some guidance from GCP about what the most modern option here is would be welcome.
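
A minimal sketch of the fallback described here, assuming the lookup is done against a plain environment object. The helper name `resolveProjectId` is hypothetical and is not part of firebase-admin or firebase-functions; only the two variable names come from the comment above.

```typescript
// Hypothetical helper, for illustration only.
type Env = Record<string, string | undefined>;

function resolveProjectId(env: Env): string | undefined {
  // Prefer GOOGLE_CLOUD_PROJECT and fall back to GCLOUD_PROJECT,
  // mirroring the `GOOGLE_CLOUD_PROJECT || GCLOUD_PROJECT` check.
  return env["GOOGLE_CLOUD_PROJECT"] || env["GCLOUD_PROJECT"];
}
```

Calling `resolveProjectId(process.env)` at startup would show whether either variable is visible to the function.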

the0rem commented 4 years ago

This is pretty frustrating, as we had the staging CI deploy and run fine; however, the deploy to production doesn't have the config data (because it was deployed after the bug was released).

There's also no way to roll back. Fortunately, we realised (after a couple of hours of debugging) that the issue was specific to using the region() command, which meant we could at least re-deploy our functions in favour of https and firebase firebase-function declarations. Our functions are now deployed in a completely different part of the world and the performance is taking a clear hit.

The fact that this issue wasn't caught by your internal CI systems, as well as the fact that there's no way for the end user to recover and rollback from the issue is embarrassing.

Please rectify this with high priority.

danielkcz commented 4 years ago

@tilgovi You mean that either of those should be set locally? Why, in that case, is it working with a fresh bare-bones project? I certainly haven't set any env variables.

@the0rem The workaround to deploy to default (US) region seems to be working indeed, thank you for that at least.

I am honestly surprised this got several 👍 overnight; I thought it was some issue in my setup.

wvanderdeijl commented 4 years ago

We are now unable to deploy any function due to Warning, FIREBASE_CONFIG and GCLOUD_PROJECT environment variables are missing. Initializing firebase-admin will fail

The only env vars we see at runtime are: ["NODE_ENV","K_SERVICE","FUNCTION_TARGET","NODE_OPTIONS","FUNCTION_SIGNATURE_TYPE","K_REVISION","HOME","DEBIAN_FRONTEND","PWD","PORT","PATH","NO_UPDATE_NOTIFIER"]

So, there is no FIREBASE_CONFIG nor GCLOUD_PROJECT anymore. Might this be a change in the underlying GCP function infrastructure?
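
A rough sketch of how a list like the one above could be produced from inside a function, assuming a Node runtime. The helper `missingVars` is a hypothetical name; the two expected variable names are taken from the warning quoted in this thread.

```typescript
type Env = Record<string, string | undefined>;

// Variable names from the warning message quoted in this thread.
const EXPECTED_VARS = ["FIREBASE_CONFIG", "GCLOUD_PROJECT"];

// Hypothetical helper: returns the expected names that are unset or empty.
function missingVars(env: Env, expected: string[] = EXPECTED_VARS): string[] {
  return expected.filter((name) => !env[name]);
}

// Log only the variable names, never the values, to avoid leaking secrets.
console.log(JSON.stringify(Object.keys(process.env)));
console.log("missing:", JSON.stringify(missingVars(process.env)));
```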

danielkcz commented 4 years ago

Interesting. I just tried to deploy the minimal reproduction again and suddenly it is broken too. The warning is visible in the logs as well. Yesterday I deployed several times and it was working. Really strange.

https://europe-west1-test-1a2c7.cloudfunctions.net/helloWorld

Running firebase functions:config:get gives me

{
  "hello": {
    "world": "now"
  }
}

I hope that reproduction will help with fixing this soon.

wvanderdeijl commented 4 years ago

We currently have a number of GCP projects that suffer from this issue (in europe-west1) and another set of projects that do not suffer from this. Seems we're in the middle of some sort of rollout that breaks things

mbleigh commented 4 years ago

Hey folks, we see these reports and are investigating. The GCF Node 10 environment does not have GCLOUD_PROJECT or FIREBASE_CONFIG environment variables in the runtime by default, but the Firebase CLI should be inserting them. We'll try to reproduce and go from there.

tilgovi commented 4 years ago

So the issue on my end may be that I'm not using the firebase CLI to deploy my functions.

danielkcz commented 4 years ago

@mbleigh Feel free to use my reproduction repo including that test project, I can transfer it to someone.

pavadeli commented 4 years ago

I think something has changed in the directory structure that is deployed to / inside GCF. In our case the code was unable to find the .runtimeconfig.json that is automatically packaged with the code. We are currently testing the following workaround which seems to fix it in our environment.

import { readFileSync } from 'fs';
import { resolve } from 'path';
try {
    process.env.CLOUD_RUNTIME_CONFIG = process.env.CLOUD_RUNTIME_CONFIG
        || readFileSync(resolve(__dirname, '.runtimeconfig.json'), 'utf8');
} catch (e) {
    console.log('setting CLOUD_RUNTIME_CONFIG failed', e);
}

Note that .runtimeconfig.json is in the root of the deployment, so make sure to get the relative path right (in our case, the file that includes the workaround (our index.ts) happens to be in the root as well).

Also, the workaround needs to be executed before any function is registered and probably before the import of firebase-functions.

wvanderdeijl commented 4 years ago

I can see differences in the ENV VARS and working directory on two GCP projects where I deploy the same function:

project1 (which is working)

ENV VARS { 
  DEBIAN_FRONTEND: 'noninteractive'  
  FIREBASE_CONFIG:  
   '{"projectId":"xxx-wilfred","databaseURL":"https://xxx-wilfred.firebaseio.com","storageBucket":"xxx-wilfred.appspot.com","locationId":"europe-west"}'  
  GCLOUD_PROJECT: 'xxx-wilfred'
  FUNCTION_SIGNATURE_TYPE: 'http'  
  FUNCTION_TARGET: 'addEvent'  
  HOME: '/root'  
  K_REVISION: '40'  
  K_SERVICE: 'addEvent'  
  NO_UPDATE_NOTIFIER: 'true'  
  NODE_ENV: 'production'  
  NODE_OPTIONS: '--max-old-space-size=2048'  
  OLDPWD: '/srv'  
  PATH:  
   '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'  
  PORT: '8080'  
  PWD: '/srv/functions'  
  SHLVL: '1'  
  _: '../start-functions-framework'  
}
CURRENT DIR /srv/functions  
__DIRNAME /srv/functions  
FILES '.runtimeconfig.json'  

project2 (missing env vars)

ENV VARS { 
  DEBIAN_FRONTEND: 'noninteractive'  
  FUNCTION_SIGNATURE_TYPE: 'http'  
  FUNCTION_TARGET: 'addEvent'  
  HOME: '/root'  
  K_REVISION: '239'  
  K_SERVICE: 'addEvent'  
  NO_UPDATE_NOTIFIER: 'true'  
  NODE_ENV: 'production'  
  NODE_OPTIONS: '--max-old-space-size=2048'  
  PATH:  
   '/layers/google.nodejs.npm/npm/node_modules/.bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'
  PORT: '8080'  
  PWD: '/srv'  
}  
CURRENT DIR /workspace  
__DIRNAME /workspace  
FILES .runtimeconfig.json

This shows that both containers include the .runtimeconfig.json that is added by the firebase CLI. But the first one appears to have been started by ../start-functions-framework, has a number of additional env vars, a different working directory, and more importantly is working. The second one appears to have some sort of google.nodejs.npm layer in the PATH.

Could it be that different GCP projects are using different "magic" while doing the functions build and deploy?

pavadeli commented 4 years ago

I can now confirm that the workaround above works for us.

wvanderdeijl commented 4 years ago

src/config.ts reads the .runtimeconfig.json from a fixed path ../../../.runtimeconfig.json. Perhaps this is incompatible with the node_modules living in /layers/google.nodejs.npm/npm/node_modules/ as in the second project in the example above. There have been numerous issues in this git repo about the fixed path ../../../.runtimeconfig.json only working if both the .runtimeconfig.json and node_modules are in the root of the project. Perhaps this is broken by the "google nodejs npm" layer trick being used in the second project
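
To illustrate the suspicion above, here is a small sketch of where a fixed `../../../.runtimeconfig.json` lookup would land for the two layouts. It assumes the compiled config module sits in `node_modules/firebase-functions/lib`, and that the first project's node_modules lives under `/srv/functions`; neither is confirmed in this thread. The function name `runtimeConfigPath` is hypothetical.

```typescript
import { posix } from "path";

// Assumption: the compiled config module lives in
// <nodeModulesDir>/firebase-functions/lib, so "../../../" climbs to the
// directory that contains node_modules.
function runtimeConfigPath(nodeModulesDir: string): string {
  const moduleDir = posix.join(nodeModulesDir, "firebase-functions", "lib");
  return posix.resolve(moduleDir, "../../../.runtimeconfig.json");
}

// Layout like project1: node_modules next to the deployed code.
console.log(runtimeConfigPath("/srv/functions/node_modules"));

// Layout like project2: node_modules inside the npm layer, away from /workspace.
console.log(runtimeConfigPath("/layers/google.nodejs.npm/npm/node_modules"));
```

Under these assumptions the second lookup resolves inside the layer directory rather than `/workspace`, where the listing above shows `.runtimeconfig.json` actually sits.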

laurenzlong commented 4 years ago

Thanks for the report everyone, for now, please revert to using Node 8. We will continue to investigate and make a fix to Node 10.

To revert to Node 8, edit the "engines" field in package.json:

  "engines": {
    "node": "8"
  }

otri commented 4 years ago

Yep, this broke for us too. Asia region. Node 10.

cuongnvicts commented 4 years ago

Yes, we are facing it too. We have now reverted to Node 8, and it's OK for the time being.

mattgstevens commented 4 years ago

can confirm we have this issue as well, in region europe-west1.

for the time being, the workaround fixes the problem https://github.com/firebase/firebase-functions/issues/630#issuecomment-600259931

olivierlevy commented 4 years ago

I can confirm that too for region europe-west1. If you revert to Node 8, make sure you don't use anything specific to Node 10, such as '.finally('.
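
Promise.prototype.finally is only available from Node 10, so code that relies on it would need a substitute on Node 8. One possible sketch, using only `then`, is below; the helper name `withCleanup` is hypothetical and not something suggested elsewhere in this thread.

```typescript
// Hypothetical stand-in for promise.finally(cleanup) on runtimes without it.
function withCleanup<T>(
  promise: Promise<T>,
  cleanup: () => void | Promise<void>
): Promise<T> {
  return promise.then(
    // On success: run cleanup, then pass the original value through.
    (value) => Promise.resolve(cleanup()).then(() => value),
    // On failure: run cleanup, then rethrow the original error.
    (error) =>
      Promise.resolve(cleanup()).then(() => {
        throw error;
      })
  );
}
```

Usage would look like `withCleanup(doWork(), () => release())`, where `doWork` and `release` stand for your own functions.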

danielkcz commented 4 years ago

Yeah, I do use .finally quite a lot, so that's not an option for me. For now, I will use the us-central region, as performance is not that important for those functions.

It's really strange how the GCP setup is so different across regions. I would have assumed it's the same thing just running on a different physical machine.

mbleigh commented 4 years ago

I would not count on regional differences as a workaround for this problem right now. This bug may be related to a GCF infrastructure rollout (which generally happens slowly region-by-region) and we can't guarantee that the problem won't start occurring in us-central as well.

Please test carefully while we are still investigating before deploying functions that might impact your production environments.

jassmith commented 4 years ago

We're also experiencing this issue

wvanderdeijl commented 4 years ago

We deploy the same application to 5 or 6 separate projects in europe-west1. Only one of them is suffering from this issue. This confirms the suspicion for a phased rollout of some new infrastructure. It seems like it hasn’t even reached all our GCP projects

jassmith commented 4 years ago

We also have multiple applications in us-central1, only 1 is impacted (unfortunately our most important one).

laurenzlong commented 4 years ago

Hey everyone, thanks for the patience. We've discovered the root of the problem: it's due to a rollout of build packs for the Node 10 runtime. (That's why only some functions experience the issue and not others.) We've asked the team to start ramping down the rollout, so in the next few hours the feature will be fully ramped down. The SDK fix has been made in https://github.com/firebase/firebase-functions/pull/634. In the meantime, the quickest way to get your functions to work again is to revert to Node 8 (see my comment above). Thanks again for bearing with us!

zerobytes commented 4 years ago

This issue is affecting a project which is very important. Setting the Node version to 8 now creates a different error: the very unclear connection error. My plan is Blaze and the code is clean, but I can't figure out why. This needs to be fixed ASAP. As the CTO of a medium-sized business, I can say this will affect many of our clients, and the time it is taking to solve shows Google's lack of care for its customers.

Google products are awesome, easy and very practical, but when it comes to customer care it is always the same in all cases.

I really hope this gets fixed soon, because today we have 5 products based on Firebase Functions/Firestore which the CEO is questioning due to problems like this.

danielkcz commented 4 years ago

@zerobytes Have you actually bothered to read the previous comment, which clearly states that the problem has been discovered and a fix will be rolled out soon? Stop flashing your CTO badge and use your brains, please.

hdp617 commented 4 years ago

The experiment should be fully ramped down now.

zerobytes commented 4 years ago

@FredyC I hope you're not in management of anything, since in your opinion I should not use my customer status to get at least the minimum I deserve.

If you're happy with the 3 days it took to solve the issue, that's up to you, but my business and I, like many others who certainly complained through other channels, do not consider those 3 days anywhere near the expected response from a business solution provider such as Firebase.

Now, to the dev team, who are far from being held responsible for the bad customer care: I appreciate the effort to solve the issue, and thank you for being the only part of Google that actually shows some respect for customers.

laurenzlong commented 4 years ago

Hi everyone, thanks again for the patience. The fix has been released in v3.4.0. Please update your package.json to point to this latest version, so that when build packs get re-rolled out, your functions will continue to work.

danielkcz commented 4 years ago

@laurenzlong Just to clarify, I assume we need to redeploy affected functions, it won't start working on its own, right?

mbleigh commented 4 years ago

Correct, you will need to redeploy with firebase-functions >= 3.4.0

jassmith commented 4 years ago

firebase-functions 3.4.0 is busted with typescript. It appears to have been built against a different version of express than its dependencies call for and you get type mismatches during compile.

laurenzlong commented 4 years ago

@jassmith I'm not able to reproduce this issue. Can you be more specific about where you are seeing these errors? Is it when you run "tsc"?

jassmith commented 4 years ago

It appears that firebase-functions is installing @types/express 4.17.2, which produces errors, while 4.17.3 works correctly. And yes, it happens during tsc.

laurenzlong commented 4 years ago

I think you have a package-lock.json that's causing @types/express 4.17.2 to be installed. Our package.json has "@types/express": "^4.17.0" which will pick up 4.17.3

jassmith commented 4 years ago

Right, but this breaks upgrades. On a clean project this is true, but you're effectively incompatible with <= 4.17.2 for the types. Why not express that correctly as "@types/express": "^4.17.3"?
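
The point about ranges can be checked with a small sketch of caret matching. This is a simplified illustration, not the npm semver implementation: it only handles plain x.y.z versions with a major version of 1 or higher, and all function names are hypothetical.

```typescript
type Version = [number, number, number];

// Parses a plain "x.y.z" string; pre-release tags are out of scope here.
function parseVersion(text: string): Version {
  if (!/^\d+\.\d+\.\d+$/.test(text)) {
    throw new Error(`not a plain x.y.z version: ${text}`);
  }
  return text.split(".").map(Number) as Version;
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// For major >= 1, "^base" means: same major version, and at least base.
function satisfiesCaret(version: string, base: string): boolean {
  const v = parseVersion(version);
  const b = parseVersion(base);
  if (b[0] < 1) throw new Error("this sketch only handles major >= 1");
  return v[0] === b[0] && compareVersions(v, b) >= 0;
}

// A lockfile pinned at 4.17.2 still satisfies "^4.17.0" ...
console.log(satisfiesCaret("4.17.2", "4.17.0"));
// ... but would not satisfy "^4.17.3", which forces the newer types.
console.log(satisfiesCaret("4.17.2", "4.17.3"));
```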

laurenzlong commented 4 years ago

@jassmith this has been fixed in v3.5.0. Thanks for reporting!

jassmith commented 4 years ago

Thank you!