firebase / firebase-tools

The Firebase Command Line Tools

Firebase functions failing to deploy with EXPIRED error #7514

Open curtis-jotson opened 4 months ago

curtis-jotson commented 4 months ago

This is basically the same as #7268, which has been closed as fixed. It is NOT fixed.

[REQUIRED] Environment info

firebase-tools: 13.14.2

Platform:

Try to deploy firebase cloud functions with a large codebase.

[REQUIRED] Steps to reproduce

  1. Enable the experiment: firebase experiments:enable functionsv2deployoptimizations
  2. Run firebase deploy --only functions against a codebase with a significant number of functions (30+) and a few large third-party libraries (@google-cloud, hubspot-api). The build stage of the deployment will fail with EXPIRED errors.

[REQUIRED] Expected behavior

Any function deploy, regardless of size, should work without failing.

[REQUIRED] Actual behavior

Function deploys fail unpredictably once you hit a difficult-to-define, arbitrary limit.

I'm also noticing that the "single builds" feature is either disabled again or isn't working at all. A few months ago, before this was an issue, every firebase deploy --only functions command produced about 4 to 8 builds in the Cloud Build history. After #7268 we'd have as many builds as cloud functions (~30), and they would expire at the queue_ttl value. After the fix, with the experiment enabled, we were back down to 4 to 8 builds. Now we're back up to the pre-fix numbers: the issue has resurfaced for us, and it has never been solved for others.

RowMoc commented 4 months ago

PLEASE FIX THIS, GOOGLE

Corchoneitor commented 3 months ago

Same problem here. Extremely blocking in our deployment process.

taeold commented 3 months ago

I don't have answers to all the issues being raised here, but in several support cases I've worked through, I've learned that many of these relate to limited Cloud Build quotas.

To explain: Some projects have restrictions on the number of simultaneous Cloud Build builds. If your limit is 2 concurrent builds, deploying 10 functions means only 2 can be built at a time, while the other build requests are queued. Unfortunately, builds queued for over 6 minutes are canceled with an EXPIRED message, which is the error message you are seeing here.
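The queueing effect described above can be sketched with a small simulation. This is only an illustrative model, not how Cloud Build schedules work internally: it assumes all builds are requested at the same moment and that every build takes the same time. The function name simulate_builds and the 4-minute build duration are assumptions made for the example; the 6-minute queue limit comes from the explanation above.

```python
import heapq


def simulate_builds(num_builds, concurrent_limit, build_minutes, queue_ttl_minutes=6.0):
    """Return (completed, expired) for builds that are all requested at t=0.

    Simplified model (an assumption, not Cloud Build's real scheduler):
    every build takes `build_minutes`, and a build that has waited in the
    queue for more than `queue_ttl_minutes` is dropped as EXPIRED.
    """
    # Min-heap of the times at which each concurrent build slot becomes free.
    slots = [0.0] * concurrent_limit
    heapq.heapify(slots)

    completed = 0
    expired = 0
    for _ in range(num_builds):
        start = slots[0]  # earliest moment any slot is free
        if start > queue_ttl_minutes:
            # The build waited too long in the queue and never starts.
            expired += 1
        else:
            # The build takes the slot and frees it again when it finishes.
            heapq.heapreplace(slots, start + build_minutes)
            completed += 1
    return completed, expired


if __name__ == "__main__":
    # 10 functions, quota of 2 concurrent builds, hypothetical 4-minute builds.
    print(simulate_builds(10, 2, 4.0))
    # Same deploy with a quota of 10 concurrent builds.
    print(simulate_builds(10, 10, 4.0))
```

Under these assumptions, a quota of 2 lets only the first two waves of builds start before the 6-minute limit, so the remaining builds expire, while a quota of 10 lets every build start immediately.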

I recommend checking your Cloud Build quota:

  1. Go to Google Cloud Console.
  2. Filter by Service: Cloud Build.
  3. Check "Concurrent builds" and "Concurrent Build CPUs (Regional Public Pool)"

Ideally, the "Concurrent builds" quota should be 10 or more, and "Concurrent Build CPUs" should be 20 or more for your function's region. If they're low, request an increase by clicking "Edit" on the quota.

To explain a little more about how the Firebase CLI tries to optimize function deployment with the functionsv2deployoptimizations flag: the Firebase CLI will try to minimize the number of builds required for function deployments by reusing the build result of one function for other functions. That is, if you are deploying 10 functions, the Firebase CLI may request only 1 Cloud Build build and reuse the result of that build to deploy the other 9 functions.

Unfortunately, we can't always share the build of one function with another function. Builds can't be shared when:

  1. Functions require different memory.
  2. Functions are deployed to different regions.
  3. Functions target different generations (1st gen vs. 2nd gen).
  4. Functions are defined in different codebases.
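To estimate how many builds a deploy might need given the factors above, one can group functions by the attributes that prevent sharing. The sketch below is not the Firebase CLI's actual algorithm; FunctionSpec and group_by_build_key are hypothetical names, and the key is simply the four factors listed above.

```python
from collections import defaultdict
from typing import NamedTuple


class FunctionSpec(NamedTuple):
    """Hypothetical description of one function to deploy."""
    name: str
    codebase: str
    region: str
    generation: int  # 1 for 1st gen, 2 for 2nd gen
    memory_mb: int


def group_by_build_key(functions):
    """Group function names by the attributes that prevent build sharing.

    Functions in the same group could, in principle, reuse one build;
    each distinct group needs at least one build of its own.
    """
    groups = defaultdict(list)
    for fn in functions:
        key = (fn.codebase, fn.region, fn.generation, fn.memory_mb)
        groups[key].append(fn.name)
    return dict(groups)


if __name__ == "__main__":
    # 28 identical-shaped functions plus two that differ in memory or region.
    functions = [
        FunctionSpec(f"fn{i}", "default", "us-central1", 2, 256) for i in range(28)
    ]
    functions.append(FunctionSpec("resize", "default", "us-central1", 2, 1024))
    functions.append(FunctionSpec("euHook", "default", "europe-west1", 2, 256))

    groups = group_by_build_key(functions)
    print(f"{len(functions)} functions -> at least {len(groups)} builds")
```

In this made-up example, 30 functions collapse into 3 groups, so one function with a different memory setting and one in a different region are enough to triple the minimum number of builds.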

We are exploring options in the Firebase CLI to limit the number of functions deployed simultaneously. This might slow down deployments but increase success rates. Note that with low Cloud Build quotas, there are limits to how much we can speed up deployments.
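The idea of limiting how many functions are deployed at once can be illustrated with a semaphore. This is a generic sketch of the technique, not the Firebase CLI's implementation: deploy_all and the deploy_one callback are hypothetical, and a real deploy call would replace the stand-in used in the demo.

```python
import asyncio


async def deploy_all(names, deploy_one, max_concurrent):
    """Run deploy_one(name) for every name, at most max_concurrent at a time.

    `deploy_one` is a caller-supplied coroutine function (hypothetical here);
    results are returned in the same order as `names`.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(name):
        # Wait for a free slot before starting this deploy.
        async with semaphore:
            return await deploy_one(name)

    return await asyncio.gather(*(guarded(name) for name in names))


if __name__ == "__main__":
    async def fake_deploy(name):
        # Stand-in for a real deploy request; just sleeps briefly.
        await asyncio.sleep(0.01)
        return f"{name}: ok"

    names = [f"fn{i}" for i in range(10)]
    for line in asyncio.run(deploy_all(names, fake_deploy, max_concurrent=2)):
        print(line)
```

Keeping max_concurrent at or below the project's concurrent build quota means no request sits in the Cloud Build queue waiting for a slot, at the cost of a longer total deploy time.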

Let me know if this information helps. Please reach out to Firebase Support if you have further questions.

curtis-jotson commented 3 months ago

@taeold Thank you very much for the details on how all this works, as well as the conditions that can cause multiple builds. You outlined very well the kind of conditions that will cause the single build feature to split up the builds to optimize reusability (I assume it tries to optimize reusability).

We'll check out our quotas and stuff and not trust the documentation on these things as it says we should be getting 30 concurrent builds by default. This might have changed and not been updated.

I don't know if this information is helpful to try and track down why this is intermittent:

My questions at this point are:

taeold commented 3 months ago

@curtis-jotson

We'll check out our quotas and stuff and not trust the documentation on these things as it says we should be getting 30 concurrent builds by default. This might have changed and not been updated.

Please do! In my experience, the default quota does change depending on various factors (e.g. region where you initialized the function, status of your billing account, etc.)

Are there similar conditions that would cause the single build feature to not enable at all?

There was a time when the Google Cloud Functions API temporarily disabled single builds. That was the cause of https://github.com/firebase/firebase-tools/issues/7268. That was considered a bad rollout and was rolled back quickly. We don't expect this to happen in the future.

Are there conditions that would cause the single build feature to only work for a handful of functions but the others are built individually?

Does the reusability algorithm group functions for builds by requirements ahead of time?

Yes. The code is a bit complicated, but we effectively divide the functions into separate groups and carefully manage the update/create function API calls.

If so, maybe this is a problem with how we're organizing or specifying our functions in the codebase? Maybe it isn't playing nice with the algorithm sometimes.

Possibly! We essentially treat each codebase as a separate deployment.

E.g. if you have 30 functions defined in a single codebase, the CLI will try to deploy one function first, then deploy the other 29 functions using the artifacts created in that first deploy.

But if you have 30 functions defined across 30 different codebases, then the CLI will deploy 30 functions all at once since artifacts across codebases are not shareable.

toddmotto commented 3 months ago

Now I'm seeing this issue again, yikes. The only thing I changed was adding a new function; before that, everything had been deploying properly for the past few weeks. Yet now a new function has caused the same problem as #7268.

curtis-jotson commented 3 months ago

@taeold Here are the build results of deploying the exact same codebase first to a staging project and then to our production project. The experiment is enabled, and there is ZERO difference in code between these two deploys. The only difference is time and target project.

As you can see, the staging deploy runs about 10 builds for our ~30 functions. But the production deploy ran a build for every single function.

[Image: Staging Deploy Build Logs]

[Image: Production Deploy Build Logs]

taeold commented 3 months ago

@curtis-jotson Thanks for sharing more info!

Honestly, this is a little baffling to me too 🤔 I have no good theory on why your production deploy would initiate a large number of builds when using the same config/command to do the deploy.

Would you be interested in creating a Firebase Support ticket so I can look into the issue more closely? Having your firebase debug log would also be tremendously helpful.

curtis-jotson commented 3 months ago

@taeold I'll get those logs when I can and put in a ticket. I'll send you the ticket identifier once I've done that.

Unfortunately I'm super busy and we're just working around the issue right now so it might take me a week or two.

Mnumzane commented 4 days ago

Our team is still running into issues related to builds failing with timeout errors. @taeold was progress ever made on this issue?