Closed dimavedenyapin closed 3 years ago
I couldn't figure out how to label this issue, so I've labeled it for a human to triage. Hang tight.
We're also experiencing this since August 23rd 6 AM CEST running on Node 14 instances. Same error, across multiple functions on europe-west1.
We are on Blaze subscription.
We're also experiencing this since August 22nd running on Node 12 instances on asia-south1
Same error across almost all functions. Both onCall and onRequest functions failing.
Tried upgrading to Node 14, and setting minInstances=1 and maxInstances=5. Error still persists.
This is still happening for me. It starts at ~10pm GMT+8 and ends after midnight. Initially I thought of migrating to another region, but it seems asia, europe and us are all affected.
Found that it was happening before in July: https://stackoverflow.com/questions/68284263/google-cloud-function-the-request-was-aborted-because-there-was-no-available-ins
Thanks for reporting the issue here. I noticed that several GCP support tickets were opened about this issue, and I will post any relevant updates here for a wider audience as the GCF team makes progress on it.
It's happening in production. We're also experiencing this since August 23rd running on Node 12 instances on asia-south1.
It doesn't have any correlation to the load. I run an hourly Pub/Sub function which also fails frequently. Critical messages are being dropped because of this.
We are also getting this error on europeWest1. There are just 2 people testing the application right now and this already happens. In about a week it will need to scale to thousands of people. Is there anything I can do about this?
Same error for me. Functions randomly return "The request was aborted because there was no available instance." Until now, I never had this error.
Same issue since ~ 2021-08-26 20:00 BST,
Node 14, firebase-functions 3.14.1 firebase-admin 9.10.0
It is happening with very minor spikes of requests < 100 across all function deployments.
I'd also like to understand why this very impactful issue being reported by many people isn't reflected on https://status.cloud.google.com/ as being investigated.
Edit: Looks like this is already being tracked here https://issuetracker.google.com/issues/194948300
Same here. The requests take way too long. A simple function with a single Firestore write takes up to 6000ms if it isn't aborted. Europe West 1
Same issue since 2021-08-23. Pub/Sub-triggered functions show this error message.
Same issue, since this morning, and it's getting worse !!! I tried to
I sent a bug report to Firebase...
Same issue.
Hi everyone.
Google Cloud Function (GCF) users as a whole are reporting the same issue described here, and https://issuetracker.google.com/issues/153207649#comment3 is the official response from the GCF team.
tl;dr: The GCF Node.js runtime used to silently drop requests when instances couldn't be scaled up fast enough to respond to demand. Now it logs the failed request in your project's logs, hence the sudden appearance of the issue (release note). For Pub/Sub-triggered functions, this error is usually handled gracefully by the automatic retry mechanism in the GCF infrastructure. The same can't be said of HTTP-triggered functions: the request would have been dropped unless the client already implemented a retry mechanism.
To reduce occurrences of the once invisible but now visible "aborted because there was no available instance" errors, the recommendations in https://cloud.google.com/functions/docs/troubleshooting#scalability apply.
I hope this clears up the confusion a bit. I'll leave this ticket open to answer any follow up questions, but since this problem is directly related to Google Cloud Functions and not specific to Firebase Functions, please consider reaching out to GCP support with your project-specific questions.
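For HTTP-triggered functions, the client-side retry mentioned above could look roughly like the sketch below. It is a minimal illustration, not an official recommendation: `withRetry`, `isTransient`, and the option names are hypothetical, and it assumes the failure reaches the client as an error object carrying an HTTP `status` of 429 or 5xx (check what your client actually receives).

```typescript
// Hypothetical options for the retry helper; the names are illustrative only.
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first one
  baseDelayMs: number; // backoff cap for the first retry
  maxDelayMs: number;  // upper bound for any single delay
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Assumption: transient scaling failures surface as HTTP 429 or 5xx.
function isTransient(err: unknown): boolean {
  const status = (err as { status?: number } | null)?.status;
  return typeof status === "number" && (status === 429 || status >= 500);
}

// Retries `operation` with exponential backoff and full jitter.
// Non-retryable errors and the final failure are rethrown to the caller.
async function withRetry<T>(
  operation: () => Promise<T>,
  isRetryable: (err: unknown) => boolean = isTransient,
  opts: RetryOptions = { maxAttempts: 5, baseDelayMs: 200, maxDelayMs: 5000 },
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (!isRetryable(err) || attempt === opts.maxAttempts) {
        break;
      }
      // The cap doubles on each attempt, bounded by maxDelayMs.
      const cap = Math.min(
        opts.maxDelayMs,
        opts.baseDelayMs * Math.pow(2, attempt - 1),
      );
      // Full jitter: wait a random time in [0, cap) to avoid synchronized retries.
      await sleep(Math.random() * cap);
    }
  }
  throw lastError;
}
```

A caller would wrap its request in `withRetry(() => callMyFunction())`, where `callMyFunction` is whatever the client already uses to invoke the function. Retrying is only safe if the function tolerates being invoked more than once for the same request.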
@taeold But could you clarify what happens to event-driven functions, such as firebase/firestore triggers? The docs here guarantee at-least-once execution.
I'm hoping that is still the case. Only a few of our functions are idempotent enough to warrant enabling the retry policy.
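Since at-least-once delivery means a trigger can run more than once for the same event, one common way to make a handler tolerate retries is to deduplicate on the event ID (firebase-functions passes an `eventId` on the event context). The sketch below is only an illustration under assumptions: `handleOnce` and the in-memory `processedEventIds` set are hypothetical, and a real deployment would need a durable store shared across instances (for example a Firestore document written in a transaction), because instances do not share memory and concurrent duplicates could otherwise both run.

```typescript
// Hypothetical in-memory record of handled event IDs (demonstration only).
// Assumption: production code would use a durable, shared store instead.
const processedEventIds = new Set<string>();

// Minimal shape of the context this sketch relies on.
interface EventContextLike {
  eventId: string;
}

// Runs `sideEffect` at most once per successfully handled eventId.
// Returns undefined when the event was already handled.
async function handleOnce<T>(
  context: EventContextLike,
  sideEffect: () => Promise<T>,
): Promise<T | undefined> {
  if (processedEventIds.has(context.eventId)) {
    return undefined; // duplicate delivery: skip the side effect
  }
  const result = await sideEffect();
  // Mark as processed only after success, so a failed attempt can be retried.
  processedEventIds.add(context.eventId);
  return result;
}
```

Inside a trigger, the body of the handler would be passed as `sideEffect`, so a redelivered event with the same `eventId` becomes a no-op.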
@larssn At-least once guarantee applies to all event-driven functions. Are you seeing events from Firebase/Firestore being dropped on your project?
It's not possible to see what data exactly is affected, due to the nature of the error message. I'm just hoping that our triggers are retried.
If so, then I'm wondering why it's necessary to show it as an error, couldn't an "info"-level message suffice?
@larssn Same thoughts on the error message. I think this is a first step in exposing the log to GCF users, and the team is working on a fix to output the log as a warning if the request is going to be retried.
We are also seeing this: https://github.com/firebase/firebase-functions/issues/965. It started in US central on August 24.
It was flooding our logs, so we had to exclude all such messages from alerts.
Google case 28781796
@taeold thanks for your research and updates 👍 I am glad to hear that the at-least-once guarantee still applies to all event-driven functions.
Looking forward to the fix for the log output.
I don't really understand how it's only a silent vs. logging problem. Our app has been running without issue for months, and only NOW are we getting these issues that are actually preventing some functions from running. It feels to me like something actually is going on or has changed, because we have not changed our backend for a while and it's been running smoothly until now.
Earlier there were timeout errors but now we are getting these errors.
@npicouet @deepak786 Are these coming from HTTP functions or event-triggered functions? I've been told that these errors are "okay" for event-triggered functions in the sense that Google Cloud infrastructure will retry the failed function invocations. So your events aren't actually being dropped and your application is still guaranteed at-least-once delivery of messages.
I'm sounding like a parrot, and I'm sorry for doing this over and over again - if you are seeing serious issues, please go ahead and contact Google Cloud Support. I'm not aware of any outage that explains the issues being reported here.
In this screenshot, all the functions are event-triggered functions.
In my case the function that has that error message is a function triggered by a "firestore trigger" ".onCreate".
This started happening today to me, HTTP Functions
Adding in a log that shows the instance not available, but the retries and eventual successful run of a scheduled cloud function:
Is everyone only seeing the failures with no successful run?
I have an HTTP function and when I literally call it twice at the same time, it already shows this error. This happens every time. This could never work for thousands of users.
Started happening to us yesterday for HTTP functions as well. Across our environments, this is happening quite often.
Same issue :(
Started at 4:30PM EST for me. I'll be in hot water if this persists 😬
I'm confused because the bug tracker says:
"If you have an HTTP triggered function, for this error in particular, in the past you would always receive an error message that was sent back to the calling client but it was not logged within the customer's/user's project"
but the only reason I checked the logs to find this error was because I couldn't understand why all of a sudden our production client app kept getting an error > 50% of the time.
I am also experiencing this issue.
Friendly reminder - please reach out to Google Cloud Support if your HTTPS (not event triggered) functions are having problems.
We are keeping this issue open to make it easy for users experiencing the issue described in the original post to understand what's going on and to make the right next steps. Unfortunately, this issue cannot be fixed by code changes in this repository.
Same here - multiple functions keep getting: "The request was aborted because there was no available instance"
Happening to me too since the 31st of August, but has dramatically increased in the past few hours (all functions seem to be failing now). Almost all of my cloud functions are failing, and most are non-event triggered (http onRequest or onCall).
All running on Node 10, with no maximum instances set.
Have logged an issue with Google here:
https://issuetracker.google.com/u/1/issues/199180393
Quick Fix:
For anyone having this issue, a full redeploy of our cloud functions seems to have resolved the issue for now (errors gone, monitoring service reporting all functions as up/ok). I will monitor and report back if they reappear.
We've had this issue for a while now. It seems to be happening more often lately.
I'm having the same issue here. It started about 1 week ago. I have never seen this error in the past year of using Firebase functions.
It only started happening a few hours ago on Cloud Functions. I redeployed, but that doesn't seem to help.
Our cloud functions have been able to scale with no issue for over a year. Nothing has changed within our infrastructure but as of late last night 2021-09-07, our functions have begun to fail. This doesn't appear to be related to traffic, cold starts or long running executions. A request will be made to a function and it will fail on every request for several minutes. It will then begin to work and another function will begin to fail.
There definitely seems to be something larger going on here than just revealing logs to Stackdriver.
My theory is that they reduced the tolerance for cold starts. For example: earlier they were OK waiting 2s for a cold start, but now they throw the said error after just 1s. If you notice (or can create) a function with little to no dependencies, basically a helloWorld function, it will not be affected by this.
This issue also aligns with the announcement of min-instance for Cloud Functions. Support recommended using this new beta feature but did not have an answer for the cause of this issue.
So likely they changed some configuration in the backend and this is an effect of that. A lot more people are having production impact because of this here: https://issuetracker.google.com/issues/194948300 Hope they find and fix this soon 🤞
Happening to me too. No issues with scaling since we started our project in March 2020 until start of last week, when this issue happens every few minutes
Same issue for us. We haven't made any changes to cloud functions.
Hot off the press - Google Cloud Functions in us-central1 did report some problem ~2021-09-08:
https://status.cloud.google.com/incidents/16SSwVXrYSLjy8fEMvyZ
The status report claims that the issue only affected functions deployed in us-central1 and that it is now resolved. If you are still seeing issues, please contact Google Cloud Support.
I am getting the same issue. 1 year using cloud functions with no problem (europe-west3)
Now this has been happening out of nowhere for the past 3 weeks. I contacted Firebase support, and they asked me to reproduce the problem; however, it is impossible to reproduce on demand because it is so random. I explained to them that this happens for all functions, for a few minutes each time (as if it's an outage). Not sure what to do right now :/
@tolypash not exactly the solution you'd like but consider moving out of google's ecosystem. This is a lesson learned hard. So far I have only heard about poor customer support but now I have witnessed it with this issue.
@sanketplus This is not the place for that.
This thread is dedicated to figuring out if there's a problem, and how to resolve it.
@larssn I get what you're saying. OP is saying support is not being helpful, and that was also the case for me when I was helping someone navigate this issue. I think the suggestion of moving away is pragmatic, more so when you are having real customer impact that is making you lose money. You do not want to bet your company and its revenue on a cloud company that is having a hard time determining whether there is a problem at all.
Anyway, that is my personal take on this. You are welcome to disagree with it :)
@tolypash The problem seemed to go away for 4-5 days (after Google said they found/fixed the issue causing the high frequency of errors) but is now back this morning. That said I've only had 5 errors this morning from approx 5000 calls, and nothing for the past 6 hours (from another 5000 calls) so not as frequent/severe as before. It may have been a transient load issue in the US central region.
I think it might be helpful to read this thread from the Google team:
https://issuetracker.google.com/issues/194948300#comment24 (posted today) https://issuetracker.google.com/issues/194948300 and https://cloud.google.com/functions/docs/concepts/exec#execution_guarantees
My reading is that these errors were occasionally occurring before (primarily due to load spiking), but not being thrown as an error. As noted in the issue tracker thread, event driven functions will retry on error, while http ones don't - it's up to your client to perform retries on errors. At the current error rate I am experiencing I think the retry strategy is reasonable (necessary even since Google state they don't guarantee HTTP executions, and the Internet in general being a relatively unreliable network).
I guess it depends on how many errors you are seeing, and the error ratio compared to successful GCF executions?
I'm having the same issue on asia-northeast1
I am getting the same issue on asia-northeast1.
All functions failed between 15:00 and 16:00 today.
My Cloud Function max_instances is set to no limit.
This is happening in a production Firebase environment with a Blaze subscription. I've started seeing the error
The request was aborted because there was no available instance.
since 22nd August 10pm GMT+8. This error happens across all functions when I make 100+ invocations. When the error appears it affects all other functions as well (see screenshot). It can happen with any function, with the maxInstances parameter set or without it. All functions are deployed in us-central1. Quotas don't seem to reach the limit.
Related issues
[REQUIRED] Version info
node: v12.22.3
firebase-functions: 3.14.1
firebase-tools: 9.16.0
firebase-admin: 9.11.0
[REQUIRED] Test case
Firebase pubsub listener
[REQUIRED] Steps to reproduce
Send 100+ messages to the firebase pubsub
[REQUIRED] Expected behavior
Functions execute.
[REQUIRED] Actual behavior
Functions failing with a message:
The request was aborted because there was no available instance.
Were you able to successfully deploy your functions?
successfully deployed