firebase / firebase-admin-node

Firebase Admin Node.js SDK
https://firebase.google.com/docs/admin/setup
Apache License 2.0

[Firestore] Error adding document: Error: 4 DEADLINE_EXCEEDED: Deadline exceeded after 269.549s,metadata filters: 0.002s,LB pick: 0.002s,remote_addr=142.251.16.95:443 #2655

Open Drpepper78 opened 3 months ago

Drpepper78 commented 3 months ago


I am using the Firebase Admin SDK for this, but I have attached the version numbers for both: "firebase": "10.11.1", "firebase-admin": "12.1.0".

I have a serverless web app with a mostly modest write rate, i.e. fewer than 3-4k writes per day. The hourly peak for writes is about 500, and for reads it is about 38k. Each of my Vercel functions is deployed separately, but I am not sure whether the DAO setup is shared between all of them or whether they each get their own instance.

In dao files, I initialize the admin app and use getFirestore to generate a db client to query.

database = getFirestore(app);

It is strange that these errors report anywhere from 90 to 200+ seconds of delay, since our db is fairly small and the number of queries is low enough that they shouldn't be blocked for this long. Most of these are user-driven actions, so updates to individual documents or collections have a few seconds of gap between them. I am nowhere close to the documented limits.

I am also curious whether there is a limit on the number of connections a single database client created with getFirestore(app) can make to the Firestore db, since our connection usage peaks at 10.
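For reference, a common pattern on serverless platforms is to cache the client in module scope so each warm instance reuses one Firestore client across invocations. This is a hedged sketch of that pattern, not the SDK's prescribed setup: `createFirestoreClient` stands in for the real initialization (e.g. initializeApp + getFirestore) and is injected so the sketch stands alone.

```typescript
// Hedged sketch: cache the Firestore client in module scope so each warm
// serverless instance creates at most one client, no matter how many
// invocations it serves. `createFirestoreClient` is a stand-in for the real
// initialization (initializeApp + getFirestore), injected for illustration.
let cachedDb: unknown;

export function getDBInstance<T>(createFirestoreClient: () => T): T {
  if (cachedDb === undefined) {
    // First invocation on this instance: build the client exactly once.
    cachedDb = createFirestoreClient();
  }
  return cachedDb as T;
}
```

Module scope survives between invocations on a warm instance, so this avoids paying connection setup on every request.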

Steps to reproduce:

There is no consistent pattern to this. I get the error regularly, but only intermittently, and for the same queries.

google-oss-bot commented 3 months ago

I found a few problems with this issue:

dconeybe commented 3 months ago

Drive-by comment: If the problematic query contains an "or" clause, it could be related to https://github.com/googleapis/nodejs-firestore/issues/2055, for which a backend fix will be rolling out in August 2024.

Drpepper78 commented 3 months ago

@dconeybe it is mostly happening on write queries. I am pasting a sample here:

export const abcdyz = async (
    abcdy: AB,
): Promise<boolean> => {
    const db: FirebaseFirestore = getDBInstance();
    const collection = db.collection('abcdy');
    collection
        .add(abcdy)
        .then(docRef => {
            console.log('Document added with ID:', docRef.id);
        })
        .catch(error => {
            console.error('Error adding document:', error);
            return false;
        });
    return true;
};

While the AB object is probably only a few bytes, I am not sure it is big enough to cause any issues with writes.
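Worth noting: the sample above returns true before the add() promise settles, so the .catch never affects the return value, and on a serverless platform the runtime may be suspended with the write still in flight — which can later surface as a timeout. A hedged rewrite that awaits the write (the `Addable` type is a structural stand-in for a Firestore CollectionReference so the sketch stands alone):

```typescript
// Hedged sketch: await the write before returning, so failures actually reach
// the caller and the serverless runtime isn't frozen mid-request. `Addable`
// mimics the shape of CollectionReference.add() for illustration only.
type Addable = { add: (data: unknown) => Promise<{ id: string }> };

export const addDocument = async (
  collection: Addable,
  data: unknown,
): Promise<boolean> => {
  try {
    const docRef = await collection.add(data);
    console.log('Document added with ID:', docRef.id);
    return true;
  } catch (error) {
    console.error('Error adding document:', error);
    return false;
  }
};
```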

milaGGL commented 3 months ago

Hi @Drpepper78 , thank you for reporting this issue. Could you please confirm:

  1. if you have reached the maximum writes per second limit
  2. if this is only happening for writes

This could be a similar issue to https://github.com/firebase/firebase-admin-node/issues/2495, which was fixed in a newer SDK version. Please try upgrading the SDK and see if the issue persists.

Drpepper78 commented 3 months ago

@milaGGL

  1. I am certain our writes per second are not high enough to hit this. That said, I haven't found any documentation on what the exact figure is; even the URL you linked doesn't actually specify the limit. https://firebase.google.com/docs/firestore/best-practices is the only place where I have found a lower bound, which appears to be 500 writes per second. I am sure we aren't even at 50 writes per second right now.

  2. I don't believe that's the case. We do get some errors on reads as well. Our system is read-heavy, but writes are more impacted by this than reads. The query impacted the most is a write that adds a few KB of data to an audit collection. It always creates a new entry, and there are no indexes or read queries for this collection.

An additional data point, if it helps: this db is in Native mode on Google Cloud, not Datastore mode.

I am speculating here, but is there some sort of per-document operation limit? If a document is being written to, can it still be read at the same time on another connection? Are there collection-level limits on writes per second? And are there limits on the number of connections a client can open? In our console, I can see that no more than 10 connections were used at peak. If I am bottlenecked on the number of connections, I can try spawning more clients.

Based on the issue you linked I will add these env variables

GRPC_VERBOSITY=debug
GRPC_TRACE=all

and check if I am able to get any more insights. Thanks
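For reference, grpc-js reads these variables from the process environment; one way to set them is to export them before starting the process. The documented verbosity values are uppercase (DEBUG, INFO, ERROR, NONE):

```shell
# Enable verbose gRPC client logging for diagnosing DEADLINE_EXCEEDED.
# grpc-js documents the uppercase values DEBUG / INFO / ERROR / NONE.
export GRPC_VERBOSITY=DEBUG
export GRPC_TRACE=all
# Then start the app in the same shell, e.g.:
#   node server.js
```

On Vercel, the equivalent is adding the two variables in the project's environment-variable settings rather than exporting them in a shell.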

[Screenshot attached: 2024-07-31, 10:37 AM]
Drpepper78 commented 3 months ago

Hi @milaGGL, another update: I just noticed that the stack trace in my read API error actually comes from a write request invoked elsewhere. This seems to be a Vercel issue where logs are attributed incorrectly. I have opened an issue with them, but until I get a response we can assume the problem mainly lies with writes, and that we are well below the write limits.

milaGGL commented 3 months ago

Have you tried upgrading the SDK to the newest version? This could be due to the broken grpc library. https://github.com/firebase/firebase-admin-node/issues/2495#issuecomment-2101667223

Also, is it possible to share the stack trace of the error? It might tell us something.

weedeej commented 3 weeks ago

Hi @milaGGL, I am getting the same error. We might have the same issue.

Firestore v7.10
Google-Gax v4.3.3

EDIT: One more piece of information that might help: this doesn't happen 100% of the time. If I retry the same request, it goes through without error.

Here's the trace; it might help @Drpepper78 too:

Unhandled Rejection: Error: 4 DEADLINE_EXCEEDED: Deadline exceeded after 266.535s,metadata filters: 0.001s,LB pick: 0.002s,remote_addr=***.***.***.***:443
    at callErrorFromStatus (/var/task/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/call.js:31:19)
    at Object.onReceiveStatus (/var/task/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/client.js:193:76)
    at Object.onReceiveStatus (/var/task/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:360:141)
    at Object.onReceiveStatus (/var/task/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:323:181)
    at /var/task/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/resolving-call.js:129:78
    at process.processTicksAndRejections (node:internal/process/task_queues:77:11)
for call at
    at ServiceClientImpl.makeUnaryRequest (/var/task/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/client.js:161:32)
    at ServiceClientImpl.<anonymous> (/var/task/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/make-client.js:105:19)
    at /var/task/node_modules/@google-cloud/firestore/build/src/v1/firestore_client.js:237:29
    at /var/task/node_modules/google-gax/build/src/normalCalls/timeout.js:44:16
    at repeat (/var/task/node_modules/google-gax/build/src/normalCalls/retries.js:82:25)
    at /var/task/node_modules/google-gax/build/src/normalCalls/retries.js:125:13
    at OngoingCallPromise.call (/var/task/node_modules/google-gax/build/src/call.js:67:27)
    at NormalApiCaller.call (/var/task/node_modules/google-gax/build/src/normalCalls/normalApiCaller.js:34:19)
    at /var/task/node_modules/google-gax/build/src/createApiCall.js:112:30
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Caused by: Error
    at _firestore._traceUtil.startActiveSpan (/var/task/node_modules/@google-cloud/firestore/build/src/write-batch.js:438:27)
    at DisabledTraceUtil.startActiveSpan (/var/task/node_modules/@google-cloud/firestore/build/src/telemetry/disabled-trace-util.js:16:16)
    at WriteBatch.commit (/var/task/node_modules/@google-cloud/firestore/build/src/write-batch.js:436:43)
    at /var/task/.next/server/app/api/chats/start/route.js:1:5050
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  code: 4,
  details: 'Deadline exceeded after 266.535s,metadata filters: 0.001s,LB pick: 0.002s,remote_addr=***.***.***.***:443',
  metadata: Metadata { internalRepr: Map(0) {}, options: {} },
  note: 'Exception occurred in retry method that was not classified as transient'
}
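Since a manual retry of the same request succeeds, one possible mitigation (not a root-cause fix) is to retry DEADLINE_EXCEEDED failures with backoff. This is a hedged sketch; `withDeadlineRetry` and its parameters are illustrative, and the check for code 4 matches the gRPC status shown in the trace above:

```typescript
// Hedged sketch: retry an operation when it fails with gRPC code 4
// (DEADLINE_EXCEEDED), using exponential backoff between attempts. `op` is
// any thunk returning a Promise, e.g. () => collection.add(data).
export async function withDeadlineRetry<T>(
  op: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await op();
    } catch (error) {
      lastError = error;
      // Only retry DEADLINE_EXCEEDED; rethrow everything else immediately.
      if ((error as { code?: number }).code !== 4) throw error;
      // Back off: 1x, 2x, 4x, ... the base delay.
      await new Promise((resolve) =>
        setTimeout(resolve, baseDelayMs * 2 ** attempt),
      );
    }
  }
  throw lastError;
}
```

Note the trace's own hint ("not classified as transient"): the client deliberately does not retry this error itself, so any retry policy has to live in application code.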