Open nathanhealey opened 11 months ago
I found a few problems with this issue:
After further triage, I believe the underlying issue is the amount of time the firebase-admin SDK spends preparing sends and processing received data. The console shows that the P50 Firestore read time is ~90ms, which does not line up with the tracing shown above. I profiled a single API request on my development machine, which shows similar latencies to the GAE instances. For a single API invocation with a total latency of 1781ms, the server spends at least 684ms in Firebase code, and at least another 447ms waiting for IO, of which Firestore is the only source. For comparison, the aggregate of handling and responding to the client request, all GraphQL processing, and the entirety of user code, is 201ms.
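To separate "time in Firebase code" from "waiting for IO" as described above, one option is to compare wall-clock time against process CPU time around a single request. This is a minimal sketch, not the author's actual profiling setup; `profileRequest` and `doRequest` are hypothetical names, and `process.cpuUsage()` measures the whole process, so the split is only meaningful while one request is in flight:

```typescript
// Minimal sketch: split the wall-clock time of one async operation into
// process CPU time vs. everything else (mostly I/O wait).
// Caveat: process.cpuUsage() covers the whole process, so this is only
// meaningful when a single request is being handled at a time.
async function profileRequest<T>(doRequest: () => Promise<T>): Promise<T> {
  const wallStart = performance.now();
  const cpuStart = process.cpuUsage();
  const result = await doRequest();
  const wallMs = performance.now() - wallStart;
  const cpu = process.cpuUsage(cpuStart); // user + system since cpuStart, in microseconds
  const cpuMs = (cpu.user + cpu.system) / 1000;
  console.log(
    `wall=${wallMs.toFixed(0)}ms cpu=${cpuMs.toFixed(0)}ms ` +
      `wait~${Math.max(0, wallMs - cpuMs).toFixed(0)}ms`,
  );
  return result;
}
```

A flamegraph (as used above) gives far more detail; this only confirms the CPU-bound vs. IO-bound split cheaply in production logs.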
You can see in the flamegraph that the second waterfall of requests takes ~300ms from start of processing to the end, which lines up pretty well with what I see in the tracing from the first post, and the observation of ~100ms read times from Firestore.
Annotated flamegraph of the above findings.
I verified that the API is CPU-bound by running tests on GAE with 600MHz and 1.2GHz processors. The faster instance responded to API requests almost exactly twice as fast.
Because the bottleneck is CPU rather than I/O wait, the Node.js server cannot interleave multiple requests well. This limits the server to approximately 1 RPS, which I think we can all agree is quite low. My traffic arrives in infrequent, sporadic bursts of 3-6 RPS, too short for responsive scaling to make any difference.
This all being said: Is there anything I can do to drastically cut down on processing time? Is this due to the high number of concurrent requests, or is this inherent to the library? Is there configuration I'm missing?
For anyone else facing this issue: I've done all the optimizations I think I can do, and I've managed to get API calls down to a P50 of 800ms. I got there through two major optimizations:
The conclusions I've reached based on the above are:
I use Firestore as my database for a simple backend server, and very slow reads are causing my API requests to take several seconds to complete.
My backend is an always-running GraphQL server on Google App Engine, so it is not suffering from cold-start issues. The GAE instance and the Firestore instance are both in US-central (nam5), so transfer delay should be negligible. Document sizes are 2-10KB, so large transfers should not be an issue. All other external REST calls complete in 50-120ms, so I don't believe it's a network saturation issue. My document IDs are randomly generated UUIDs, so I don't believe it's a hotspotting issue.
My typical API processing reads 1-5 documents in parallel to determine what the user is requesting, then reads 20-80 documents in parallel to build the response, i.e. a single two-step waterfall. Using @opentelemetry/instrumentation-grpc@0.27.0, I am tracing these calls from within the server (and batching the traces). Typically the first batch of reads completes in 300-700ms, and the second batch takes 600-1200ms. That far exceeds what I would consider viable performance.
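When the second wave fans out 20-80 reads at once, one thing worth ruling out is all of them competing for the same connections simultaneously. Below is a minimal, generic sketch of capping the number of in-flight reads; `fn` is a placeholder for whatever performs one Firestore read (e.g. a `doc.get()` call), which is an assumption, not code from this project:

```typescript
// Minimal sketch: run async reads with a cap on the number in flight,
// so a wave of 20-80 fetches doesn't all hit the connection pool at once.
// `fn` stands in for a single Firestore read (placeholder).
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each worker synchronously claims the next index, then awaits the read.
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  };
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, () => worker()),
  );
  return results;
}
```

Whether a cap helps or hurts here depends on where the contention actually is; it is a diagnostic lever, not a known fix.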
To keep the user from feeling interrupted, I try to keep my P50 latency under 300ms. With these numbers, I'm not even close; API calls that take 5+ seconds are not uncommon.
I feel like I've eliminated every possible variable, and all that is left is: a) an issue with the node package, perhaps straining under 20+ simultaneous reads, or b) misrepresented data coming from the instrumentation-grpc package, or c) mismatched expectations of Firestore's capabilities. Unfortunately, it seems Firestore doesn't publish any SLAs or expectations. Is a single Firestore read taking 300ms expected, or an outlier needing investigation? I do see some reads happening in <40ms, which is far closer to my expectation.
Here's one of the simplest APIs I have, and it completes in 568ms. There are 56 reads across 6 batches. Shortest read time is 23ms (the first one), longest is 307ms. Most reads are >180ms.
Here's a taste of a more complex API that has 41 reads, and completes in 3298ms:
For reference, here's an API call that doesn't call Firestore. It completes in 9ms.
From these traces, it seems that read time scales linearly with the number of reads in flight. Information online suggests that the firebase-admin package uses a gRPC connection pool under the hood, so I would expect read times to stay flat until the pool is exhausted, then scale linearly beyond that point. Is there a way to configure or inspect what's happening with the gRPC connection pool?
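One configuration knob that sidesteps the gRPC pool entirely, in recent versions of firebase-admin (via the bundled @google-cloud/firestore client), is the `preferRest` setting, which routes one-shot reads over REST instead of gRPC streams. This is a hedged sketch of how it would be enabled, assuming a version of the SDK that supports it; whether it improves latency in this workload is untested:

```typescript
import { initializeApp } from "firebase-admin/app";
import { getFirestore } from "firebase-admin/firestore";

// Assumption: a recent firebase-admin whose Firestore client supports
// the preferRest setting. settings() must be called before the first read.
const app = initializeApp();
const db = getFirestore(app);
db.settings({
  preferRest: true, // use REST for one-shot reads instead of gRPC streams
});
```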
Here's the code used to initialize the package.
firestore.ts
TL;DR: How do I get my read times lower, or is this as good as I can expect them to be?