leighajarett opened this issue 1 month ago
Hey folks, I'm working with @leighajarett on this problem. The extension works fine for us for a while, then stops writing most events, even though our Firestore write volume is pretty constant. When we install a new version of the extension, it works again, until it stops later.
Any idea what could be causing this, or at least how we can troubleshoot it?
@puf does this chart represent the number of exports in BigQuery?
It represents the number of events per day; it's a count of the records in the table.
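Because the Firestore write volume is described as roughly constant, a sharp drop in the daily record count is itself a useful signal for spotting when the export stalls. A minimal sketch, assuming you have already pulled per-day row counts out of the table (the dates and numbers below are made up for illustration), that flags days falling well below the median:

```python
from statistics import median


def flag_low_days(daily_counts: dict[str, int], ratio: float = 0.5) -> list[str]:
    """Return the days whose record count is below `ratio` times the median.

    `daily_counts` maps a day (e.g. "2023-05-01") to the number of rows
    written to the BigQuery table that day. The 0.5 threshold is an
    arbitrary assumption; tune it to your own traffic.
    """
    if not daily_counts:
        return []
    typical = median(daily_counts.values())
    return sorted(day for day, count in daily_counts.items() if count < ratio * typical)


# Hypothetical counts: steady volume, then the export mostly stops.
counts = {
    "2023-05-01": 10_200,
    "2023-05-02": 9_950,
    "2023-05-03": 10_100,
    "2023-05-04": 1_300,
    "2023-05-05": 240,
}
print(flag_low_days(counts))  # ['2023-05-04', '2023-05-05']
```

The median is used instead of the mean so that a few stalled days don't drag the baseline down; it stays reliable as long as fewer than half of the days in the window are affected.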
Just to add some more information: we've pinpointed a specific event that is missing from the BigQuery table.
In the logs, we can see this error
We're wondering whether something is timing out somewhere, maybe because of an overload of events?
We (Leigha, myself and our team) have been analyzing this a bit further, and these metrics from the Cloud Run task queue associated with one of our extension instances seem pretty conclusive:
In the top chart you can see that:
In the bottom chart you can see the size of the task queue, which grows to 500 million tasks, presumably its maximum. So the queue simply isn't able to process tasks as fast as the extension adds them.
We've just changed this queue's configuration to a max rate of 500/s (the maximum we can set) to see whether that allows it to drain the backlog of tasks, but given the rate at which we're adding tasks, that likely won't be enough for long.
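Whether a higher dispatch rate helps comes down to simple arithmetic: the backlog only shrinks while tasks are dispatched faster than they are enqueued, and the time to drain it is the backlog divided by that difference. A rough sketch using the 500 million backlog and the 500/s cap mentioned above; the enqueue rates are hypothetical, since the actual rate isn't given in this thread, and it assumes both rates stay constant:

```python
def drain_time_seconds(backlog: int, dispatch_per_s: float, enqueue_per_s: float) -> float:
    """Seconds until the backlog reaches zero, or infinity if it never does.

    Assumes constant rates and that the queue really dispatches at
    `dispatch_per_s` (a configured max rate is an upper bound, not a guarantee).
    """
    net = dispatch_per_s - enqueue_per_s
    if net <= 0:
        return float("inf")  # the backlog stays flat or keeps growing
    return backlog / net


BACKLOG = 500_000_000  # tasks, as seen in the bottom chart
MAX_RATE = 500         # tasks/s, the highest max rate the queue allows

for enqueue in (0, 100, 400, 600):  # hypothetical enqueue rates in tasks/s
    seconds = drain_time_seconds(BACKLOG, MAX_RATE, enqueue)
    print(f"enqueue {enqueue}/s -> {seconds / 86_400:.1f} days")
# enqueue 0/s -> 11.6 days
# enqueue 100/s -> 14.5 days
# enqueue 400/s -> 57.9 days
# enqueue 600/s -> inf days
```

Even if no new tasks arrived at all, draining 500 million tasks at 500/s would take about 11.6 days, and any sustained enqueue rate at or above 500/s means the backlog never drains.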
We've also upgraded one of our instances of this extension to the new 0.1.56 version, and we no longer see the same errors in that instance's logs.
Five days in, we're still seeing the events being streamed into BigQuery, so 🎉
Steps to reproduce:
Several months ago, the extension began to stop streaming records into BigQuery at random times. Streaming stays almost completely stopped until we upgrade the extension to a new version. We don't see any errors in the logs. We have one installation of the extension that streams into a non-partitioned table and one that streams into a partitioned table; only the partitioned table seems to be affected.
Expected result
Records continuously stream into BigQuery without interruption.
Actual result
Records are omitted from the BigQuery table until we upgrade the extension to a new version.