NASA-IMPACT / csdap-cumulus

SmallSat Cumulus Deployment
Other
1 stars 1 forks source link

QueueGranules lambda function times out #317

Closed chuckwondo closed 9 months ago

chuckwondo commented 9 months ago

During our attempt to migrate WV03_MSI_L1B (#311) to CBA Prod, ingestion failed after migrating roughly 75% of the entire collection (~1.45M of ~1.9M granules). Generally speaking, the new workflow scalability implementation to allow ingestion of collections in their entirety (rather than year by year), worked well. However, during the course of discovery and queuing of granules, the execution time of the QueueGranules lambda function grew over time, until it eventually reached the 15-minute timeout limit. This occurred after queuing roughly 75% of the granules.

We must do some tuning to prevent this from occurring so that entire collections can be ingested during a single rule execution.

It is difficult to precisely pinpoint the cause of this slow growth in execution time of QueueGranules, but I suspect it is due to the growth of the DB coupled with concurrent access to it.

Here are some factors that may contribute to this:

I suggest we do the following to reduce (ideally, eliminate) the possibility of QueueGranules timing out: