adobe / aio-lib-state

Adobe App Builder State storage library
https://developer.adobe.com/app-builder/docs/guides/application_state/#state
Apache License 2.0

state.get bad performance under cold and warm starts #66

Closed shazron closed 4 months ago

shazron commented 3 years ago

See investigations under #63

Expected Behaviour

Under a cold start, a state.get should take less than a second.

Actual Behaviour

Under a cold start, a state.get will take approx 1800ms.

Possible issues

Even on a warm start, a state.get still takes approx 450ms.

The @azure/cosmos promise resolved here takes up 99.9% of the time of a state.get call:

  1. https://github.com/adobe/aio-lib-state/blob/e296e3ecbf5c30ce8597efe15e36bd350e305153/lib/impl/CosmosStateStore.js#L128
  2. https://github.com/Azure/azure-sdk-for-js/blob/1a77eb5fae58a3bb7ced6610dbdf9afe500bab60/sdk/cosmosdb/cosmos/src/client/Container/Container.ts#L109
  3. https://github.com/Azure/azure-sdk-for-js/blob/1a77eb5fae58a3bb7ced6610dbdf9afe500bab60/sdk/cosmosdb/cosmos/src/client/Item/Item.ts#L73

I don't think any more code optimizations are possible here, since the bottleneck seems to be the CosmosDB read call. The only possible solutions I can see are:

  1. (network) perhaps the data is read from a far away Azure region increasing network latency?
  2. (server) perhaps there is a configuration setting in Azure that will help with CosmosDB NoSQL reads? (partitioning key strategies?)
  3. (client) perhaps there is a more optimal way to use the @azure/cosmos SDK?
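As one illustration of the client-side angle in item 3, a common serverless pattern is to keep the initialised client in module scope so that warm invocations skip the setup work. The sketch below is an assumption about what such a pattern could look like, not a description of what aio-lib-state does today; `getClient` and `createClient` are hypothetical names. It would only narrow the cold/warm gap and would not change the duration of the read itself.

```javascript
// Hypothetical sketch: memoise an expensive async initialisation in module scope.
let clientPromise = null

// createClient is any async factory (assumed), e.g. one that builds an SDK client.
function getClient (createClient) {
  if (clientPromise === null) {
    clientPromise = createClient().catch((err) => {
      // Reset so that a later invocation can retry after a failed initialisation.
      clientPromise = null
      throw err
    })
  }
  return clientPromise
}
```

Every caller in the same warm container then awaits the same promise, so the factory runs once per container rather than once per invocation.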

aiojbot commented 3 years ago

JIRA issue created: https://jira.corp.adobe.com/browse/ACNA-1155

shazron commented 3 years ago

Suggestions from the team:

  1. Re-test with a VPN connection to the US or Europe (the test was from Singapore / India VPN)
  2. Re-test with just the bare @azure/sdk, for possible inclusion in a bug to be filed with Azure. The perf timings, however, are already granular and test the @azure/sdk itself (I modified the @azure/sdk node code to add the timings).
  3. Direct mode for the @azure/sdk for Node.js (https://github.com/Azure/azure-sdk-for-js/issues/4807): this is only available for the Java SDK currently. No ETA for Node.js support. According to a comment on the linked issue, direct mode support for the Java SDK took them 8 months with 3 devs.

We already have multi-region support (US and Europe) so suggestion 1 could help isolate the issue.

meryllblanchet commented 3 years ago

Thanks for the summary @shazron ! What about re-testing with direct calls to the Azure HTTP API (i.e. not using the @azure/sdk at all)?

shazron commented 3 years ago

What about re-testing with direct calls to the Azure HTTP API (i.e. not using the @azure/sdk at all)?

Good idea. That would help isolate whether there is something else in the SDK that is causing the bottleneck, not the network call itself.
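A test that bypasses the SDK could start from something like the sketch below, which assembles (but does not send) a read-document request against the Cosmos DB REST API using master-key authentication. The token format and header names follow my understanding of Microsoft's REST API documentation and should be checked against it; the account, database, container and key values are hypothetical, and the `x-ms-version` string is an assumption.

```javascript
const crypto = require('crypto')

// Builds the master-key authorization token for a Cosmos DB REST request.
// The signed payload is: verb, resource type, resource link, date, empty line.
function cosmosAuthToken ({ verb, resourceType, resourceLink, date, masterKey }) {
  const payload = `${verb.toLowerCase()}\n${resourceType.toLowerCase()}\n${resourceLink}\n${date.toLowerCase()}\n\n`
  const signature = crypto
    .createHmac('sha256', Buffer.from(masterKey, 'base64'))
    .update(payload, 'utf8')
    .digest('base64')
  return encodeURIComponent(`type=master&ver=1.0&sig=${signature}`)
}

// Assembles a GET request description for a single document; nothing is sent here.
function buildReadDocumentRequest ({ account, database, container, id, partitionKey, masterKey, now = new Date() }) {
  const resourceLink = `dbs/${database}/colls/${container}/docs/${id}`
  const date = now.toUTCString() // RFC 7231 date, as required by x-ms-date
  return {
    method: 'GET',
    url: `https://${account}.documents.azure.com/${resourceLink}`,
    headers: {
      authorization: cosmosAuthToken({ verb: 'GET', resourceType: 'docs', resourceLink, date, masterKey }),
      'x-ms-date': date,
      'x-ms-version': '2018-12-31', // assumed API version
      'x-ms-documentdb-partitionkey': JSON.stringify([partitionKey]),
      accept: 'application/json'
    }
  }
}
```

The resulting `url` and `headers` can be passed to any HTTP client and timed on their own; comparing that figure with the SDK timing would indicate whether the SDK adds overhead beyond the network round trip.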

shazron commented 3 years ago

perf_test branch

niksridhar commented 1 year ago

[image] Getting an auth error even though I have access to App Builder.

shazron commented 4 months ago

Stale, and not valid anymore - v4 of this lib connects to a new State store which will have different behaviour. No changes will be made to the old state store.