Closed moolen closed 5 years ago
It's awesome that you are looking in to this. I have been thinking that the serverless use case might challenge the design, but have not had the time to think out a solution for this yet.
We could call initialize() outside the lambda handler - but we then might lose metrics from that.
yes, but I thought part of the idea of a lambda-function was that it will be around for some time as long as the function was "hot"? Even for function there will be an initialisation cost I assume. But yes I know the docs states that you should not rely on it.
What we want is to have control over the caching layer and/or the toggle update cycle in order to address this issue.
I think this is part of the solution. I had some idea at some point that all toggles would be stored in a cache, redis, ElastiCache or similar. The the metrics could also be stored in the same cache. Then you could have one scheduled function to updated the cache with new state and push metrics to the unleash-server. (And it's even kind of stupid that the SDK store a local backup to a "file on disk" in a world of immutable docker containers.)
We'll discuss this internally first and i'll come back with a proposal. For now i think it's valueable to have this "limitation" documented here.
Thanks. Always appreciated experience people invest time in finding ways to improve and adapt.
Please tell me if that works for you or when you have input on that topic.
Thanks, this has also triggered my curiosity and I will for sure start discussing this issue with my colleagues as well. They tend to give me good ideas.
it will be around for some time as long as the function was "hot"?
On a cold start, the nodejs process boots and loads some AWS wrapper for messaging, metrics etc. which then in turn loads your consumer-code.
Once the consumer application has done its work the nodejs process is frozen. Literally no CPU instructions are being executed. On the next invocation the process gets woken up and the nodejs runtime uses CPU cycles as usual until the consumer app is done. Here is no overhead of loading either the AWS wrapper nor your application code.
[...] the metrics from the last $metricsInterval interval are not sent
We understand metrics as volatile telemetry data. They have a supporting character and no business decisions should ever be based on metrics.
However, having a robust telemetry delivery, processing and alerting pipeline is important. Here in this case there is a chance to lose all metrics from the last $metricsInterval
before a lambda is being recycled.
For now, we'll set a low $metricsInterval
to minimize the potential loss.
A sustainable and scalable solution would IMO involve (both) a intermediary aggregating relay proxy (1) similar to launchdarkly's ld-relay and (2) a programmable metrics and repository object.
Using (1) we could:
Having (2) would:
TBH, the invest into (1) a relay is pretty darn high for both initial development and maintenance. It really helps at large scale. But i think our scale is not that big yet to really make use of that. However, going for (2) is a low-hanging fruit that solves our immediate needs. Let's keep it LEAN.
I made a PoC to get a feeling of how the repository/metrics option could look like.
What do you think?
However, going for (2) is a low-hanging fruit that solves our immediate needs. Let's keep it LEAN.
Yes please!
I made a PoC to get a feeling of how the repository/metrics option could look like.
Cool, I think it makes sense to be able to swap out the repository.
Side note: The Java SDK has always allowed you to inject your own toggle repository.
Once the consumer application has done its work the nodejs process is frozen
ah, thanks for this detail. This means it would be impossible to keep some kind of state up to date outside the client requests.
Thanks for your Feedback! PR #117 is out. I'll close this issue then.
@moolen How is your hands on experience with this approach? Have you learned something useful?
I have been thinking a bit about the caching strategy. It might be better to add in-memory caching in the unleash-service in /api/client/features
than to introduce a dedicated cache service, such as redis, in between unleash and the client.
This will however not solve the metrics needs.
so far, everything is running smoothly! I've noticed that every single request against /api/client/features
results in a DB query. Sure, adding a cache here is absolutely a viable option to increase the throughput of the api server.
But you know..
There are only two hard things in Computer Science: cache invalidation and naming things. - Phil Karlton
:smiley:
Thanks for sharing. Out of curiosity, what kind workloads (req/s, number of active function) are you running?
It depends on how you categorize and measure it. On the feature-toggle service itself i saw ~45 req/s at peak times without having issues. But usually there are around ~2 req/s.
We have about 70+ lambdas, 40 ECS services per environment (prod/staging etc.), but not all do implement feature toggles. We do have a feature-toggle per environment
@ivarconr I have read the documentation and this PR again and again, but not able to comprehend how injecting own repository can be used for caching especially for serverless use case. Can you please elaborate? Thanks in advance!
Hey @upputa1, the Repository contains the state of available feature toggles.
We implemented the cache like this:
Constructor: read cache file in /tmp/cache.json
. If it exists populate this.data
. If it doesn't make a call to the api server and save the response to the cache file.
Im on vacation right now, i think it makes sense to add a usable repository example to the docs. Ill add it once im back (or feel free to contribute :) )
Appreciate your quick response. I am still not getting the value of this cache as opposed to getting defaults for the flags from environment vars or just plain defaults.
Hi! Im exploring this great project. We use AWS Lambda and Redis. Instead of caching to /tmp/cache.json can i cache this to redis?
If you are using Redis already, you can query redis in your lambda, but have unleash webooks refresh your redis cache. That way you dont experience latency at all.
@upputa1 that seems great, can you point me to an example or documentation about the webooks? Did not found on a quick search on the docs. Thanks!
We use this SDK in a AWS ApiGateway/Lambda context.
AWS recommends that all callbacks/background processes should complete before the function returns [1].
So we should implement it like in this example:
This means, that each invocation has to call
initialize()
in a lambda handler and also callinstance.destroy()
when it completes to properly clean up callbacks and resources. This inevitably issues unnecessary requests to the api server [2] since the user has no access to the internalstorage
implementation and the user has to wait until unleash as a whole is ready.We could call
initialize()
outside the lambda handler - but we then might lose metrics from that instance which is not really what we want. For now, it's a OKish trade-off for us.Example implementation:
This has the following drawbacks:
$metricsInterval
interval are not sentWhat we want is to have control over the caching layer and/or the toggle update cycle in order to address this issue.
We'll discuss this internally first and i'll come back with a proposal. For now i think it's valueable to have this "limitation" documented here.
Please tell me if that works for you or when you have input on that topic.
[1] https://docs.aws.amazon.com/lambda/latest/dg/running-lambda-code.html [2] https://github.com/Unleash/unleash-client-node/blob/master/src/repository.ts#L51