Closed sjmueller closed 3 years ago
Hey @sjmueller , first of all, thanks for the very detailed explanation and information, it will be very helpful to reproduce the scenario and see what could be happening. I'll dig into it!
On the other hand, yes, the Redis adapter is not compatible with v2 yet, but it is my top priority now. I'm aiming to push the fixes so NebulexRedisAdapter can be compatible with v2 as soon as possible; most likely it will be ready by the end of next week (maybe before 🤞).
I'd suggest two quick tests: 1) change the backend to `:ets`, to rule out anything related to `:shards`; 2) stay with `:shards` but increase the partitions, for example by letting Nebulex resolve them with the default value `System.schedulers_online()`, or just set a higher number.
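A minimal sketch of the two suggested variants, assuming a cache configured under a hypothetical `:my_app` / `MyApp.Cache` (names are illustrative, not from the reporter's config):

```elixir
# Test 1: switch the primary backend to :ets, to rule out :shards
config :my_app, MyApp.Cache,
  backend: :ets

# Test 2: keep :shards but raise the partition count
config :my_app, MyApp.Cache,
  backend: :shards,
  partitions: System.schedulers_online() * 2
```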
BTW, out of curiosity, did you have this issue with the previous version 1.2 or 1.1, or is it something new with v2?
Hi @cabol, thanks for the quick response. We switched directly from mnesia to nebulex 2.0.0-rc.2, so there's no comparison against v1.2. However, what we **just now** decided to try is Nebulex v1.2.2 with the redis adapter; it's going to production shortly, so we can tell you how that goes.
Ok, we are now using an ElastiCache Redis instance in AWS with Nebulex v1.2.2 and the redis adapter, and the results were much better. Under peak load only some calls went above 50ms, and nothing went over 125ms. While we'd love it if nothing went over 50ms, these are within acceptable limits for us and certainly much better than what was happening with v2.0.0-rc.0 and the `Partitioned` adapter.
Thanks for the feedback! I was checking the partitioned adapter implementation in v2 and v1.2, and there are no big differences implementation-wise: both use the same `Nebulex.RPC` util for distributed tasks, same approach. So I think this situation will be the same regardless of the version, but I'll confirm it anyway. I'll continue with the Redis adapter for v2 and keep you posted!
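For intuition, the partitioned adapter's write path conceptually groups entries by owner node before fanning out RPC calls. A simplified sketch of that idea (module and helper names hypothetical; this is not the actual `Nebulex.RPC` API):

```elixir
defmodule PartitionedSketch do
  # Hypothetical sketch of how a partitioned put_all fans out:
  # 1. pick an owner node for each key (the real adapter uses a keyslot/hash scheme),
  # 2. group the entries per node,
  # 3. issue one remote call per node with its slice of entries.
  def put_all(entries, nodes) do
    entries
    |> Enum.group_by(fn {key, _value} ->
      Enum.at(nodes, :erlang.phash2(key, length(nodes)))
    end)
    |> Enum.map(fn {node, node_entries} ->
      # One remote call per node instead of one per entry
      :rpc.call(node, MyApp.Cache, :put_all, [node_entries])
    end)
  end
end
```

This is why latency grows with the number of entries: there is a grouping pass plus one RPC round trip per involved node.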
Hey! I did several benchmark tests with the partitioned cache (mostly a first attempt to identify any kind of issue with the partitioned adapter). Using benchee, I ran the following test scenarios:
- Partitioned cache, 3 nodes (running locally), `:shards` as the backend for the primary store, and 16 partitions. `put_all` with 10 entries:
- Partitioned cache, 3 nodes (running locally), `:shards` as the backend for the primary store, and 16 partitions. `put_all` with 100 entries:
- Partitioned cache, 3 nodes (running locally), `:shards` as the backend for the primary store, and 16 partitions. `put_all` with 1000 entries:
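A minimal benchee sketch along those lines (cache module and key names are assumptions, not the exact scripts used for the results above):

```elixir
# Benchmarks put_all with different batch sizes against a partitioned cache.
# Assumes MyApp.PartitionedCache is already started and the nodes are connected.
entries = fn n ->
  Map.new(1..n, fn i -> {"key:#{i}", %{id: i}} end)
end

e10 = entries.(10)
e100 = entries.(100)
e1000 = entries.(1000)

Benchee.run(
  %{
    "put_all 10 entries" => fn -> MyApp.PartitionedCache.put_all(e10) end,
    "put_all 100 entries" => fn -> MyApp.PartitionedCache.put_all(e100) end,
    "put_all 1000 entries" => fn -> MyApp.PartitionedCache.put_all(e1000) end
  },
  time: 10
)
```

Building the entry maps outside the benchmarked functions keeps the map-construction cost out of the measured latency.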
Some remarks:

- For `c:put_all/2`, what the partitioned adapter does internally is traverse the given entries, group them by node based on the key, and then perform the action on the different nodes. Hence, the larger the number of entries, the longer the latency or execution time will be (the partitioned adapter does some extra logic before performing the insert itself against the primary store).
- The latency of `c:put_all/2` increases "significantly" depending on the number of entries to store, but even so, with 1000 entries the average latency is still below 20 ms and the max below 50 ms.
- I limited the bench tests to `c:get/2` and `c:put_all/2` because those are the ones we are interested in; besides, `c:put_all/2` uses `c:put_all/3` under the hood. However, I also ran the bench tests for the other functions, and the latencies were all below 20 ms.
- I ran load tests with `basho_bench` as well, and still the latencies were below 20 ms (again only for `get` and `put_all`, as I commented above).

I was thinking about your use case; you have:
```elixir
result =
  Enum.reduce(Domain.Repo.all(members_count_query), %{}, fn item, map ->
    Map.put_new(map, "ConversationMembersCount:#{item.conversation_id}", item)
  end)

NebulexCache.put_all(result, on_conflict: :override)
```
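As a side note, that reduce can be written with `Map.new/2` when the keys are known to be unique (a sketch, not the original code; note that `Map.new` keeps the last value for a duplicate key, while `Map.put_new` keeps the first):

```elixir
# Equivalent map construction via a pipeline, assuming one row per conversation_id
result =
  members_count_query
  |> Domain.Repo.all()
  |> Map.new(fn item ->
    {"ConversationMembersCount:#{item.conversation_id}", item}
  end)
```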
And reads like:

```elixir
NebulexCache.get("ConversationMembers:#{conversation_id}")
```

How does the result of `Domain.Repo.all(members_count_query)` change with the number of users? For example, how many entries (on average) are you inserting with `put_all/2` when you have 10, 100, 500 users, and so on? Overall, what is the number of entries you are trying to insert when you have 500 users (or more)?

@sjmueller, any feedback on this? As I explained in my previous comment, I did several bench tests but couldn't reproduce the issue. Maybe you can give me more details about your scenario (check my questions in the previous comment)?
Hi @cabol, circling back here. It turns out there were some areas where we were caching full serialized objects, and doing so in sequential fashion. For example, we might loop through and write 100 user objects to the cache for each API request, and this added up under simultaneous load. For some reason this performed much better with the redis adapter. Furthermore, we've optimized these scenarios by using redis pipelines (via the nebulex adapter), so things are much more efficient now. Hope this helps.
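For reference, the sequential pattern described above can usually be batched into a single call (a sketch with hypothetical names, not the reporter's actual code):

```elixir
# Sequential: one cache round trip per user object (adds up under load)
Enum.each(users, fn user ->
  NebulexCache.put("User:#{user.id}", user)
end)

# Batched: a single put_all call; with the Redis adapter, batched writes
# can be served by a pipeline instead of one round trip per key
users
|> Map.new(fn user -> {"User:#{user.id}", user} end)
|> NebulexCache.put_all()
```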
Absolutely, it helps a lot; thanks for the feedback. I'm glad to hear you were able to sort it out with the Redis adapter. That is precisely the idea of Nebulex: being able to choose the adapter and topology that best fit your needs, as in this case. In fact, I remember running some benchmark tests with the Redis adapter against a 5-node Redis Cluster, and with the partitioned adapter on the same nodes connected via Distributed Erlang/Elixir, and I got better results with the Redis one. Anyway, thanks again; this is very helpful because it gives me a better idea of the scenario, and I will check whether the performance can be improved.
Honestly, I love what you've built here with nebulex, because it models exactly the way I think about caching, i.e. the ability to annotate functions so that caching does its job but not at the expense of the original contract. All this with flexibility and no lock-in! We're currently using nebulex in a more manual, centralized fashion, but can't wait to set aside time and refactor to the idiomatic approach.
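The annotation approach referred to here is Nebulex's declarative caching decorators. A minimal sketch of that pattern (module names are hypothetical):

```elixir
defmodule MyApp.Accounts do
  use Nebulex.Caching

  alias MyApp.{Cache, Repo, User}

  # The result is cached under the computed key; callers see the
  # original function contract unchanged.
  @decorate cacheable(cache: Cache, key: {User, id})
  def get_user(id) do
    Repo.get(User, id)
  end

  # Evicts the cached entry when the user is deleted.
  @decorate cache_evict(cache: Cache, key: {User, user.id})
  def delete_user(%User{} = user) do
    Repo.delete(user)
  end
end
```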
All the work you've done is greatly appreciated 🙏 keep it up!
Great to hear that 😄 ! And of course, there is a long TODO list yet!
Closing this issue for now. Once I have more information about it, if it can be improved somehow, I will create a separate issue for the enhancement.
We have two API nodes in our cluster and have Nebulex v2.0.0-rc.0 set up with `Nebulex.Adapters.Partitioned`. Under regular circumstances, accessing the cache is decently fast, under 50ms. But under semi-heavy load, we had put/delete transactions taking 2+ seconds. Originally we thought adding keys to the transactions would help, but the performance continued to be subpar. So we removed the transactions entirely and still have the same problem!

Some details about our setup:
- `Nebulex.Adapters.Partitioned` with this configuration (part of the snippet was lost in extraction):

```elixir
config :domain, Domain.NebulexCache,
  primary: [
    => 1 day
  ]
```
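For context, a v2 partitioned-cache configuration with a `:shards`-backed primary usually looks roughly like the following. All option values here are illustrative assumptions, not the reporter's actual settings, and the elided "1 day" option above is left as-is:

```elixir
# Illustrative only: typical primary-store options for a v2 partitioned cache.
config :domain, Domain.NebulexCache,
  primary: [
    backend: :shards,
    partitions: System.schedulers_online()
  ]
```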
- Under a small load of <100 simultaneous users, almost all cache actions execute in <1ms, with some outliers up to 20ms, which is the performance we would expect.
- When we have sudden semi-heavy load (e.g. after a mass push notification where 500 people open the app at the same time), the cache gets incredibly slow, resulting in data not returning to everyone's app for up to 1 minute (!!!)
- You can see how all cache calls start to balloon here to beyond 1s; we've even seen longer, pushing 3-5s and higher, even without using transactions:
- We have checked CPU utilization on the API nodes; even under the heaviest load the peak is less than 38%.
As you can imagine, this is really hampering our ability to scale with our app's growth! We have tried to move to a simpler, single-node Redis setup that avoids partitioning/replication using the official adapter, but [v2.0.0-rc.0 compatibility has stopped us](https://github.com/cabol/nebulex_redis_adapter/issues/21). Any help would be appreciated!