cabol / nebulex

In-memory and distributed caching toolkit for Elixir.
https://hexdocs.pm/nebulex
MIT License
1.25k stars 74 forks

Slow cache under moderate simultaneous load #80

Closed sjmueller closed 3 years ago

sjmueller commented 4 years ago

We have two API nodes in our cluster and have Nebulex v2.0.0-rc.0 set up with Nebulex.Adapters.Partitioned. Under regular circumstances, accessing the cache is decently fast, under 50ms. But under semi-heavy load, we had put/delete transactions that were taking 2+ seconds. Originally we thought adding keys to the transactions would help, but performance continued to be subpar. So we removed the transactions entirely and still have the same problem!

Some details about our setup:

```elixir
config :domain, Domain.NebulexCache,
  primary: [
    # 1 day
    gc_interval: 86_400_000,
    backend: :shards,
    partitions: 2
  ]
```

- We're using Nebulex in pretty standard circumstances, with commands like:
```elixir
result =
  Enum.reduce(Domain.Repo.all(members_count_query), %{}, fn item, map ->
    Map.put_new(map, "ConversationMembersCount:#{item.conversation_id}", item)
  end)

NebulexCache.put_all(result, on_conflict: :override)

NebulexCache.get("ConversationMembers:#{conversation_id}")
```

As you can imagine, this is really hampering our ability to scale with our app growth! We have tried to move to a simpler, single-node Redis setup that avoids partitioning/replication using the official adapter, but v2.0.0-rc.0 [compatibility has stopped us](https://github.com/cabol/nebulex_redis_adapter/issues/21). Any help would be appreciated!

cabol commented 4 years ago

Hey @sjmueller , first of all, thanks for the very detailed explanation and information, it will be very helpful to reproduce the scenario and see what could be happening. I'll dig into it!

On the other hand, yes, the Redis adapter is not compatible with v2 yet, but it is my top priority now. I'm aiming to push the fixes so NebulexRedisAdapter can be compatible with v2 as soon as possible; most likely it will be ready by the end of next week (maybe before 🤞 ).

I'd suggest two quick tests: 1) change the backend to :ets, to rule out anything related to :shards; 2) try :shards but increase the partitions, for example, let Nebulex resolve the partitions with the default value System.schedulers_online(), or just set a higher number.
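In config terms (adapting the snippet from the issue above; option names follow Nebulex v2's local adapter), the two quick tests could look like this sketch:

```elixir
# Quick test 1: swap the backend to :ets to rule out :shards.
config :domain, Domain.NebulexCache,
  primary: [
    gc_interval: 86_400_000,
    backend: :ets
  ]

# Quick test 2: keep :shards but omit :partitions so Nebulex falls back
# to its default, System.schedulers_online(), or set a higher number.
config :domain, Domain.NebulexCache,
  primary: [
    gc_interval: 86_400_000,
    backend: :shards
  ]
```

These are alternatives, so only one of the two `config` blocks would be active at a time.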

BTW, out of curiosity, did you have this issue with the previous version 1.2 or 1.1, or is it something new with v2?

sjmueller commented 4 years ago

Hi @cabol, thanks for the quick response. We switched directly from Mnesia to Nebulex 2.0.0-rc.2, so there's no comparison against v1.2. However, what we **just now** decided to try is Nebulex v1.2.2 with the Redis adapter; it's going to production shortly, so we can tell you how that goes.

sjmueller commented 4 years ago

Ok we are using an ElastiCache Redis instance in AWS now with Nebulex v1.2.2 and the redis adapter, the results were much better. Under peak load only some calls went above 50ms, but nothing went over 125ms. While we'd love if nothing went over 50ms, these are within acceptable limits for us and certainly much better than what was happening with v2.0.0-rc.0 and the Partitioned adapter.

cabol commented 4 years ago

Thanks for the feedback! I was checking the partitioned adapter implementation in v2 and v1.2, and there are no big differences implementation-wise; both use the same Nebulex.RPC util for distributed tasks, same approach. So I think this situation will be the same regardless of the version, but I'll confirm it anyway. I'll continue with the Redis adapter for v2 and keep you posted!

cabol commented 3 years ago

Hey! I did several benchmark tests with the partitioned cache (mostly a first attempt to identify any kind of issue with the partitioned adapter). Using benchee, I ran the following test scenarios:

Scenario 1

Partitioned cache, 3 nodes (running on local), :shards as the backend for the primary store and 16 partitions. put_all with 10 entries:

*(benchmark chart: PartitionedCache_PutAll_10_entries)*

Scenario 2

Partitioned cache, 3 nodes (running on local), :shards as the backend for the primary store and 16 partitions. put_all with 100 entries:

*(benchmark chart: PartitionedCache_PutAll_100_entries)*

Scenario 3

Partitioned cache, 3 nodes (running on local), :shards as the backend for the primary store and 16 partitions. put_all with 1000 entries:

*(benchmark chart: PartitionedCache_PutAll_1000_entries)*
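For reference, a minimal sketch of this kind of benchmark, assuming the benchee dependency and a `PartitionedCache` module (both illustrative names, not the exact harness used for the charts above):

```elixir
# Build a fixed batch of entries, then measure put_all/1 throughput.
entries = for i <- 1..100, into: %{}, do: {"key:#{i}", i}

Benchee.run(%{
  "put_all 100 entries" => fn ->
    PartitionedCache.put_all(entries)
  end
})
```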

Insights

About your use-case

I was thinking about your use case; you have:

```elixir
result =
  Enum.reduce(Domain.Repo.all(members_count_query), %{}, fn item, map ->
    Map.put_new(map, "ConversationMembersCount:#{item.conversation_id}", item)
  end)

NebulexCache.put_all(result, on_conflict: :override)

NebulexCache.get("ConversationMembers:#{conversation_id}")
```

Questions

  1. How does Domain.Repo.all(members_count_query) change with the number of users? For example, how many entries (on average) are you inserting with put_all/2 when you have 10, 100, 500 users, and so on? Overall, how many entries are you trying to insert when you have 500 users (or more)?
  2. In terms of size, how big is each entry? Just to get an idea of how large the total payload could be when performing the bulk insert.
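For question 2, one quick way to estimate entry size is to serialize a representative entry with the standard library (the `entry` shape below is a hypothetical stand-in for the items cached in this thread):

```elixir
# Hypothetical entry shaped like the ConversationMembersCount items above.
entry = %{conversation_id: 123, count: 42}

# :erlang.term_to_binary/1 serializes any term; byte_size/1 then gives a
# rough lower bound for the size of one cache entry on the wire.
approx_bytes = entry |> :erlang.term_to_binary() |> byte_size()

IO.puts("~#{approx_bytes} bytes per entry")
```

Multiplying that by the batch size gives a ballpark figure for the total payload of each `put_all/2` call.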
cabol commented 3 years ago

@sjmueller any feedback on this? As I explained in my previous comment, I did several bench tests but couldn't reproduce the issue. Maybe you can give me more details about your scenario (check the questions in my previous comment)?

sjmueller commented 3 years ago

Hi @cabol, circling back here. It turns out there were some areas where we were caching full serialized objects, and doing so in sequential fashion. For example, we might loop through and write 100 user objects to the cache for each API request, and this added up under simultaneous load. For some reason this performed much better with the Redis adapter. Furthermore, we've optimized these scenarios by using Redis pipelines (via the Nebulex adapter), so things are much more efficient now. Hope this helps.
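As a sketch of that pipelining approach (NebulexRedisAdapter exposes raw Redis commands on the cache module; `serialized_entries` is hypothetical here, and the exact `pipeline!` signature may differ between adapter versions, so check the adapter docs):

```elixir
# Batch all writes into a single round trip instead of N sequential SETs.
commands =
  for {key, value} <- serialized_entries do
    ["SET", key, value]
  end

NebulexCache.pipeline!(commands)
```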

cabol commented 3 years ago

Absolutely, it helps a lot, thanks for the feedback! I'm glad to hear you were able to sort it out by using the Redis adapter. That is precisely the idea of Nebulex: being able to choose the adapter and topology that best fits your needs, like in this case. In fact, I remember running benchmark tests with the Redis adapter against a 5-node Redis Cluster, and with the partitioned adapter on the same nodes connected via Distributed Erlang/Elixir, and I got better results with the Redis one. But anyway, thanks again; this is very helpful because it gives me a better idea of the scenario. I will check and see if the performance can be improved.

sjmueller commented 3 years ago

Honestly, I love what you've built here with Nebulex, because it models exactly the way I think about caching, i.e. the ability to annotate functions so that caching does its job, but not at the expense of the original contract. All this with flexibility and no lock-in! We're currently using Nebulex in a more manual, centralized fashion, but can't wait to set aside time and refactor to the idiomatic approach.

All the work you've done is greatly appreciated 🙏 keep it up!

cabol commented 3 years ago

Great to hear that 😄 ! And of course, there is a long TODO list yet!

cabol commented 3 years ago

Closing this issue for now. Once I have more information about it, if it can be improved somehow, I will create a separate issue for the enhancement.