nomaddamon opened 8 years ago
A more common example of the problem would be querying database objects with an expiry:
session.CreateCriteria<Banner>().Add<Banner>(x => x.ExpiresAt > expiryTime).SetCacheable(true).List();
where expiryTime is DateTime.Now rounded to a full minute.
Reasoning: every minute we query the database for a list of non-expired banners, and all other (identical) queries within that minute hit the StandardQueryCache.
As a result, every minute a new entry is added to NHibernate-Cache:NHibernate.Cache.StandardQueryCache:keys, and these entries are only removed on a Redis FLUSHDB.
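For illustration, a minimal sketch of the pattern (the rounding code below is an assumption about how "rounded to a full minute" is computed; `session` is an NHibernate ISession):

```csharp
// Because expiryTime only changes once per minute, each minute produces a
// brand-new query-cache key, which is then left behind in the :keys set.
var now = DateTime.Now;
var expiryTime = now.AddTicks(-(now.Ticks % TimeSpan.TicksPerMinute)); // round down to the minute

var banners = session.CreateCriteria<Banner>()
    .Add<Banner>(x => x.ExpiresAt > expiryTime)
    .SetCacheable(true)
    .List();
```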
Went ahead and tried a different approach: instead of keeping track of keys via :keys, SCAN and DEL them on ICache.Clear(); see https://github.com/nomaddamon/NHibernate.Caches.Redis
This adds considerable overhead to ICache.Clear() (clearing a small region in a huge database might result in a 100x performance loss), but it does come with a few perks.
This might not be an acceptable approach for this project, but in our use case it works better (ICache.Clear() is not used in our solutions, since we try to avoid named queries).
Thanks for bringing this up. This will be quite a long read, but hopefully it helps you.
This is a known disadvantage to using this library (and it's not particularly easy to fix).
The general idea with this project is to be the lowest common denominator for using Redis as a cache for NHibernate. With that in mind, you may have to customize/fork this library to get the best performance with your own application's constraints (as I'm writing this I see you've already stumbled upon one of the solutions :smile:).
Have you profiled to test that adding a Redis cache in front of your database is faster than just querying the database directly? This obviously depends on how your data is modeled and how you're using NHibernate. Adding a cache on top of NHibernate can actually reduce performance because of serialization/deserialization/materialization costs inside of NHibernate. In large applications I've created (and with specific usage patterns/environments), I've found using the DB directly was as fast as using the cache + full round trips. Obviously this highly depends on your DB, environment, data, etc.
Do you need a distributed NHibernate cache, or are you using only one web server?
Have you considered using a non-ORM for high-performance reads (Dapper, Massive, etc.)? For my applications, we usually start with NHibernate (to get going quickly). As we find performance-sensitive areas, we use raw SQL queries with Dapper to avoid all of the costs associated with NHibernate (and the extra round-tripping to the cache). We use MediatR to abstract away the implementation of these queries from consumers.
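As a rough sketch of that pattern (all type names, the SQL and the schema below are assumptions for illustration, assuming Dapper and MediatR):

```csharp
// A performance-sensitive read bypasses NHibernate entirely; the consumer only
// sees the MediatR request, not the Dapper implementation.
using System;
using System.Collections.Generic;
using System.Data;
using System.Threading;
using System.Threading.Tasks;
using Dapper;
using MediatR;

public class Banner
{
    public int Id { get; set; }
    public string Name { get; set; }
    public DateTime ExpiresAt { get; set; }
}

public class ActiveBannersQuery : IRequest<IEnumerable<Banner>> { }

public class ActiveBannersHandler : IRequestHandler<ActiveBannersQuery, IEnumerable<Banner>>
{
    private readonly IDbConnection _connection;

    public ActiveBannersHandler(IDbConnection connection) => _connection = connection;

    public async Task<IEnumerable<Banner>> Handle(ActiveBannersQuery request, CancellationToken ct)
    {
        // Raw SQL avoids NHibernate's materialization and cache round-trip costs.
        const string sql = "SELECT Id, Name, ExpiresAt FROM Banner WHERE ExpiresAt > @now";
        return await _connection.QueryAsync<Banner>(sql, new { now = DateTime.UtcNow });
    }
}
```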
If you're using the NHibernate query cache, be aware that there are broad cases where an update will cause NHibernate to call ICache.Clear(). This, again, depends on how you're using NHibernate, but I've found that in situations like this the NHibernate query cache has no real benefit, so we leave it off completely. We then move the caching "up" (e.g. cache rendered HTML or JSON rather than actual DB entities).
Some other options to consider:
- Prune the expired entries out of the set of active keys with a background job (HostingEnvironment.QueueBackgroundWorkItem is only useful in web scenarios). This could be exposed as something like RedisCache.Prune() so that you can determine how often you'd want to prune the entries. Note that SSCAN is a server operation, not a full database operation. This makes Prune() less general and not a good fit to add to this library (however, I'll still consider it if we can nail down a great implementation). Perhaps we could expose the RedisNamespace so you can get direct access to the key formatting (and then write your own Prune())?
- Scan and delete keys inside ICache.Clear(), and don't track the set of active keys. NHibernate calls ICache.Clear() at different times (e.g. executing native SQL, manually evicting a whole collection/region/type), so you have to be careful about performance. We could generalize this into an ICache and bake it into this library, but I'd want it to be robust enough. We'd have to take into account that SCAN/KEYS are server (not database) operations.
- Use generational keys such as NHibernate-Cache:<region>:<generation>:<key>. Each ICache has to sync its generation with the other servers (meaning we have to store the generation in Redis and make sure they all update when the generation changes). ICache.Clear() would be as simple as incrementing the generation and letting everyone else know the generation has been incremented (e.g. you could use Redis pub/sub to get notified and also sync your generation on startup). The downside is that an ICache might have an out-of-date generation (maybe by a few seconds). A rough sketch follows below.
- Use FLUSHDB for the specific server/database pair. You could even move the Users table cache off to its own dedicated server for even better performance. You'd want to measure ICache.Clear(), memory, the number of commands and CPU.
- Store only the bare key in the set of active keys rather than the fully formatted key (from RedisNamespace.GetKey()). Because the Redis key has duplicate information, it's wasted memory. It would mean that this set of keys no longer has a full list of valid Redis keys, but it's something we can always re-create.

Thanks!
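Purely as an illustration of the generation idea (not this library's API; the GenerationCache class and the key/channel names are made up, assuming StackExchange.Redis):

```csharp
// A rough sketch of a generation-based Clear(): clearing bumps a counter instead
// of deleting keys; entries written under old generations simply expire away.
using System;
using StackExchange.Redis;

public class GenerationCache
{
    private readonly IDatabase _db;
    private readonly ISubscriber _sub;
    private readonly string _region;
    private long _generation;

    public GenerationCache(ConnectionMultiplexer redis, string region)
    {
        _db = redis.GetDatabase();
        _sub = redis.GetSubscriber();
        _region = region;

        // Sync the generation on startup and whenever another server clears the region.
        var stored = _db.StringGet(GenerationKey());
        _generation = stored.HasValue ? (long)stored : 0;
        _sub.Subscribe(ChannelName(), (channel, value) => _generation = (long)value);
    }

    private string GenerationKey() => $"NHibernate-Cache:{_region}:generation";
    private string ChannelName() => $"NHibernate-Cache:{_region}:generation-changed";

    // Cache keys embed the current generation.
    private string CacheKey(object key) => $"NHibernate-Cache:{_region}:{_generation}:{key}";

    public void Put(object key, string value, TimeSpan ttl) => _db.StringSet(CacheKey(key), value, ttl);
    public string Get(object key) => _db.StringGet(CacheKey(key));

    // Clear() is just a generation bump plus a pub/sub notification; nothing is deleted.
    public void Clear()
    {
        _generation = _db.StringIncrement(GenerationKey());
        _sub.Publish(ChannelName(), _generation);
    }
}
```

The trade-off is exactly the one mentioned above: a node that misses the pub/sub message serves a slightly out-of-date generation until it resyncs.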
You should be able to improve your Clear() by using the IDatabase.KeyDelete(RedisKey[] keys) overload instead of doing a batch:
db.KeyDelete(new RedisKey[] { "key1", "key2", "keyN" }, CommandFlags.FireAndForget);
Thanks for the long and informative reply! :)
About our setup: we use a bunch of web servers (16 as of writing, but this can change in minutes) load-balanced by HAProxy. The platform is a mix of WebForms and MVC; the ORM is NHibernateX (NH for .NET 4.5 with async support). We used Memcached successfully for quite a while, but moved to Redis due to its more reasonable failover architecture. The data structure in our DB is rather complex and some queries (which can be frequent) are really complex. Most of the data is slowly changing; a random entity will probably get a few thousand hits before cache expiry / invalidation. Moving further away from relational data models is the way forward for scaling, but that would mean a huge rewrite and isn't planned at the moment.
Without caching, we would need to upgrade our relational DB performance by a factor of 10 to keep the current page latency, so yes, we need a cache (and since we have multiple servers, we need a distributed cache).
Moving some queries to Dapper or similar doesn't make much sense, since the relational DB is the hardest thing to scale out in a hurry and costs the most to massively over-provision.
We have been using the query cache with NHibernate for a few years and in our case it makes a huge impact on performance. (On a side note: are you sure about the query cache and ICache.Clear()? We haven't seen this in our logs and I can't trace it down in the NH source.)
We do use multiple caching levels, depending on the point at which the result becomes predictable (and re-use is likely).
Regarding FLUSHDB and locks: when Redis deletes something large (e.g. a :keys set with 10M entries), it locks up the whole Redis instance, not just the given DB. SHUTDOWN NOSAVE on an instance can "flush" it fast, but having a Redis instance for each region seems a rather peculiar architecture.

Please review this branch: https://github.com/nomaddamon/NHibernate.Caches.Redis/tree/disableIndexSetOnKeys and let me know what to improve in order to get it merged.
Regarding ICache.Clear(), it's not entirely obvious where it can be called, but if you do a "find all references" and work backwards you'll see at least the following places (two illustrative triggers are sketched after the list):
- AbstractStatementExecutor.CoordinateSharedCacheCleanup() (used by MultiTableDeleteExecutor, MultiTableUpdateExecutor and BasicExecutor)
- NativeSQLQueryPlan.CoordinateSharedCacheCleanup() (executing native SQL with session.CreateSQLQuery())
- SessionFactoryImpl.Evict(Type persistentClass)
- SessionFactoryImpl.EvictEntity(string entityName)
- SessionFactoryImpl.EvictCollection(string roleName, object id)
- SessionFactoryImpl.EvictQueries()
- SessionFactoryImpl.EvictQueries(string cacheRegion)
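For example (illustrative calls only; sessionFactory and session are an ISessionFactory and ISession, and the SQL/entity names are made up):

```csharp
// Explicit eviction of cached query results ends up in SessionFactoryImpl.EvictQueries():
sessionFactory.EvictQueries();
sessionFactory.EvictQueries("Banner");

// Native SQL with no synchronized query spaces goes through
// NativeSQLQueryPlan.CoordinateSharedCacheCleanup(), which clears query caches broadly:
session.CreateSQLQuery("UPDATE Banner SET Name = Name WHERE Id = :id")
       .SetInt32("id", 1)
       .ExecuteUpdate();
```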
Here are my thoughts:
- Consider a name like CacheKeyPrefix or KeyPrefix.
- Disabling the set of active keys makes ICache.Clear() potentially unsafe. I think this is a really good thing to have for your own use cases, however. But I'm not sure of the value it will bring to the majority of others using this library. Perhaps it's something we can link to directly from the README?
- Put() and Clear() could keep working with the set of active keys: Clear() would do an SSCAN on the set-of-active-keys key rather than doing a full scan on the whole keyspace (potentially making this smaller). A rough sketch follows below.

Also, I'm curious, how are you serializing cache objects (JSON, XML, etc.)?
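To make that last idea concrete, a sketch of what such a Clear() might look like (an assumption only, assuming StackExchange.Redis; setOfActiveKeysKey and the batching are illustrative, not current library code):

```csharp
// Walk the region's set of active keys with SSCAN and delete the cache entries
// in batches, so no single DEL has to free a huge amount of memory at once.
using System.Collections.Generic;
using StackExchange.Redis;

public static class RegionClear
{
    public static void Clear(IDatabase db, RedisKey setOfActiveKeysKey, int batchSize = 1000)
    {
        var batch = new List<RedisKey>(batchSize);

        // SetScan issues SSCAN under the hood.
        foreach (var member in db.SetScan(setOfActiveKeysKey, pageSize: batchSize))
        {
            batch.Add((string)member);
            if (batch.Count == batchSize)
            {
                db.KeyDelete(batch.ToArray(), CommandFlags.FireAndForget);
                batch.Clear();
            }
        }

        if (batch.Count > 0)
            db.KeyDelete(batch.ToArray(), CommandFlags.FireAndForget);

        // Finally remove the (now stale) set of active keys itself.
        db.KeyDelete(setOfActiveKeysKey, CommandFlags.FireAndForget);
    }
}
```

Deleting in batches keeps each individual DEL small, at the cost of Clear() no longer being atomic.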
We are at the moment using the default serializer. We played around with different options and found it to be acceptable (computing power for serializing is easy to scale, and the XML storage overhead in Redis is negligible, bandwidth- and latency-wise).
We use CI and can basically do A/B testing (this is a coincidence of our deployment procedure: UpdateTimestampsCache is in one Redis database for all servers, but when deploying, new servers start using a new database to avoid conflicts in data object definitions; this gives us a guaranteed non-stale cache, even if some servers are on version X and others are on X+1, where data structures are different). Over the last few weeks we have tested quite a lot in production :) basically deploying 10-20% of servers with some POC code (that has passed QA) to see how it affects Redis performance. We used scheduled cleanup for a while, but it creates quite a lot of load (compared to minimizing ICache.Clear() usage and not using the index set).
Most ICache.Clear() usages come from what we consider bad practices: hardcoding SQL/HQL queries, using named queries, etc. (and yes, a named query with no result type / sync set will call ICache.Clear() on all regions). We log all ICache.Clear() calls as errors...
Please see commit https://github.com/nomaddamon/NHibernate.Caches.Redis/commit/a0df68f246e744d04b7c650bfe5a1314b7d200d4 and run the new tests on your own machine :)
Test logic: set up a region with ~480k expired keys and ~20k active keys (we wait a bit to make sure Redis cleans up most of the expired keys); the 20k active keys expire only after the test finishes, so we are testing Get/Put/Clear only.
Use 5 parallel tasks for testing: 2 tasks simulate cache misses (GET-PUT) to load-test the region, 2 tasks simulate cache misses (GET-PUT) on an unrelated region, and 1 task calls Clear while the others are running.
You might have to change the parameters depending on your test machine, but with the defaults the tests should take about ~50s combined. There are no Asserts, console output only (the code is not meant to be part of this project and is not formatted according to your style; sorry :) ReSharper took over in some places).
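The shape of the test is roughly the following (a simplified sketch, not the code from the commit; cacheRegion1/cacheRegion2 are assumed to be NHibernate ICache instances backed by this library):

```csharp
// Two read/write loops per region simulate cache misses (Get that finds nothing,
// then Put), while one task calls Clear() on region1 part-way through.
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
using NHibernate.Cache;

public static class ClearLoadTest
{
    public static void Run(ICache cacheRegion1, ICache cacheRegion2)
    {
        var stop = new CancellationTokenSource(TimeSpan.FromSeconds(10));

        Task ReadWrite(ICache region) => Task.Run(() =>
        {
            var rng = new Random();
            while (!stop.IsCancellationRequested)
            {
                var key = "key-" + rng.Next(100_000);
                if (region.Get(key) == null)
                    region.Put(key, "some serialized entity");
            }
        });

        var tasks = new[] { ReadWrite(cacheRegion1), ReadWrite(cacheRegion1),
                            ReadWrite(cacheRegion2), ReadWrite(cacheRegion2) };

        // Clear region1 while the read/write tasks are running and time it.
        Thread.Sleep(2000);
        var sw = Stopwatch.StartNew();
        cacheRegion1.Clear();
        Console.WriteLine($"Clear finished, took {sw.ElapsedMilliseconds} ms");

        Task.WaitAll(tasks);
    }
}
```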
My local output for Clear_preformance_without_index
Setup took 22156
Test db keyspace: [db15, keys=22593,expires=22593,avg_ttl=3717]
Read-Write started
Read-Write started
Read-Write started
Read-Write started
Clear starting
Clear finished, took 442
Read-Write ended
Read-Write ended
Read-Write ended
Read-Write ended
So... 22.5k unexpired keys (2.5k should expire while the test is running) and a 442ms clear. An OK result, if you don't use Clear :)
My local output for Clear_preformance_with_index
Setup took 20831
Test db keyspace: [db15, keys=20002,expires=20001,avg_ttl=816]
Read-Write started
Read-Write started
Read-Write started
Read-Write started
Clear starting
Clear finished, took 11
Read-Write took 474ms on region1
Read-Write took 472ms on region1
Read-Write took 471ms on region2
Read-Write took 475ms on region2
Read-Write ended
Read-Write ended
Read-Write ended
Read-Write ended
The multiplexer is already warmed up, so this round there was only 1 key that would expire while testing. Clear took 11ms (due to FireAndForget) but LOCKED up the ENTIRE Redis instance for 470+ms (all regions, all databases).
Play around with the parameters a bit and you will be able to get timeout exceptions instead of ~400ms read-writes (for me it took 1.5M expired and 50k active keys).
This might not seem that bad, but here we were running at 80 ops per second (at best) with a rather small cache... we see 150k ops per second at peak load times... The problem is that if Redis gets locked up, the sentinels start a failover and the current master gets demoted, meaning that the slave will resume from the last known good state, which was prior to the clear.
So my point is that you are not adding an option to potentially make ICache.Clear() unsafe; it already is very unsafe under load. (Under our load, calling ICache.Clear() will result in ~500 requests delayed by 1 second (the failover timeout) against an average page load time of 50ms, ~500 error messages, and a lot of work figuring out what happened... not calling ICache.Clear() will result in Redis DB growth of 1-2 GB / day.)
Our setup for running your tests was a local Windows build of Redis that did not have any slaves. We found, obviously, that the non-index ICache.Clear() with SCAN was slower, but that makes sense because the SCAN is done on the master and not a slave. We added a slave and saw performance improvements.
I found some variables were inconsistent between tests (e.g. changing the timeout). I wrote the same tests from scratch to fully understand your load tests.
Our results were similar in that calling DEL on the set of active keys can pause other connections for a few hundred milliseconds when it contains lots of values. This makes sense because, as of Redis 3.2, it uses synchronous deletes. Coming in Redis 3.4 will be the UNLINK command which will enable asynchronous deletes. Eventually, we'll be able to include this in our current implementation depending on what your server supports.
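When that lands, the switch could look something like this (a sketch only, assuming StackExchange.Redis; the version gate and key name are illustrative, and UNLINK ultimately shipped in Redis 4.0):

```csharp
// Use UNLINK where the server supports it, otherwise fall back to DEL.
using System;
using System.Linq;
using StackExchange.Redis;

public static class KeyRemoval
{
    public static void DeleteSetOfActiveKeys(ConnectionMultiplexer redis, RedisKey setOfActiveKeysKey)
    {
        var db = redis.GetDatabase();
        var server = redis.GetServer(redis.GetEndPoints().First());

        if (server.Version >= new Version(4, 0))
        {
            // UNLINK reclaims the memory asynchronously, so a huge set doesn't block other clients.
            db.Execute("UNLINK", setOfActiveKeysKey);
        }
        else
        {
            // Synchronous delete; can pause the server when the set is very large.
            db.KeyDelete(setOfActiveKeysKey, CommandFlags.FireAndForget);
        }
    }
}
```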
Safety (at least strictly for this library) is defined in terms of having atomic operations. We need ICache.Clear() to be atomic for the general use case of this library. I don't think it's a good idea to have mixed safety (even as a config option, because it duplicates the complexity of this library). Our current implementation is slower, but it guarantees that ICache.Clear() is atomic. I'd really like to keep this library as simple as possible so that everyone is free to customize it to their use cases. You've done this perfectly: having lots of keys has proven you need to take the library in a different direction, and you are able to sacrifice atomicity for performance.
I also wanted to note that having a separate Prune() that does an SSCAN on the set of active keys and asynchronously removes expired keys from the set will definitely improve the performance of ICache.Clear(), since it doesn't have to free up as much memory.
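A sketch of such a Prune() (again only an assumption of its shape, assuming StackExchange.Redis; the method and key names are illustrative):

```csharp
// Walk the set of active keys with SSCAN and drop members whose cache entry
// has already expired, so Clear() later has far fewer members to delete.
using System.Collections.Generic;
using StackExchange.Redis;

public static class RegionPrune
{
    public static long Prune(IDatabase db, RedisKey setOfActiveKeysKey, int pageSize = 1000)
    {
        long removed = 0;
        var stale = new List<RedisValue>(pageSize);

        foreach (var member in db.SetScan(setOfActiveKeysKey, pageSize: pageSize))
        {
            // If the cached value no longer exists (it expired), drop it from the set.
            if (!db.KeyExists((string)member))
                stale.Add(member);

            if (stale.Count == pageSize)
            {
                removed += db.SetRemove(setOfActiveKeysKey, stale.ToArray());
                stale.Clear();
            }
        }

        if (stale.Count > 0)
            removed += db.SetRemove(setOfActiveKeysKey, stale.ToArray());

        return removed;
    }
}
```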
If forking this library is a pain, I've also thought about introducing something like an IRedisCacheStrategy where you'd be able to customize exactly how the values are stored/retrieved/cleared in Redis. It would take some thought to generalize, but I'm open to adding that to reduce the pains (compilation/maintenance) of maintaining your own fork.
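Something in the spirit of (a hypothetical shape; this interface does not exist in the library today, and the member list is only a guess):

```csharp
using System;
using StackExchange.Redis;

public interface IRedisCacheStrategy
{
    // How a single value is written for a region (e.g. plain SET, plus optional key tracking).
    void Put(IDatabase db, string region, RedisKey key, RedisValue value, TimeSpan expiry);

    // How a single value is read back.
    RedisValue Get(IDatabase db, string region, RedisKey key);

    // How an entire region is cleared (index-set delete, SCAN+DEL, generation bump, ...).
    void Clear(IDatabase db, string region);
}
```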
Thanks again for the detailed answer with a lot of insight. We have switched to your JSON.NET serializer and saw a ~30% memory gain for the Redis cluster :)
With some work you could make it the default serializer and offer NetDataContractSerializer as an alternative. It does have some issues at the moment, but nothing major; I've noted the issues in comments on that file (they are also the reason we didn't start with that serializer).
We are fine with maintaining our own fork until (and if) we can make this project support both the current (guaranteed atomicity) and our (high load, many keys, almost no Clear()) scenarios.
Some thoughts on alternative strategies:
- UNLINK will be a nice addition to Redis, but using it in ICache.Clear() will not address situations where ICache.Clear() is not called.
- We have run Prune() in our production, but the overhead is too noticeable: we constantly had ~150k keys in a region with a 5 min expiry and ran Prune() every minute, resulting in ~20k deletions every run. Both "SSCAN+MGET+SREM (for a batch)" and "SSCAN (to end)+SCAN+SREM (more atomic)" result in a noticeable latency increase for unrelated operations, and even more so for SISMEMBER on the same key. (And then there is the issue of getting a single client to run Prune() in a fault-tolerant way in a multi-server environment.)

I'm not sure why, but GitHub doesn't send email notifications for Gists, so I'm glad you mentioned it here. As I'm sure you're aware, I responded.
I've also merged in #22 (and cleaned it up to fit our code base). Thanks very much for your contribution!
Our situation: table "A" has ~100,000,000 records and a rather specific load pattern: if a row is queried, it is likely to be queried again a few thousand times over the next few minutes. Around 5,000 different rows are queried every minute (the distribution is near random). After a few minutes, a given row might not be accessed again for days.
A logical setup would be to set the cache region's absolute expiry to ~5 minutes and keep only "hot" rows in cache (~5 * 5,000 = ~25,000 rows in cache).
Since SetOfActiveKeysKey items have no expiry, the set builds up over time (items are removed from the cache via expiry, but the :keys records are kept). With our load pattern, this leads to having 10M+ records in the :keys set while having only ~25,000 items cached. I'm quite certain this has a negative effect on lookup speed (not tested, though), and I'm certain about the memory consumption: purging the :keys set frees a few GB of memory.
At the moment we use a scheduled cleanup as a workaround, but this seems like a hack and shouldn't be part of this project? At the same time, I can't figure out how to address this issue cleanly, considering the current cache architecture.
(The hack we use is to run a schedule every minute that does a full SCAN (preferably on a slave), finds the :keys entries, does an SSCAN on each entry, finds the obsolete entries in memory, and deletes them in batches.)