Closed: pySilver closed this issue 1 year ago.
You can try to use @cached_as() and explicitly set invalidation factors there:

@cached_as(Merchant, ProductType.objects.filter(status=...PENDING), ...)
def get_mapping_stats():
    return Merchant...aggregate()
If that works, then the issue is somewhere in aggregate handling. If not, then it's probably somewhere in the setup: redis master/slave, LRU or whatever.
It's weird, however, that the direct query outputs a correct result. Are you sure you don't cache and reuse the queryset itself?
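For reference, a fuller sketch of that decorator usage (not taken from the issue): the module paths, the 'pending' status value, the timeout and the Count aggregate are placeholders standing in for the elided parts above.

from django.db.models import Count
from cacheops import cached_as
from merchants.models import Merchant, ProductType   # module paths assumed

@cached_as(Merchant, ProductType.objects.filter(status='pending'), timeout=60 * 15)
def get_mapping_stats():
    # re-cached whenever a Merchant changes, or a ProductType matching the filter changes
    return Merchant.objects.aggregate(total=Count('id'))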
P.S. You can check manually what you have in redis by inspecting queryset._cache_key() and looking in redis under that key.
P.P.S. Can you also post the value of queryset._cond_dnfs? It might show whether invalidation factors are inspected correctly.
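Something along these lines in ./manage.py shell should do for the manual check (a rough sketch: it assumes the client is reachable as cacheops.redis.redis_client and that cached values are plain pickles, which holds for cacheops 5.x as far as I know; the queryset and module path are placeholders).

import pickle
from cacheops.redis import redis_client   # assumption: exposed like this in cacheops 5.x
from merchants.models import Merchant     # module path assumed

qs = Merchant.objects.all()               # substitute the queryset you suspect is stale
key = qs._cache_key()
print("key:", key)
print("ttl:", redis_client.ttl(key))      # -2 means the key is not in redis at all
raw = redis_client.get(key)
if raw is not None:
    print(pickle.loads(raw))              # the cached rows, assuming a plain pickle payload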
@Suor I'm not observing the issue at the moment, but it happens often, so I'll have some feedback very soon.
In order to get _cond_dnfs I had to remove the aggregation part; here is the result:
In [12]: mapping_stats._cond_dnfs
Out[12]: {'merchants_categorymapping': [{}], 'merchants_merchant': [{}]}
But I assume it's a different query since it doesn't have the group by part. Is there any way to find _cond_dnfs and the cache key for the aggregation query?
What bothers me the most is the fact that the direct query returns correct results. It looks like the running process retrieves stale results for some reason and cacheops is not the issue here.
I did a little more investigation and found out that during the last event when the issue was observed there was a short master server downtime, so the sentinels started routing connections to the follower. My theory at the moment is that the invalidation happened on the new master, which is now back to being a follower, so the stale entries survive until ./manage.py invalidate all is called (which is practically a similar thing). I need to read more about data sync after downtime in redis now.
To get cond dnfs for an aggregate you can use the same code as cacheops uses in QueryMixin.aggregate():
# Apply all aggregates the same way the original .aggregate() does, but do not perform sql
kwargs = kwargs.copy()
for arg in args:
    kwargs[arg.default_alias] = arg
qs = self._clone()
for (alias, aggregate_expr) in kwargs.items():
    qs.query.add_annotation(aggregate_expr, alias, is_summary=True)
print(qs._cond_dnfs)
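Adapted for a shell session, that looks roughly like this (my adaptation, not the code from your project): the Count('id') aggregate and the module path are placeholders, and on newer Django versions add_annotation no longer accepts is_summary.

from django.db.models import Count
from merchants.models import Merchant     # module path assumed

qs = Merchant.objects.all()._clone()       # substitute the real queryset behind the aggregation
# mimic .aggregate(total=Count('id')) without executing any sql
qs.query.add_annotation(Count('id'), 'total', is_summary=True)
print(qs._cond_dnfs)
print(qs._cache_key())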
Does redis really make the old master a master again, and not a slave of the promoted one? Anyway, if an invalidation request was lost due to network issues, or the invalidation was performed on a slave (which python still thinks is the master due to connection caching), then you get into the state you described: stale data. However, a new query hitting the cache should also show you stale results in that case.
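One quick way to check that (my suggestion, not something cacheops does for you) is to ask the connection cacheops is actually holding which role it has; again this assumes the client is reachable as cacheops.redis.redis_client.

from cacheops.redis import redis_client    # assumption: exposed like this in cacheops 5.x

# INFO replication reports whether this connection points at a master or a replica
info = redis_client.info('replication')
print(info['role'])                         # 'master' or 'slave'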
Another thing you might consider: using redis maxmemory might cause stale data unless you set CACHEOPS_LRU = True as described in the README. If you are using maxmemory and didn't set maxmemory-policy volatile-ttl nor CACHEOPS_LRU = True, then stale data is not simply possible, it's probable.
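For reference, the two setups the README describes look roughly like this (the maxmemory value is illustrative, not taken from the issue):

# Option 1: keep cacheops' invalidation structures safe, evict only keys that carry a TTL
#   redis.conf: maxmemory 2gb
#               maxmemory-policy volatile-ttl

# Option 2: let redis evict anything (allkeys-lru) and tell cacheops to tolerate it
#   redis.conf: maxmemory 2gb
#               maxmemory-policy allkeys-lru
CACHEOPS_LRU = True   # in Django settings.py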
I'm on cacheops 5.1 and I'm seeing a constant issue with one aggregation query which is not invalidated automatically for some reason (at least it's unclear what the cause is):
Settings for the Merchant model:

"merchants.Merchant": {"ops": "all"},

Other cacheops settings: cacheops is configured against a redis cluster (3 sentinels, 1 master and 1 follower).
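For context, a sentinel-backed cacheops configuration generally looks something like the sketch below (hostnames, service name, timeouts and the per-model timeout are placeholders, not my actual settings):

CACHEOPS_SENTINEL = {
    'locations': [('sentinel-1', 26379),    # the three sentinel hosts
                  ('sentinel-2', 26379),
                  ('sentinel-3', 26379)],
    'service_name': 'mymaster',             # name the sentinels track the master under
    'socket_timeout': 0.1,
    'db': 0,
}
CACHEOPS = {
    'merchants.Merchant': {'ops': 'all', 'timeout': 60 * 60},
}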
WSGI server setup:
gunicorn config.wsgi:application --bind 0.0.0.0:5000 --chdir=/app --worker-class sync --worker-tmp-dir /dev/shm --log-file=- --forwarded-allow-ips="*"
The issue: in production this query serves outdated aggregation data from the web process until the TTL expires. If I run ./manage.py invalidate all, the problem goes away. If I ssh to any production app server, I can confirm that the query serves the correct response. I'm totally sure there is no other cache layer.
Summary: data served by the web worker process is stale, while a direct query outputs correct results. For some reason invalidate all solves the issue every time. I'll test .nocache() to verify whether the problem goes away when cacheops is skipped.
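Something like this should do for that check (a sketch; the Count aggregate and module path are placeholders for my real query):

from django.db.models import Count
from merchants.models import Merchant     # module path assumed

cached = Merchant.objects.all().aggregate(total=Count('id'))            # goes through cacheops
fresh = Merchant.objects.all().nocache().aggregate(total=Count('id'))   # bypasses cacheops entirely
print(cached, fresh, cached == fresh)      # a mismatch points at a stale cache entry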