getsentry / sentry

Developer-first error tracking and performance monitoring
https://sentry.io

Event count/tally does not work post 7.0.X upgrade #1369

Closed adepue closed 9 years ago

adepue commented 9 years ago

We recently upgraded from 6.4.4 to 7.0.1 (and now 7.0.2), and event counts all remain at 1 for us.

We are currently using the Django node storage backend, memcache for caching, and Redis for everything else.

At this point, event counts all remain at 1 permanently. Separately, all historical graphs etc. stick at "loading historical data". I know the migration wouldn't have moved data forward, but it doesn't seem like anything new is being populated. We're unsure where to start debugging, since we aren't getting any error logs or anything.

dcramer commented 9 years ago

There are a couple of possibilities:

  1. Buffers aren't correctly configured. I don't believe the syntax changed, so it seems unlikely, but take a look.
  2. TSDB isn't configured (it won't be unless you were using a generic setting before).

What I'd recommend is generating a new configuration (sentry init) and then applying your changes on top of it. We've improved config generation quite a bit.

(We also completely failed to note this in CHANGES.)
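For reference, a minimal sketch of what the relevant pieces look like in a generated config, assuming the Redis-backed buffer and TSDB (host/port are placeholders, not a recommendation for your setup):

# Buffers push count updates through Redis instead of writing to the DB on every event
SENTRY_BUFFER = 'sentry.buffer.redis.RedisBuffer'

SENTRY_REDIS_OPTIONS = {
    'hosts': {
        0: {
            'host': '127.0.0.1',
            'port': 6379,
        }
    }
}

# The TSDB backs the historical graphs and per-rate alerts
SENTRY_TSDB = 'sentry.tsdb.redis.RedisTSDB'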

rogerhu commented 9 years ago

Looks like a Redis issue:

Traceback (most recent call last):
  File "/home/rhu/.virtualenvs/dev/local/lib/python2.7/site-packages/sentry-7.0.2-py2.7.egg/sentry/utils/safe.py", line 26, in safe_execute
    result = func(*args, **kwargs)
  File "/home/rhu/.virtualenvs/dev/local/lib/python2.7/site-packages/sentry-7.0.2-py2.7.egg/sentry/event_manager.py", line 443, in _save_aggregate
    is_regression = self._process_existing_aggregate(group, event, kwargs)
  File "/home/rhu/.virtualenvs/dev/local/lib/python2.7/site-packages/sentry-7.0.2-py2.7.egg/sentry/event_manager.py", line 503, in _process_existing_aggregate
    }, extra)
  File "/home/rhu/.virtualenvs/dev/local/lib/python2.7/site-packages/sentry-7.0.2-py2.7.egg/sentry/buffer/redis.py", line 87, in incr
    pipe.execute()
  File "build/bdist.linux-x86_64/egg/redis/client.py", line 2578, in execute
    return execute(conn, stack, raise_on_error)
  File "build/bdist.linux-x86_64/egg/redis/client.py", line 2492, in _execute_transaction
    self.raise_first_error(commands, response)
  File "build/bdist.linux-x86_64/egg/redis/client.py", line 2526, in raise_first_error
    raise r
ResponseError: Command # 4 (GET b:k:sentry.group:eefceaa1a84e4a945af06f918a4c2936) of pipeline caused error: WRONGTYPE Operation against a key holding the wrong kind of value
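For anyone hitting the same thing: the WRONGTYPE error means a plain string GET ran against a key holding a different Redis type (here a hash). A minimal standalone repro, using an illustrative key name rather than a real Sentry one:

import redis

r = redis.StrictRedis(host='localhost', port=6379, db=0)
key = 'wrongtype-demo'

r.delete(key)
r.hset(key, 'i+times_seen', 1)   # the key now holds a hash
try:
    r.get(key)                   # string GET against a hash key
except redis.ResponseError as exc:
    # WRONGTYPE Operation against a key holding the wrong kind of value
    print(exc)
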
dcramer commented 9 years ago

@rogerhu can you check what version of Redis you're running?

adepue commented 9 years ago

Redis server is 2.8.6. (@rogerhu and I are working on this together)

rogerhu commented 9 years ago

>>> import redis
>>> redis
<module 'redis' from '/home/rhu/.virtualenvs/dev/local/lib/python2.7/site-packages/redis-2.10.3-py2.7.egg/redis/__init__.pyc'>
>>> redis.__version__
'2.10.3'

rogerhu commented 9 years ago

We are using Amazon's ElastiCache -- http://aws.amazon.com/elasticache/

adepue commented 9 years ago

Separately, we did move to a clean config and applied our changes on top of it.

Redis is configured with:


###########
## Redis ##
###########

# Generic Redis configuration used as defaults for various things including:
# Buffers, Quotas, TSDB

SENTRY_REDIS_OPTIONS = {
    'hosts': {
        0: {
            'host': 'SERVER',
            'port': 6379,
        }
    }
}

##########
## TSDB ##
##########

# The TSDB is used for building charts as well as making things like per-rate
# alerts possible.

SENTRY_TSDB = 'sentry.tsdb.redis.RedisTSDB'

dcramer commented 9 years ago

Someone else reported a similar issue (though with fewer details; not sure if it was AWS), but it's not clear to me what's happening here.

I don't know much about ElastiCache, but is it native Redis, or is there anything special about it?

adepue commented 9 years ago

It is native Redis machines... AWS just launches them for you and handles replication to slaves.

adepue commented 9 years ago

Upgraded redis/hiredis and still no luck; tally counts aren't improving.

dcramer commented 9 years ago

I'm going to investigate whether we changed this key (without renaming it). Could you try a flushdb and see if it resolves the errors? (I doubt it will.)
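If it's easier than going through redis-cli, the same thing can be done from redis-py (note this wipes the entire Redis DB that Sentry points at, so only do it if that's acceptable):

import redis
redis.StrictRedis(host='SERVER', port=6379, db=0).flushdb()  # 'SERVER' as in the config above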

adepue commented 9 years ago

No such luck. Just tried flushdb. Also dumped memcache just to be safe.

rogerhu commented 9 years ago

The stack trace may have been me triggering the Redis errors manually. The funny thing is, when I get the key manually, the Redis value seems to be set correctly. The dashboard and DB, though, only show times_seen=1.

(Pdb) pipe.hincrby(key, 'i+' + 'times_seen', 1)
Pipeline<ConnectionPool<Connection<host=er-dev-sentry.kpspip.ng.0001.usw2.cache.amazonaws.com,port=6379,db=0>>>
(Pdb) pipe.execute()
[0L, 12L]
(Pdb) conn.hget(key, 'i+' + 'times_seen')
'12'
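That behaviour is consistent with the increments being staged in Redis but never flushed to the database. A rough, purely illustrative sketch of the buffer-then-flush pattern (hypothetical names, not Sentry's actual code):

import redis

r = redis.StrictRedis(host='localhost', port=6379, db=0, decode_responses=True)

def buffer_incr(key, column, amount=1):
    # Stage the increment in a Redis hash and remember the key as pending.
    pipe = r.pipeline()
    pipe.hincrby(key, 'i+' + column, amount)
    pipe.sadd('pending-buffers', key)
    pipe.execute()

def flush_pending(apply_to_db):
    # Meant to run on a schedule (e.g. via celerybeat). If the scheduler
    # never runs, the staged counts never reach the database.
    for key in r.smembers('pending-buffers'):
        counts = r.hgetall(key)
        r.srem('pending-buffers', key)
        r.delete(key)
        for field, value in counts.items():
            if field.startswith('i+'):
                apply_to_db(key, field[2:], int(value))
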
dcramer commented 9 years ago

One last thing: are you running celerybeat, either as sentry celery worker -B or as sentry celerybeat?

adepue commented 9 years ago

No celerybeat at all.

dcramer commented 9 years ago

Ah, that's the issue. I'll clarify the documentation etc., but you'll want to start up celerybeat.

dcramer commented 9 years ago

Added to CHANGES in 40cf70c242e7cc072fb031c4685322decfbcf95f

adepue commented 9 years ago

Thanks!