getsentry / sentry

Developer-first error tracking and performance monitoring
https://sentry.io

Support Redis Cluster #35106

Open overstep123 opened 2 years ago

overstep123 commented 2 years ago

Environment

self-hosted (https://develop.sentry.dev/self-hosted/)

Version

21.7.0

Steps to Reproduce

Configure Sentry to use a Redis Cluster.

Expected Result

work normally

Actual Result

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/celery/app/trace.py", line 412, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/celery/app/trace.py", line 704, in __protected_call__
    return self.run(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/sentry_sdk/integrations/celery.py", line 197, in _inner
    reraise(*exc_info)
  File "/usr/local/lib/python3.6/site-packages/sentry_sdk/_compat.py", line 54, in reraise
    raise value
  File "/usr/local/lib/python3.6/site-packages/sentry_sdk/integrations/celery.py", line 192, in _inner
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/sentry/tasks/base.py", line 46, in _wrapped
    result = func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/sentry/tasks/store.py", line 851, in save_event
    _do_save_event(cache_key, data, start_time, event_id, project_id, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/sentry/tasks/store.py", line 765, in _do_save_event
    project_id, assume_normalized=True, start_time=start_time, cache_key=cache_key
  File "/usr/local/lib/python3.6/site-packages/sentry/utils/metrics.py", line 192, in inner
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/sentry/event_manager.py", line 426, in save
    **kwargs,
  File "/usr/local/lib/python3.6/site-packages/sentry/event_manager.py", line 1147, in _save_aggregate
    group=group, event=event, data=kwargs, release=release
  File "/usr/local/lib/python3.6/site-packages/sentry/event_manager.py", line 1327, in _process_existing_aggregate
    buffer.incr(Group, update_kwargs, {"id": group.id}, extra)
  File "/usr/local/lib/python3.6/site-packages/sentry/utils/services.py", line 102, in <lambda>
    context[key] = (lambda f: lambda *a, **k: getattr(self, f)(*a, **k))(key)
  File "/usr/local/lib/python3.6/site-packages/sentry/buffer/redis.py", line 188, in incr
    pipe.execute()
  File "/usr/local/lib/python3.6/site-packages/redis/client.py", line 3967, in execute
    return execute(conn, stack, raise_on_error)
  File "/usr/local/lib/python3.6/site-packages/redis/client.py", line 3860, in _execute_transaction
    response = self.parse_response(connection, '_')
  File "/usr/local/lib/python3.6/site-packages/redis/client.py", line 3926, in parse_response
    self, connection, command_name, **options)
  File "/usr/local/lib/python3.6/site-packages/redis/client.py", line 892, in parse_response
    response = connection.read_response()
  File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 752, in read_response
    raise response
redis.exceptions.ResponseError: CROSSSLOT Keys in request don't hash to the same slot
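
For context on the error above: Redis Cluster maps every key to one of 16384 hash slots (CRC16 of the key, modulo 16384) and rejects a transaction whose keys fall into different slots with `CROSSSLOT`. The following is a minimal sketch of that mapping, including the `{hash tag}` rule. The key names are hypothetical and are not taken from Sentry's buffer code.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 variant (polynomial 0x1021, initial value 0) used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: str) -> int:
    """Return the cluster hash slot (0..16383) for a key."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty {...} section counts as a hash tag.
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


# Hypothetical key names, for illustration only.
print(hash_slot("buffer:pending"), hash_slot("buffer:key:42"))
# Keys sharing a hash tag always land in the same slot.
print(hash_slot("{buffer}:pending") == hash_slot("{buffer}:key:42"))
```

A pipeline that touches two untagged keys only works on a cluster if both happen to hash to the same slot, which is why code written for a single Redis node can fail this way.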

getsentry-release commented 2 years ago

Routing to @getsentry/infrastructure for triage. ⏲️

mitsuhiko commented 2 years ago

Sentry uses redis in different ways. Which exact config key did you set to a redis cluster installation?

overstep123 commented 2 years ago

> Sentry uses redis in different ways. Which exact config key did you set to a redis cluster installation?

I use a managed Redis Cluster from a public cloud provider, so I don't know the details of the cluster installation. I just configured the load balancer address of the Redis Cluster as Sentry's Redis address.

chadwhitacre commented 2 years ago

Bump @getsentry/infrastructure. Do we expect self-hosted to work with Redis Cluster?

untitaker commented 2 years ago

Some of Sentry works with redis-blaster by default, some of Sentry works only with redis-cluster. The simplest would be to use a single-node redis to bypass any compatibility issues, and I think right now that's the only thing that we support for self-hosted. I think it would be desirable to change self-hosted to serve a minimal redis-cluster, but it would probably require some code changes in Sentry to fully support redis-cluster everywhere.
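
As a rough illustration of the single-node setup suggested above, a `sentry.conf.py` fragment could look like the sketch below. The `redis.clusters` option and its shape follow the self-hosted default configuration as best recalled, so treat them as an assumption and check them against your own `sentry.conf.py`; the host name `redis` is a placeholder.

```python
# In a real sentry.conf.py, SENTRY_OPTIONS already exists (it comes from the
# configuration defaults); it is created here only so the snippet runs alone.
SENTRY_OPTIONS = {}

# One logical "cluster" named "default" that contains exactly one host.
# Assumption: option name and layout as in the self-hosted default config.
SENTRY_OPTIONS["redis.clusters"] = {
    "default": {
        "hosts": {
            0: {"host": "redis", "port": 6379, "db": 0},  # placeholder host
        }
    }
}
```

With a single host there is no sharding at all, so neither client-side routing nor cluster hash slots come into play.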

chadwhitacre commented 2 years ago

Okay I'm putting this on the backlog to see if it accumulates additional demand.

jimolonely commented 1 year ago

> Okay I'm putting this on the backlog to see if it accumulates additional demand.

I have read the incomplete code at https://github.com/getsentry/sentry/blob/master/src/sentry/utils/redis.py#L274 and hope the migration will be finished soon. Do you have a specific plan? How does Sentry's internal SaaS service use Redis?

untitaker commented 1 year ago

I don't think there's any movement on this. Your best bet right now is to use Envoy or a similar redis proxy, configure it as a single node in Sentry and let Envoy do the sharding.

That is not what we do in production (we have dedicated redis clusters for every single internal service, i.e. separated by domain/usecase), and we have no operational experience with it, but I believe it might be a way out of this so that you can operate everything on a single redis cluster.

Limsanity commented 1 year ago

> Some of Sentry works with redis-blaster by default, some of Sentry works only with redis-cluster. The simplest would be to use a single-node redis to bypass any compatibility issues, and I think right now that's the only thing that we support for self-hosted. I think it would be desirable to change self-hosted to serve a minimal redis-cluster, but it would probably require some code changes in Sentry to fully support redis-cluster everywhere.

Why do some modules use redis-blaster while others use Redis Cluster? Any idea?

untitaker commented 1 year ago

@Limsanity we were in need of a solution for horizontally scaling redis before redis cluster was invented. so we built our own (redis blaster). now we have two, and they're mutually incompatible.
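
To make the incompatibility a bit more concrete, here is a simplified sketch of client-side sharding in general. It is not redis blaster's actual routing code, and the host list and key are hypothetical. The point is that the client alone decides which node owns a key, whereas a Redis Cluster node that does not own the key's slot replies with a `MOVED` redirect.

```python
import zlib


def pick_host(key, hosts):
    """Client-side sharding: choose a node purely from the client's own host list."""
    # Assumption for illustration: CRC32 modulo the number of hosts.
    return hosts[zlib.crc32(key.encode()) % len(hosts)]


# Hypothetical nodes and key.
hosts = ["redis-0:6379", "redis-1:6379", "redis-2:6379"]
print(pick_host("buffer:key:42", hosts))
```

Because the two schemes disagree about where a key lives, a client that shards on its own cannot simply be pointed at the nodes of a Redis Cluster.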

aarnaud commented 6 months ago

> I don't think there's any movement on this. Your best bet right now is to use Envoy or a similar redis proxy, configure it as a single node in Sentry and let Envoy do the sharding.
>
> That is not what we do in production (we have dedicated redis clusters for every single internal service, i.e. separated by domain/usecase), and we have no operational experience with it, but I believe it might be a way out of this so that you can operate everything on a single redis cluster.

But @untitaker, even with Envoy as the Redis proxy it doesn't work, because Envoy doesn't support the INFO command: https://github.com/envoyproxy/envoy/issues/8328

  File "/usr/local/lib/python3.8/site-packages/sentry/runner/settings.py", line 154, in configure
    initialize_app(
  File "/usr/local/lib/python3.8/site-packages/sentry/runner/initializer.py", line 391, in initialize_app
    setup_services(validate=not skip_service_validation)
  File "/usr/local/lib/python3.8/site-packages/sentry/runner/initializer.py", line 434, in setup_services
    service.validate()
  File "/usr/local/lib/python3.8/site-packages/sentry/utils/services.py", line 135, in <lambda>
    context[key] = (lambda f: lambda *a, **k: getattr(self, f)(*a, **k))(key)
  File "/usr/local/lib/python3.8/site-packages/sentry/digests/backends/redis.py", line 91, in validate
    check_cluster_versions(self.cluster, Version((2, 8, 9)), label="Digests")
  File "/usr/local/lib/python3.8/site-packages/sentry/utils/redis.py", line 242, in check_cluster_versions
    raise InvalidConfiguration(str(e))
sentry.exceptions.InvalidConfiguration: invalid request
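
The traceback shows startup validation requiring at least Redis 2.8.9 for digests, which needs a usable reply to `INFO`. The sketch below shows that kind of check against a raw `INFO` payload. It is an illustration only, not Sentry's `check_cluster_versions`; the function names are hypothetical.

```python
def parse_redis_version(info_payload):
    """Extract the server version tuple from the text of an INFO reply."""
    for line in info_payload.splitlines():
        if line.startswith("redis_version:"):
            version = line.split(":", 1)[1].strip()
            return tuple(int(part) for part in version.split("."))
    raise ValueError("redis_version not found in INFO payload")


def meets_minimum(info_payload, minimum=(2, 8, 9)):
    """Return True if the reported version is at least `minimum`."""
    return parse_redis_version(info_payload) >= minimum


print(meets_minimum("# Server\r\nredis_version:6.2.6\r\n"))
```

A proxy that answers `INFO` with an error instead of this payload gives such a check nothing to parse, so validation fails before Sentry starts.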

untitaker commented 6 months ago

@aarnaud yep, envoy turned out not to be viable here. we just need to get rid of redis blaster. @anonrig can probably tell you whether there are current efforts for doing that