getsentry / self-hosted

Sentry, feature-complete and packaged up for low-volume deployments and proofs-of-concept
https://develop.sentry.dev/self-hosted/

kafka & authentication errors #583

Closed: gencer closed this issue 4 years ago

gencer commented 4 years ago

When I check the Relay logs, I see the errors shown at the bottom of this report.

However, the Sentry frontend works and I receive errors from my apps.

Should I ignore those?

I am using the latest master branch of this repo with the following settings:

Note that, as you can see in the config, I disabled the bundled nginx because I already run nginx on the host, and I exposed ports 9000 (web) and 3000 (relay) so that my host nginx can proxy requests to them. A sketch of such a host-side nginx config follows.
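
For context, a minimal sketch of what such a host-side nginx could look like (the server_name and listen address are hypothetical; the /api/store/ and /api/<project id>/ locations mirror the routing in this repo's bundled nginx/nginx.conf, sending SDK ingestion to Relay and everything else, including the internal /api/0/, to Sentry web):

server {
    listen 80;
    server_name sentry.example.com;  # hypothetical

    # SDK event ingestion goes to Relay on port 3000
    location /api/store/ {
        proxy_pass http://127.0.0.1:3000;
    }
    location ~ ^/api/[1-9]\d*/ {
        proxy_pass http://127.0.0.1:3000;
    }

    # Everything else (UI, auth, internal /api/0/) goes to Sentry web on 9000
    location / {
        proxy_pass http://127.0.0.1:9000;
    }
}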

sentry/sentry.conf.py

# This file is just Python, with a touch of Django which means
# you can inherit and tweak settings to your heart's content.

from sentry.conf.server import *  # NOQA

# Generously adapted from pynetlinux: https://git.io/JJmga
def get_internal_network():
    import ctypes
    import fcntl
    import math
    import socket
    import struct

    iface = "eth0"
    sockfd = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    ifreq = struct.pack("16sH14s", iface, socket.AF_INET, b"\x00" * 14)

    try:
        ip = struct.unpack(
            "!I", struct.unpack("16sH2x4s8x", fcntl.ioctl(sockfd, 0x8915, ifreq))[2]
        )[0]
        netmask = socket.ntohl(
            struct.unpack("16sH2xI8x", fcntl.ioctl(sockfd, 0x891B, ifreq))[2]
        )
    except IOError:
        return ()
    base = socket.inet_ntoa(struct.pack("!I", ip & netmask))
    netmask_bits = 32 - int(round(math.log(ctypes.c_uint32(~netmask).value + 1, 2), 1))
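    # e.g. returns ("172.30.0.0/16",) for a container on a /16 compose network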
    return ("{0:s}/{1:d}".format(base, netmask_bits),)

INTERNAL_IPS = get_internal_network()
INTERNAL_SYSTEM_IPS = INTERNAL_IPS

DATABASES = {
    "default": {
        "ENGINE": "sentry.db.postgres",
        "NAME": "postgres",
        "USER": "postgres",
        "PASSWORD": "",
        "HOST": "postgres",
        "PORT": "",
    }
}

# You should not change this setting after your database has been created
# unless you have altered all schemas first
SENTRY_USE_BIG_INTS = True

# If you're expecting any kind of real traffic on Sentry, we highly recommend
# configuring the CACHES and Redis settings

###########
# General #
###########

# Instruct Sentry that this install intends to be run by a single organization
# and thus various UI optimizations should be enabled.
SENTRY_SINGLE_ORGANIZATION = True

SENTRY_OPTIONS["system.event-retention-days"] = int(
    env("SENTRY_EVENT_RETENTION_DAYS", "90")
)

#########
# Redis #
#########

# Generic Redis configuration used as defaults for various things including:
# Buffers, Quotas, TSDB

SENTRY_OPTIONS["redis.clusters"] = {
    "default": {
        "hosts": {0: {"host": "redis", "password": "", "port": "6379", "db": "0"}}
    }
}

#########
# Queue #
#########

# See https://docs.getsentry.com/on-premise/server/queue/ for more
# information on configuring your queue broker and workers. Sentry relies
# on a Python framework called Celery to manage queues.

rabbitmq_host = None
if rabbitmq_host:
    BROKER_URL = "amqp://{username}:{password}@{host}/{vhost}".format(
        username="guest", password="guest", host=rabbitmq_host, vhost="/"
    )
else:
    BROKER_URL = "redis://:{password}@{host}:{port}/{db}".format(
        **SENTRY_OPTIONS["redis.clusters"]["default"]["hosts"][0]
    )
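    # With the Redis defaults above this yields: redis://:@redis:6379/0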

#########
# Cache #
#########

# Sentry currently utilizes two separate mechanisms. While CACHES is not a
# requirement, it will optimize several high throughput patterns.

CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        "LOCATION": ["memcached:11211"],
        "TIMEOUT": 3600,
    }
}

# A primary cache is required for things such as processing events
SENTRY_CACHE = "sentry.cache.redis.RedisCache"

DEFAULT_KAFKA_OPTIONS = {
    "bootstrap.servers": "kafka:9092",
    "message.max.bytes": 50000000,
    "socket.timeout.ms": 1000,
}
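
# Note: "bootstrap.servers" must resolve and be reachable from every container
# that produces to or consumes from Kafka; "kafka:9092" only resolves inside
# the Docker network where the kafka service is defined.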

SENTRY_EVENTSTREAM = "sentry.eventstream.kafka.KafkaEventStream"
SENTRY_EVENTSTREAM_OPTIONS = {"producer_configuration": DEFAULT_KAFKA_OPTIONS}

KAFKA_CLUSTERS["default"] = DEFAULT_KAFKA_OPTIONS

###############
# Rate Limits #
###############

# Rate limits apply to notification handlers and are enforced per-project
# automatically.

SENTRY_RATELIMITER = "sentry.ratelimits.redis.RedisRateLimiter"

##################
# Update Buffers #
##################

# Buffers (combined with queueing) act as an intermediate layer between the
# database and the storage API. They will greatly improve efficiency on large
# numbers of the same events being sent to the API in a short amount of time.
# (read: if you send any kind of real data to Sentry, you should enable buffers)

SENTRY_BUFFER = "sentry.buffer.redis.RedisBuffer"

##########
# Quotas #
##########

# Quotas allow you to rate limit individual projects or the Sentry install as
# a whole.

SENTRY_QUOTAS = "sentry.quotas.redis.RedisQuota"

########
# TSDB #
########

# The TSDB is used for building charts as well as making things like per-rate
# alerts possible.

SENTRY_TSDB = "sentry.tsdb.redissnuba.RedisSnubaTSDB"

#########
# SNUBA #
#########

SENTRY_SEARCH = "sentry.search.snuba.EventsDatasetSnubaSearchBackend"
SENTRY_SEARCH_OPTIONS = {}
SENTRY_TAGSTORE_OPTIONS = {}

###########
# Digests #
###########

# The digest backend powers notification summaries.

SENTRY_DIGESTS = "sentry.digests.backends.redis.RedisBackend"

##############
# Web Server #
##############

SENTRY_WEB_HOST = "0.0.0.0"
SENTRY_WEB_PORT = 9000
SENTRY_WEB_OPTIONS = {
    "http": "%s:%s" % (SENTRY_WEB_HOST, SENTRY_WEB_PORT),
    "protocol": "uwsgi",
    # This is needed to prevent https://git.io/fj7Lw
    "uwsgi-socket": None,
    "so-keepalive": True,
    # Keep this between 15s-75s as that's what Relay supports
    "http-keepalive": 15,
    "http-chunked-input": True,
    # the number of web workers
    "workers": 3,
    "threads": 4,
    "memory-report": False,
    # Some stuff so uwsgi will cycle workers sensibly
    "max-requests": 100000,
    "max-requests-delta": 500,
    "max-worker-lifetime": 86400,
    # Duplicate options from sentry default just so we don't get
    # bit by sentry changing a default value that we depend on.
    "thunder-lock": True,
    "log-x-forwarded-for": False,
    "buffer-size": 32768,
    "limit-post": 209715200,
    "disable-logging": True,
    "reload-on-rss": 600,
    "ignore-sigpipe": True,
    "ignore-write-errors": True,
    "disable-write-exception": True,
}

###########
# SSL/TLS #
###########

# If you're using a reverse SSL proxy, you should enable the X-Forwarded-Proto
# header and enable the settings below

SECURE_PROXY_SSL_HEADER = ('HTTP_X_FORWARDED_PROTO', 'https')
SESSION_COOKIE_SECURE = True
CSRF_COOKIE_SECURE = True
SOCIAL_AUTH_REDIRECT_IS_HTTPS = True

# End of SSL/TLS settings

############
# Features #
############

SENTRY_FEATURES["projects:sample-events"] = False
SENTRY_FEATURES.update(
    {
        feature: True
        for feature in (
            "organizations:discover",
            "organizations:events",
            "organizations:global-views",
            "organizations:integrations-issue-basic",
            "organizations:integrations-issue-sync",
            "organizations:invite-members",
            "organizations:sso-basic",
            "organizations:sso-rippling",
            "organizations:sso-saml2",
            "projects:custom-inbound-filters",
            "projects:data-forwarding",
            "projects:discard-groups",
            "projects:plugins",
            "projects:rate-limits",
            "projects:servicehooks",
        )
    }
)

######################
# GitHub Integration #
######################

GITHUB_EXTENDED_PERMISSIONS = ["repo"]

#########################
# Bitbucket Integration #
#########################

# BITBUCKET_CONSUMER_KEY = 'YOUR_BITBUCKET_CONSUMER_KEY'
# BITBUCKET_CONSUMER_SECRET = 'YOUR_BITBUCKET_CONSUMER_SECRET'

SENTRY_RELAY_WHITELIST_PK = (SENTRY_RELAY_WHITELIST_PK or []) + (["redacted"])

relay/config.yml

relay:
  upstream: "http://web:9000/"
  host: 0.0.0.0
  port: 3000
logging:
  level: WARN
processing:
  enabled: true
  kafka_config:
    - {name: "bootstrap.servers", value: "kafka:9092"}
    - {name: "message.max.bytes", value: 50000000} #50MB or bust
  redis: redis://redis:6379

docker-compose.yml

version: '3.4'
x-restart-policy: &restart_policy
  restart: unless-stopped
x-sentry-defaults: &sentry_defaults
  << : *restart_policy
  build:
    context: ./sentry
    args:
      - SENTRY_IMAGE
      - SENTRY_VERSION
  image: sentry-onpremise-local
  depends_on:
    - redis
    - postgres
    - memcached
    - smtp
    - snuba-api
    - snuba-consumer
    - snuba-outcomes-consumer
    - snuba-sessions-consumer
    - snuba-replacer
    - symbolicator
    - kafka
  environment:
    SENTRY_CONF: '/etc/sentry'
    SNUBA: 'http://snuba-api:1218'
  volumes:
    - 'sentry-data:/data'
    - './sentry:/etc/sentry'
x-snuba-defaults: &snuba_defaults
  << : *restart_policy
  depends_on:
    - redis
    - clickhouse
    - kafka
  image: 'getsentry/snuba:$SENTRY_VERSION'
  environment:
    SNUBA_SETTINGS: docker
    CLICKHOUSE_HOST: clickhouse
    DEFAULT_BROKERS: 'kafka:9092'
    REDIS_HOST: redis
    UWSGI_MAX_REQUESTS: '10000'
    UWSGI_DISABLE_LOGGING: 'true'
services:
  smtp:
    << : *restart_policy
    image: tianon/exim4
    volumes:
      - 'sentry-smtp:/var/spool/exim4'
      - 'sentry-smtp-log:/var/log/exim4'
  memcached:
    << : *restart_policy
    image: 'memcached:1.5-alpine'
  redis:
    << : *restart_policy
    image: 'redis:5.0-alpine'
    volumes:
      - 'sentry-redis:/data'
  postgres:
    << : *restart_policy
    image: 'postgres:9.6'
    environment:
      POSTGRES_HOST_AUTH_METHOD: 'trust'
    volumes:
      - 'sentry-postgres:/var/lib/postgresql/data'
  zookeeper:
    << : *restart_policy
    image: 'confluentinc/cp-zookeeper:5.5.0'
    environment:
      ZOOKEEPER_CLIENT_PORT: '2181'
      CONFLUENT_SUPPORT_METRICS_ENABLE: 'false'
      ZOOKEEPER_LOG4J_ROOT_LOGLEVEL: 'WARN'
      ZOOKEEPER_TOOLS_LOG4J_LOGLEVEL: 'WARN'
    volumes:
      - 'sentry-zookeeper:/var/lib/zookeeper/data'
      - 'sentry-zookeeper-log:/var/lib/zookeeper/log'
      - 'sentry-secrets:/etc/zookeeper/secrets'
  kafka:
    << : *restart_policy
    depends_on:
      - zookeeper
    image: 'confluentinc/cp-kafka:5.5.0'
    environment:
      KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
      KAFKA_ADVERTISED_LISTENERS: 'PLAINTEXT://kafka:9092'
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: '1'
      KAFKA_MESSAGE_MAX_BYTES: '50000000' #50MB or bust
      KAFKA_MAX_REQUEST_SIZE: '50000000' #50MB on requests apparently too
      CONFLUENT_SUPPORT_METRICS_ENABLE: 'false'
      KAFKA_LOG4J_LOGGERS: 'kafka.cluster=WARN,kafka.controller=WARN,kafka.coordinator=WARN,kafka.log=WARN,kafka.server=WARN,kafka.zookeeper=WARN,state.change.logger=WARN'
      KAFKA_LOG4J_ROOT_LOGLEVEL: 'WARN'
      KAFKA_TOOLS_LOG4J_LOGLEVEL: 'WARN'
    volumes:
      - 'sentry-kafka:/var/lib/kafka/data'
      - 'sentry-kafka-log:/var/lib/kafka/log'
      - 'sentry-secrets:/etc/kafka/secrets'
  clickhouse:
    << : *restart_policy
    image: 'yandex/clickhouse-server:19.17'
    ulimits:
      nofile:
        soft: 262144
        hard: 262144
    volumes:
      - 'sentry-clickhouse:/var/lib/clickhouse'
      - 'sentry-clickhouse-log:/var/log/clickhouse-server'
  snuba-api:
    << : *snuba_defaults
  # Kafka consumer responsible for feeding events into Clickhouse
  snuba-consumer:
    << : *snuba_defaults
    command: consumer --storage events --auto-offset-reset=latest --max-batch-time-ms 750
  # Kafka consumer responsible for feeding outcomes into Clickhouse
  # Use --auto-offset-reset=earliest to recover up to 7 days of TSDB data
  # since we did not do a proper migration
  snuba-outcomes-consumer:
    << : *snuba_defaults
    command: consumer --storage outcomes_raw --auto-offset-reset=earliest --max-batch-time-ms 750
  # Kafka consumer responsible for feeding session data into Clickhouse
  snuba-sessions-consumer:
    << : *snuba_defaults
    command: consumer --storage sessions_raw --auto-offset-reset=latest --max-batch-time-ms 750
  snuba-replacer:
    << : *snuba_defaults
    command: replacer --storage events --auto-offset-reset=latest --max-batch-size 3
  snuba-cleanup:
    << : *snuba_defaults
    image: snuba-cleanup-onpremise-local
    build:
      context: ./cron
      args:
        BASE_IMAGE: 'getsentry/snuba:$SENTRY_VERSION'
    command: '"*/5 * * * * gosu snuba snuba cleanup --dry-run False"'
  symbolicator:
    << : *restart_policy
    image: 'getsentry/symbolicator:$SYMBOLICATOR_VERSION'
    volumes:
      - 'sentry-symbolicator:/data'
      - type: bind
        read_only: true
        source: ./symbolicator
        target: /etc/symbolicator
    command: run -c /etc/symbolicator/config.yml
  symbolicator-cleanup:
    << : *restart_policy
    image: symbolicator-cleanup-onpremise-local
    build:
      context: ./cron
      args:
        BASE_IMAGE: 'getsentry/symbolicator:$SYMBOLICATOR_VERSION'
    command: '"55 23 * * * gosu symbolicator symbolicator cleanup"'
    volumes:
      - 'sentry-symbolicator:/data'
  web:
    << : *sentry_defaults
    ports:
      - '127.0.0.1:9000:9000/tcp'
  cron:
    << : *sentry_defaults
    command: run cron
  worker:
    << : *sentry_defaults
    command: run worker
  ingest-consumer:
    << : *sentry_defaults
    command: run ingest-consumer --all-consumer-types
  post-process-forwarder:
    << : *sentry_defaults
    # Increase `--commit-batch-size 1` below to deal with high-load environments.
    command: run post-process-forwarder --commit-batch-size 1
  sentry-cleanup:
    << : *sentry_defaults
    image: sentry-cleanup-onpremise-local
    build:
      context: ./cron
      args:
        BASE_IMAGE: 'sentry-onpremise-local'
    command: '"0 0 * * * gosu sentry sentry cleanup --days $SENTRY_EVENT_RETENTION_DAYS"'
  # nginx:
    # << : *restart_policy
    # ports:
      # - '9000:80/tcp'
    # image: 'nginx:1.16'
    # volumes:
      # - type: bind
        # read_only: true
        # source: ./nginx
        # target: /etc/nginx
    # depends_on:
      # - web
      # - relay
  relay:
    << : *restart_policy
    image: 'getsentry/relay:$SENTRY_VERSION'
    volumes:
      - type: bind
        read_only: true
        source: ./relay
        target: /work/.relay
    depends_on:
      - kafka
      - redis
    sysctls:
      - net.ipv6.conf.all.disable_ipv6=1
    ports:
      - '127.0.0.1:3000:3000/tcp'
volumes:
  sentry-data:
    external: true
  sentry-postgres:
    external: true
  sentry-redis:
    external: true
  sentry-zookeeper:
    external: true
  sentry-kafka:
    external: true
  sentry-clickhouse:
    external: true
  sentry-symbolicator:
    external: true
  sentry-secrets:
  sentry-smtp:
  sentry-zookeeper-log:
  sentry-kafka-log:
  sentry-smtp-log:
  sentry-clickhouse-log:

Errors

root@buc23s42-r65-u29-in-ird4:~/onpremise# docker logs sentry_onpremise_relay_1
2020-07-13T18:00:53Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/bootstrap: Connect to ipv4#172.30.0.10:9092 failed: Connection refused (after 1ms in state CONNECT)
2020-07-13T18:00:53Z [rdkafka::client] ERROR: librdkafka: Global error: AllBrokersDown (Local: All broker connections are down): 1/1 brokers are down
2020-07-13T18:00:53Z [rdkafka::client] ERROR: librdkafka: Global error: BrokerTransportFailure (Local: Broker transport failure): kafka:9092/bootstrap: Connect to ipv4#172.30.0.10:9092 failed: Connection refused (after 9ms in state CONNECT)
2020-07-13T18:00:53Z [rdkafka::client] ERROR: librdkafka: Global error: AllBrokersDown (Local: All broker connections are down): 1/1 brokers are down
2020-07-13T18:00:53Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Failed to connect to host: Failed resolving hostname: no record found for name: web type: AAAA class: IN
  caused by: Failed resolving hostname: no record found for name: web type: AAAA class: IN
  caused by: Failed resolving hostname: no record found for name: web type: AAAA class: IN
2020-07-13T18:00:53Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Failed to connect to host: Failed resolving hostname: no record found for name: web type: AAAA class: IN
  caused by: Failed resolving hostname: no record found for name: web type: AAAA class: IN
  caused by: Failed resolving hostname: no record found for name: web type: AAAA class: IN
2020-07-13T18:00:55Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Timeout while waiting for response
2020-07-13T18:00:57Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Failed to connect to host: Connection refused (os error 111)
  caused by: Connection refused (os error 111)
  caused by: Connection refused (os error 111)
2020-07-13T18:00:59Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: Failed to connect to host: Connection refused (os error 111)
  caused by: Connection refused (os error 111)
  caused by: Connection refused (os error 111)
2020-07-13T18:01:01Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
2020-07-13T18:01:02Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
2020-07-13T18:01:04Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
2020-07-13T18:01:06Z [relay_server::actors::project_upstream] ERROR: error fetching project states: attempted to send request while not yet authenticated
BYK commented 4 years ago

This is a bug in your setup and not in this repo, so I'll be closing this ticket. Here are the issues:

  1. Your nginx setup doesn't seem to be directing event ingestion to Relay, and that is probably why you are still able to receive events: they are hitting the legacy ingestion endpoint on Sentry itself. This will break as soon as you update your Sentry image, because we have removed that endpoint from Sentry; you now need a working Relay to receive events. (See the host-nginx sketch earlier in this issue for routing ingestion to Relay.)
  2. Since you are not using docker-compose for this setup, the http://web:9000 part of the Relay config is invalid. It should point to your Sentry web instance so that Relay can talk to it. There is also a corresponding setting on the Sentry side that you need to change: https://github.com/getsentry/onpremise/blob/a2507c10e042726e55184a7cb25aebc18069b3a6/sentry/config.example.yml#L68
  3. Related to the point above, there are more places where you need to update hostnames for Kafka, Snuba, Redis, etc., which explains the Kafka-related errors in your Relay logs. The relevant Relay setting is kafka_config. A sketch of both fixes follows this list.
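
A minimal sketch of the Relay-side fixes for points 2 and 3, assuming Sentry web is reachable at sentry-web.internal:9000 and Kafka and Redis at kafka.internal and redis.internal (all hostnames here are hypothetical placeholders for your environment):

relay:
  upstream: "http://sentry-web.internal:9000/"  # must resolve from where Relay runs
  host: 0.0.0.0
  port: 3000
processing:
  enabled: true
  kafka_config:
    - {name: "bootstrap.servers", value: "kafka.internal:9092"}  # broker(s) Relay can reach
  redis: redis://redis.internal:6379  # Redis instance Relay can reach

The corresponding setting on the Sentry side (at the config.example.yml line linked above) needs the same kind of hostname update.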
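
Independent of Relay, a quick way to confirm that a broker is actually reachable from a given container is a metadata request. A minimal sketch using confluent-kafka, which Sentry's Python image already depends on (adjust the bootstrap address to your setup):

# kafka_check.py: prove the broker answers a metadata request
from confluent_kafka.admin import AdminClient

client = AdminClient({"bootstrap.servers": "kafka:9092"})  # your broker address here
try:
    metadata = client.list_topics(timeout=5)  # raises on transport failure or timeout
    print("broker reachable; topics:", sorted(metadata.topics))
except Exception as exc:
    print("broker unreachable:", exc)

If this fails with "Connection refused" like the logs above, the problem is network-level: a wrong hostname, a container that is not on the same Docker network, or Kafka not yet accepting connections.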

Hope this helps.

Next time, please use the forums for questions like this.