romanstingler opened 1 month ago
Nice. It runs for some time and then it just dies, and I can't access the container anymore until I reboot.
I am getting really frustrated; since 24.3 there has not been a single updated version that wasn't broken.
Even 64 GB of RAM is not enough.
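For reference, a per-container snapshot of memory use taken just before the hang would show which service is actually consuming the RAM. A minimal sketch using standard Docker tooling (run on the host while the stack is still up):

```bash
# One-shot snapshot of memory and CPU per container (no streaming).
docker stats --no-stream --format 'table {{.Name}}\t{{.MemUsage}}\t{{.CPUPerc}}'
```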
kafka-1 | [2024-10-10 14:10:15,832] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
kafka-1 | [2024-10-10 14:10:25,815] INFO Updated connection-accept-rate max connection creation rate to 2147483647 (kafka.network.ConnectionQuotas)
kafka-1 | [2024-10-10 14:10:26,353] INFO [SocketServer listenerType=CONTROLLER, nodeId=1001] Created data-plane acceptor and processors for endpoint : ListenerName(CONTROLLER) (kafka.network.SocketServer)
transactions-consumer-1 | File "/usr/local/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
transactions-consumer-1 | return f(get_current_context(), *args, **kwargs)
transactions-consumer-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
transactions-consumer-1 | File "/usr/local/lib/python3.11/site-packages/sentry/runner/decorators.py", line 28, in inner
transactions-consumer-1 | configure()
transactions-consumer-1 | File "/usr/local/lib/python3.11/site-packages/sentry/runner/__init__.py", line 126, in configure
transactions-consumer-1 | _configure(ctx, py, yaml, skip_service_validation)
transactions-consumer-1 | File "/usr/local/lib/python3.11/site-packages/sentry/runner/settings.py", line 148, in configure
transactions-consumer-1 | initialize_app(
transactions-consumer-1 | File "/usr/local/lib/python3.11/site-packages/sentry/runner/initializer.py", line 396, in initialize_app
transactions-consumer-1 | setup_services(validate=not skip_service_validation)
transactions-consumer-1 | File "/usr/local/lib/python3.11/site-packages/sentry/runner/initializer.py", line 448, in setup_services
transactions-consumer-1 | service.validate()
transactions-consumer-1 | File "/usr/local/lib/python3.11/site-packages/sentry/utils/services.py", line 117, in <lambda>
transactions-consumer-1 | context[key] = (lambda f: lambda *a, **k: getattr(self, f)(*a, **k))(key)
transactions-consumer-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^
transactions-consumer-1 | File "/usr/local/lib/python3.11/site-packages/sentry/buffer/redis.py", line 102, in validate
transactions-consumer-1 | validate_dynamic_cluster(self.is_redis_cluster, self.cluster)
transactions-consumer-1 | File "/usr/local/lib/python3.11/site-packages/sentry/utils/redis.py", line 293, in validate_dynamic_cluster
transactions-consumer-1 | raise InvalidConfiguration(str(e))
transactions-consumer-1 | sentry.exceptions.InvalidConfiguration: Error -3 connecting to redis:6379. Temporary failure in name resolution.
transactions-consumer-1 | Updating certificates in /etc/ssl/certs...
transactions-consumer-1 | 0 added, 0 removed; done.
transactions-consumer-1 | Running hooks in /etc/ca-certificates/update.d...
transactions-consumer-1 | done.
transactions-consumer-1 | Traceback (most recent call last):
transactions-consumer-1 | File "/usr/local/lib/python3.11/site-packages/redis/connection.py", line 552, in connect
transactions-consumer-1 | sock = self._connect()
transactions-consumer-1 | ^^^^^^^^^^^^^^^
transactions-consumer-1 | File "/usr/local/lib/python3.11/site-packages/redis/connection.py", line 578, in _connect
transactions-consumer-1 | for res in socket.getaddrinfo(self.host, self.port, self.socket_type,
transactions-consumer-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
transactions-consumer-1 | File "/usr/local/lib/python3.11/socket.py", line 962, in getaddrinfo
transactions-consumer-1 | for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
transactions-consumer-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
transactions-consumer-1 | socket.gaierror: [Errno -3] Temporary failure in name resolution
transactions-consumer-1 |
transactions-consumer-1 | During handling of the above exception, another exception occurred:
transactions-consumer-1 |
transactions-consumer-1 | Traceback (most recent call last):
transactions-consumer-1 | File "/usr/local/lib/python3.11/site-packages/sentry/utils/redis.py", line 288, in validate_dynamic_cluster
transactions-consumer-1 | client.ping()
transactions-consumer-1 | File "/usr/local/lib/python3.11/site-packages/redis/client.py", line 1351, in ping
transactions-consumer-1 | return self.execute_command('PING')
transactions-consumer-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
transactions-consumer-1 | File "/usr/local/lib/python3.11/site-packages/sentry_sdk/integrations/redis/__init__.py", line 235, in sentry_patched_execute_command
transactions-consumer-1 | return old_execute_command(self, name, *args, **kwargs)
transactions-consumer-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
transactions-consumer-1 | File "/usr/local/lib/python3.11/site-packages/rb/clients.py", line 488, in execute_command
transactions-consumer-1 | buf = self._get_command_buffer(host_id, args[0])
transactions-consumer-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
transactions-consumer-1 | File "/usr/local/lib/python3.11/site-packages/rb/clients.py", line 355, in _get_command_buffer
transactions-consumer-1 | buf = CommandBuffer(host_id, connect, self.auto_batch)
transactions-consumer-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
transactions-consumer-1 | File "/usr/local/lib/python3.11/site-packages/rb/clients.py", line 91, in __init__
transactions-consumer-1 | self.connect()
transactions-consumer-1 | File "/usr/local^C
snuba-generic-metrics-sets-consumer-1 | %3|1718627561.014|FAIL|rdkafka#producer-2| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.18.0.33:9092 failed: Connection refused (after 24ms in state CONNECT)
snuba-errors-consumer-1 | {"timestamp":"2024-10-10T14:11:03.454538Z","level":"INFO","fields":{"message":"Starting consumer for \"errors\"","storage":"errors"},"target":"rust_snuba::consumer"}
worker-1 | 13:42:40 [WARNING] py.warnings: /.venv/lib/python3.12/site-packages/celery/app/utils.py:203: CDeprecationWarning:
subscription-consumer-metrics-1 | File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
snuba-generic-metrics-counters-consumer-1 | {"timestamp":"2024-03-20T13:27:03.227427Z","level":"INFO","fields":{"message":"Inserted 6 rows"},"target":"rust_snuba::strategies::clickhouse"}
web-1 | File "/.venv/lib/python3.12/site-packages/urllib3/connectionpool.py", line 789, in urlopen
subscription-consumer-transactions-1 | 18:56:50 [INFO] arroyo.processing.processor: Stopped
billing-metrics-consumer-1 | return self.main(*args, **kwargs)
clickhouse-1 | 5. Poco::Net::TCPServerDispatcher::run() @ 0x0000000015b43a31 in /usr/bin/clickhouse
post-process-forwarder-errors-1 | ^^^^^^
snuba-profiling-functions-consumer-1 | {"timestamp":"2024-09-25T09:45:05.275491Z","level":"INFO","fields":{"message":"Inserted 6 rows"},"target":"rust_snuba::strategies::clickhouse"}
snuba-generic-metrics-distributions-consumer-1 | {"timestamp":"2024-03-20T12:36:48.012504Z","level":"INFO","fields":{"message":"Inserted 11 rows"},"target":"rust_snuba::strategies::clickhouse"}
generic-metrics-consumer-1 | 12:33:48 [INFO] arroyo.processing.processor: New partitions assigned: {Partition(topic=Topic(name='ingest-performance-metrics'), index=0): 8395457}
attachments-consumer-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
snuba-replays-consumer-1 | 2024-10-08 19:01:40,847 Snuba initialization took 7.816855292767286s
subscription-consumer-events-1 | File "/usr/src/sentry/src/sentry/utils/lazy_service_wrapper.py", line 117, in <lambda>
subscription-consumer-generic-metrics-1 | %3|1718627585.325|FAIL|rdkafka#consumer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.18.0.33:9092 failed: Connection refused (after 0ms in state CONNECT, 30 identical error(s) suppressed)
ingest-profiles-1 | return f(get_current_context(), *args, **kwargs)
post-process-forwarder-transactions-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^
snuba-subscription-consumer-metrics-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
post-process-forwarder-issue-platform-1 | 12:38:03 [INFO] sentry.post_process_forwarder.post_process_forwarder: Starting multithreaded post process forwarder
snuba-subscription-consumer-transactions-1 | File "/usr/local/lib/python3.11/site-packages/clickhouse_driver/connection.py", line 395, in connect
snuba-transactions-consumer-1 | {"timestamp":"2024-03-20T13:12:18.811646Z","level":"INFO","fields":{"message":"Inserted 1 rows"},"target":"rust_snuba::strategies::clickhouse"}
snuba-profiling-profiles-consumer-1 | 2024-04-26 13:59:58,506 Snuba initialization took 19.744962955999995s
snuba-issue-occurrence-consumer-1 | {"timestamp":"2024-04-24T21:28:46.810069Z","level":"INFO","fields":{"message":"Inserted 1 rows"},"target":"rust_snuba::strategies::clickhouse"}
snuba-metrics-consumer-1 | {"timestamp":"2024-04-10T10:05:25.922281Z","level":"INFO","fields":{"message":"Inserted 2 rows"},"target":"rust_snuba::strategies::clickhouse"}
kafka-1 | [2024-10-10 14:10:33,239] INFO Initialized snapshots with IDs SortedSet(OffsetAndEpoch(offset=15524, epoch=6), OffsetAndEpoch(offset=22723, epoch=6), OffsetAndEpoch(offset=29922, epoch=6), OffsetAndEpoch(offset=37121, epoch=6), OffsetAndEpoch(offset=44320, epoch=6), OffsetAndEpoch(offset=51519, epoch=6), OffsetAndEpoch(offset=58718, epoch=6), OffsetAndEpoch(offset=65917, epoch=6), OffsetAndEpoch(offset=73116, epoch=6), OffsetAndEpoch(offset=80315, epoch=6), OffsetAndEpoch(offset=87513, epoch=6), OffsetAndEpoch(offset=94712, epoch=6), OffsetAndEpoch(offset=101910, epoch=6), OffsetAndEpoch(offset=109109, epoch=6), OffsetAndEpoch(offset=116308, epoch=6), OffsetAndEpoch(offset=123507, epoch=6), OffsetAndEpoch(offset=130706, epoch=6), OffsetAndEpoch(offset=137898, epoch=6), OffsetAndEpoch(offset=145084, epoch=6), OffsetAndEpoch(offset=149344, epoch=6), OffsetAndEpoch(offset=149489, epoch=6), OffsetAndEpoch(offset=152198, epoch=6), OffsetAndEpoch(offset=152918, epoch=6), OffsetAndEpoch(offset=153562, epoch=6)) from /var/lib/kafka/data/__cluster_metadata-0 (kafka.raft.KafkaMetadataLog$)
events-consumer-1 | 13:48:20 [INFO] arroyo.processing.processor: Partition revocation complete.
nginx-1 | 35.191.220.51 - - [10/Oct/2024:13:53:49 +0000] "GET /_health/ HTTP/1.1" 499 0 "-" "GoogleHC/1.0" "-"
snuba-subscription-consumer-events-1 | %3|1728568010.229|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Connect to ipv4#172.18.0.47:9092 failed: Connection refused (after 1ms in state CONNECT)
smtp-1 | 46 will write message using CHUNKING
redis-1 | 1:M 10 Oct 2024 14:13:31.715 * Background saving terminated with success
ingest-occurrences-1 | File "/usr/local/lib/python3.11/site-packages/rb/clients.py", line 91, in __init__
postgres-1 | 2024-10-10 14:18:52.110 UTC [23091] DETAIL: Key (project_id, environment_id)=(21, 3) already exists.
ingest-replay-recordings-1 | File "/usr/local/lib/python3.11/site-packages/rb/clients.py", line 254, in get_connection
metrics-consumer-1 | ^^^^^^^^^^^^^^^
ingest-monitors-1 | 13:31:00 [INFO] sentry: monitors.consumer.clock_tick (reference_datetime='2024-03-20 13:31:00+00:00')
cron-1 | return __callback(*args, **kwargs)
```
snuba-replays-consumer-1 | {"timestamp":"2024-10-08T18:54:58.511498Z","level":"INFO","fields":{"message":"Starting Rust consumer","consumer_config":"ConsumerConfig { storages: [StorageConfig { name: \"replays\", clickhouse_table_name: \"replays_local\", clickhouse_cluster: ClickhouseConfig { host: \"clickhouse\", port: 9000, http_port: 8123, user: \"default\", password: \"\", database: \"default\" }, message_processor: MessageProcessorConfig { python_class_name: \"ReplaysProcessor\", python_module: \"snuba.datasets.processors.replays_processor\" } }], raw_topic: TopicConfig { physical_topic_name: \"ingest-replay-events\", logical_topic_name: \"ingest-replay-events\", broker_config: {\"bootstrap.servers\": \"kafka:9092\", \"security.protocol\": \"plaintext\", \"queued.max.messages.kbytes\": \"10000\", \"queued.min.messages\": \"10000\"} }, commit_log_topic: None, replacements_topic: None, dlq_topic: Some(TopicConfig { physical_topic_name: \"snuba-dead-letter-replays\", logical_topic_name: \"snuba-dead-letter-replays\", broker_config: {\"security.protocol\": \"plaintext\", \"bootstrap.servers\": \"kafka:9092\"} }), accountant_topic: TopicConfig { physical_topic_name: \"shared-resources-usage\", logical_topic_name: \"shared-resources-usage\", broker_config: {\"security.protocol\": \"plaintext\", \"bootstrap.servers\": \"kafka:9092\"} }, max_batch_size: 50000, max_batch_time_ms: 750, env: EnvConfig { sentry_dsn: None, dogstatsd_host: None, dogstatsd_port: None, default_retention_days: 90, lower_retention_days: 30, valid_retention_days: {90, 30}, record_cogs: false, ddm_metrics_sample_rate: 0.01, project_stacktrace_blacklist: [] } }"},"target":"rust_snuba::consumer"}
snuba-replays-consumer-1 | {"timestamp":"2024-10-08T18:54:58.512300Z","level":"INFO","fields":{"message":"Starting consumer for \"replays\"","storage":"replays"},"target":"rust_snuba::consumer"}
snuba-replays-consumer-1 | %3|1728413698.534|FAIL|rdkafka#producer-1| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Failed to resolve 'kafka:9092': Name or service not known (after 15ms in state CONNECT)
snuba-replays-consumer-1 | {"timestamp":"2024-10-08T18:54:58.534271Z","level":"ERROR","fields":{"message":"librdkafka: Global error: Resolve (Local: Host resolution failure): kafka:9092/bootstrap: Failed to resolve 'kafka:9092': Name or service not known (after 15ms in state CONNECT)"},"target":"rdkafka::client"}
snuba-replays-consumer-1 | {"timestamp":"2024-10-08T18:54:58.534507Z","level":"WARN","fields":{"message":"Ignored event 'Error' on base producer poll"},"target":"rdkafka::producer::base_producer"}
snuba-replays-consumer-1 | {"timestamp":"2024-10-08T18:54:58.534531Z","level":"ERROR","fields":{"message":"librdkafka: Global error: AllBrokersDown (Local: All broker connections are down): 1/1 brokers are down"},"target":"rdkafka::client"}
snuba-replays-consumer-1 | {"timestamp":"2024-10-08T18:54:58.534543Z","level":"WARN","fields":{"message":"Ignored event 'Error' on base producer poll"},"target":"rdkafka::producer::base_producer"}
snuba-replays-consumer-1 | %3|1728413698.534|FAIL|rdkafka#consumer-2| [thrd:kafka:9092/bootstrap]: kafka:9092/bootstrap: Failed to resolve 'kafka:9092': Name or service not known (after 8ms in state CONNECT)
snuba-replays-consumer-1 | {"timestamp":"2024-10-08T18:54:58.534939Z","level":"ERROR","fields":{"message":"librdkafka: Global error: Resolve (Local: Host resolution failure): kafka:9092/bootstrap: Failed to resolve 'kafka:9092': Name or service not known (after 8ms in state CONNECT)","error":"Global error: Resolve (Local: Host resolution failure)"},"target":"rust_arroyo::backends::kafka"}
snuba-replays-consumer-1 | {"timestamp":"2024-10-08T18:54:58.535118Z","level":"ERROR","fields":{"message":"poll error","error":"Message consumption error: Resolve (Local: Host resolution failure)"},"target":"rust_arroyo::processing"}
snuba-replays-consumer-1 | {"timestamp":"2024-10-08T18:54:58.535420Z","level":"ERROR","fields":{"message":"librdkafka: Global error: AllBrokersDown (Local: All broker connections are down): 1/1 brokers are down","error":"Global error: AllBrokersDown (Local: All broker connections are down)"},"target":"rust_arroyo::backends::kafka"}
snuba-replays-consumer-1 | {"timestamp":"2024-10-08T18:54:58.635425Z","level":"ERROR","fields":{"error":"poll error"},"target":"rust_snuba::consumer"}
```

Just connection errors within the 100 Docker containers that you just throw in.
clickhouse-1 | 0. Poco::Net::SocketImpl::error(int, String const&) @ 0x0000000015b3dbf2 in /usr/bin/clickhouse
clickhouse-1 | 1. Poco::Net::SocketImpl::peerAddress() @ 0x0000000015b40376 in /usr/bin/clickhouse
clickhouse-1 | 2. DB::HTTPServerRequest::HTTPServerRequest(std::shared_ptr
kafka-1 | [2024-10-10 15:26:14,159] ERROR Encountered metadata publishing fault: Error deleting stray partitions during startup (org.apache.kafka.server.fault.LoggingFaultHandler)
kafka-1 | java.lang.RuntimeException: The log dir Log(dir=/var/lib/kafka/data/snuba-dead-letter-metrics-distributions-0, topic=snuba-dead-letter-metrics-distributions, partition=0, highWatermark=0, lastStableOffset=0, logStartOffset=0, logEndOffset=0) does not have a topic ID, which is not allowed when running in KRaft mode.
kafka-1 | at kafka.server.metadata.BrokerMetadataPublisher$.$anonfun$findStrayPartitions$2(BrokerMetadataPublisher.scala:76)
kafka-1 | at scala.Option.getOrElse(Option.scala:201)
kafka-1 | at kafka.server.metadata.BrokerMetadataPublisher$.$anonfun$findStrayPartitions$1(BrokerMetadataPublisher.scala:75)
kafka-1 | at scala.collection.StrictOptimizedIterableOps.flatMap(StrictOptimizedIterableOps.scala:118)
kafka-1 | at scala.collection.StrictOptimizedIterableOps.flatMap$(StrictOptimizedIterableOps.scala:105)
kafka-1 | at scala.collection.mutable.ArrayBuffer.flatMap(ArrayBuffer.scala:43)
kafka-1 | at kafka.server.metadata.BrokerMetadataPublisher$.findStrayPartitions(BrokerMetadataPublisher.scala:73)
kafka-1 | at kafka.server.metadata.BrokerMetadataPublisher.finishInitializingReplicaManager(BrokerMetadataPublisher.scala:353)
kafka-1 | at kafka.server.metadata.BrokerMetadataPublisher.onMetadataUpdate(BrokerMetadataPublisher.scala:246)
kafka-1 | at org.apache.kafka.image.loader.MetadataLoader.initializeNewPublishers(MetadataLoader.java:309)
kafka-1 | at org.apache.kafka.image.loader.MetadataLoader.lambda$scheduleInitializeNewPublishers$0(MetadataLoader.java:266)
kafka-1 | at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
kafka-1 | at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
kafka-1 | at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
kafka-1 | at java.base/java.lang.Thread.run(Thread.java:829)
The issue is your Rust consumers. Please take your time to fix AND TEST this!!!!
https://github.com/getsentry/self-hosted/issues/2974#issuecomment-2076560194
Could you please test with 24.11?
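Roughly, assuming a standard self-hosted checkout, the upgrade would look like the sketch below (the exact 24.11.x tag name is an assumption; use whatever the latest release tag is):

```bash
# Sketch of the usual upgrade flow for getsentry/self-hosted;
# substitute the actual 24.11.x release tag.
git fetch --tags
git checkout 24.11.0
./install.sh
docker compose up -d
```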
Self-Hosted Version
24.9
CPU Architecture
x86_64
Docker Version
25.0.2
Docker Compose Version
2.24.5
Steps to Reproduce
Run Sentry, for real??