getsentry / self-hosted

Sentry, feature-complete and packaged up for low-volume deployments and proofs-of-concept
https://develop.sentry.dev/self-hosted/

Alert not working #3219

Open xuweixi10 opened 2 months ago

xuweixi10 commented 2 months ago

Self-Hosted Version

24.7.0

CPU Architecture

x86_64

Docker Version

27.0.3

Docker Compose Version

2.28.1

Steps to Reproduce

  1. Select a transaction and create an alert.
  2. The alert status shows OK, but in fact an alert should have been triggered.


Expected Result

An alert is triggered.

Actual Result

Nothing happened.

Event ID

No response

hubertdeng123 commented 2 months ago

Are there any logs in your web container that may be showing errors occurring? Otherwise, I wonder if there might be some kafka offset lag going on here. Can you try triggering another alert to see if this goes through?
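A quick way to check consumer lag from inside the Kafka container is sketched below. The service name kafka is an assumption based on a stock self-hosted install; confirm it with docker compose config --services.

# List the consumer groups known to the broker.
docker compose exec kafka kafka-consumer-groups \
  --bootstrap-server localhost:9092 --list

# Describe every group and look at the LAG column for the subscription /
# alert related consumers.
docker compose exec kafka kafka-consumer-groups \
  --bootstrap-server localhost:9092 --describe --all-groups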

xuweixi11 commented 2 months ago

I get this warning:

Traceback (most recent call last):
  File "/usr/src/sentry/src/sentry/snuba/referrer.py", line 914, in validate_referrer
    raise Exception(error_message)
Exception: referrer api.metrics.series is not part of Referrer Enum
09:55:47 [WARNING] sentry.snuba.referrer: referrer api.metrics.series is not part of Referrer Enum
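Note that this message is logged at WARNING level, so it may be noise rather than the cause of the missing alerts. A quick way to gauge how often it fires (a sketch; the service name web matches the stock docker-compose.yml):

# Count occurrences of the referrer warning in the last 24h of web logs.
docker compose logs --since 24h web 2>&1 | grep -c "not part of Referrer Enum"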
xuweixi11 commented 2 months ago

I also get this error in Snuba:

2024-07-25 16:15:55,933 New partitions assigned: {Partition(topic=Topic(name='snuba-metrics-commit-log'), index=0): 37, Partition(topic=Topic(name='snuba-metrics-commit-log'), index=1): 13673, Partition(topic=Topic(name='snuba-metrics-commit-log'), index=2): 27867, Partition(topic=Topic(name='snuba-metrics-commit-log'), index=3): 14012, Partition(topic=Topic(name='snuba-metrics-commit-log'), index=4): 6, Partition(topic=Topic(name='snuba-metrics-commit-log'), index=5): 13909, Partition(topic=Topic(name='snuba-metrics-commit-log'), index=6): 0, Partition(topic=Topic(name='snuba-metrics-commit-log'), index=7): 13762, Partition(topic=Topic(name='snuba-metrics-commit-log'), index=8): 100997, Partition(topic=Topic(name='snuba-metrics-commit-log'), index=9): 13795, Partition(topic=Topic(name='snuba-metrics-commit-log'), index=10): 13791, Partition(topic=Topic(name='snuba-metrics-commit-log'), index=11): 13805}
2024-07-25 16:15:55,933 Initialized processing strategy: <snuba.subscriptions.scheduler_processing_strategy.TickBuffer object at 0x7f2702488e50>
2024-07-25 16:16:03,405 Caught exception, shutting down...
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/arroyo/processing/processor.py", line 322, in run
    self._run_once()
  File "/usr/local/lib/python3.11/site-packages/arroyo/processing/processor.py", line 410, in _run_once
    self.__processing_strategy.submit(message)
  File "/usr/src/snuba/snuba/subscriptions/scheduler_processing_strategy.py", line 252, in submit
    self.__next_step.submit(message)
  File "/usr/src/snuba/snuba/subscriptions/combined_scheduler_executor.py", line 275, in submit
    tasks.extend([task for task in entity_scheduler[tick.partition].find(tick)])
                                   ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
KeyError: 2
2024-07-25 16:16:03,409 Terminating <snuba.subscriptions.scheduler_processing_strategy.TickBuffer object at 0x7f2702488e50>...
2024-07-25 16:16:03,409 Closing <snuba.subscriptions.scheduler_consumer.CommitLogTickConsumer object at 0x7f2702248410>...
2024-07-25 16:16:03,410 Partitions to revoke: [Partition(topic=Topic(name='snuba-metrics-commit-log'), index=0), Partition(topic=Topic(name='snuba-metrics-commit-log'), index=1), Partition(topic=Topic(name='snuba-metrics-commit-log'), index=2), Partition(topic=Topic(name='snuba-metrics-commit-log'), index=3), Partition(topic=Topic(name='snuba-metrics-commit-log'), index=4), Partition(topic=Topic(name='snuba-metrics-commit-log'), index=5), Partition(topic=Topic(name='snuba-metrics-commit-log'), index=6), Partition(topic=Topic(name='snuba-metrics-commit-log'), index=7), Partition(topic=Topic(name='snuba-metrics-commit-log'), index=8), Partition(topic=Topic(name='snuba-metrics-commit-log'), index=9), Partition(topic=Topic(name='snuba-metrics-commit-log'), index=10), Partition(topic=Topic(name='snuba-metrics-commit-log'), index=11)]
2024-07-25 16:16:03,410 Partition revocation complete.
2024-07-25 16:16:03,413 Processor terminated
Traceback (most recent call last):
  File "/usr/local/bin/snuba", line 33, in <module>
    sys.exit(load_entry_point('snuba', 'console_scripts', 'snuba')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/snuba/snuba/cli/subscriptions_scheduler_executor.py", line 153, in subscriptions_scheduler_executor
    processor.run()
  File "/usr/local/lib/python3.11/site-packages/arroyo/processing/processor.py", line 322, in run
    self._run_once()
  File "/usr/local/lib/python3.11/site-packages/arroyo/processing/processor.py", line 410, in _run_once
    self.__processing_strategy.submit(message)
  File "/usr/src/snuba/snuba/subscriptions/scheduler_processing_strategy.py", line 252, in submit
    self.__next_step.submit(message)
  File "/usr/src/snuba/snuba/subscriptions/combined_scheduler_executor.py", line 275, in submit
    tasks.extend([task for task in entity_scheduler[tick.partition].find(tick)])
                                   ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
KeyError: 2
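The KeyError comes from entity_scheduler[tick.partition]: the executor received a commit-log tick for partition 2 but has no scheduler registered for that partition, and the process shuts down, so the container likely keeps restarting. One way to see which subscription services are affected (a sketch; service names vary between releases):

# List the subscription-related services, then check for restart loops.
docker compose config --services | grep -i subscription
docker compose ps
# The service name below is an assumption; use one reported by the command above.
docker compose logs --tail 50 snuba-subscription-consumer-metrics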
xuweixi11 commented 2 months ago

After I fixed the Snuba error, the alert still does not trigger.

xuweixi11 commented 2 months ago

After I fixed "Exception: referrer api.metrics.series is not part of Referrer Enum", the alert still does not trigger.
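Metric alerts also depend on Sentry-side consumers that read the *-subscription-results topics, so it may be worth confirming those containers are up and consuming. A sketch, assuming the service names used in recent self-hosted compose files:

# Verify the Sentry-side subscription result consumers are running.
# Names are assumptions; check `docker compose config --services | grep subscription`.
docker compose ps subscription-consumer-metrics subscription-consumer-generic-metrics
docker compose logs --tail 50 subscription-consumer-metrics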

IanWoodard commented 2 months ago

Could you provide some more context? What do your logs look like for Snuba?

xuweixi11 commented 1 month ago

The Snuba logs look OK:

2024-07-30 10:02:59,526 Setting result and waking blocked clients...
10.139.186.237 - - [30/Jul/2024:10:02:59 +0000] "POST /search_issues/snql HTTP/1.1" 200 1800 "tsdb-modelid:20" "python-urllib3/2.2.2"
2024-07-30 10:02:59,542 Executing task ('e6a5db3a4e5a11efaf350a9ba7313b72') with 30 second timeout...
2024-07-30 10:02:59,544 Query: SELECT (group_id AS _snuba_group_id), (toUnixTimestamp(toStartOfHour((client_timestamp AS _snuba_timestamp))) AS _snuba_time), (count() AS _snuba_aggregate), _snuba_time FROM search_issues_local_v2 WHERE has(_tags_hash_map, cityHash64('monitor.slug=monitors-detect-broken-monitor-envs')) AND equals((environment AS _snuba_environment), 'production') AND in(_snuba_group_id, [33]) AND in((project_id AS _snuba_project_id), [1]) AND greaterOrEquals(_snuba_timestamp, toDateTime('2024-07-29T11:00:00', 'Universal')) AND less(_snuba_timestamp, toDateTime('2024-07-30T11:00:00', 'Universal')) GROUP BY _snuba_group_id, _snuba_time ORDER BY _snuba_time DESC, _snuba_group_id ASC LIMIT 24 OFFSET 0
2024-07-30 10:02:59,544 Block "" send time: 0.000013
2024-07-30 10:02:59,552 Setting result and waking blocked clients...
10.139.186.237 - - [30/Jul/2024:10:02:59 +0000] "POST /search_issues/snql HTTP/1.1" 200 1800 "tsdb-modelid:20" "python-urllib3/2.2.2"
2024-07-30 10:04:01,032 Executing task ('0b4c67ba4e5b11efaf350a9ba7313b72') with 30 second timeout...
2024-07-30 10:04:01,033 Query: SELECT (group_id AS _snuba_group_id), (toDateTime(multiply(intDiv(toUInt32(timestamp), 10), 10), 'Universal') AS _snuba_time), (count() AS _snuba_aggregate) FROM errors_local PREWHERE in(_snuba_group_id, [7996]) WHERE equals(deleted, 0) AND in((project_id AS _snuba_project_id), [2]) AND greaterOrEquals((timestamp AS _snuba_timestamp), toDateTime('2024-07-30T10:03:00', 'Universal')) AND less(_snuba_timestamp, toDateTime('2024-07-30T10:03:10', 'Universal')) GROUP BY _snuba_group_id, _snuba_time ORDER BY _snuba_time DESC, _snuba_group_id ASC LIMIT 1 OFFSET 0
2024-07-30 10:04:01,034 Block "" send time: 0.000016
2024-07-30 10:04:01,047 Setting result and waking blocked clients...
10.139.175.190 - - [30/Jul/2024:10:04:01 +0000] "POST /events/snql HTTP/1.1" 200 2390 "tsdb-modelid:4" "python-urllib3/2.2.2"
2024-07-30 10:04:01,067 Executing task ('0b51ad7e4e5b11efaf350a9ba7313b72') with 30 second timeout...
2024-07-30 10:04:01,068 Query: SELECT (group_id AS _snuba_group_id), (ifNull(uniq((nullIf(user, '') AS `_snuba_tags[sentry:user]`)), 0) AS _snuba_aggregate) FROM errors_local PREWHERE in(_snuba_group_id, [7996]) WHERE equals(deleted, 0) AND in((project_id AS _snuba_project_id), [2]) AND greaterOrEquals((timestamp AS _snuba_timestamp), toDateTime('2024-07-30T10:03:00', 'Universal')) AND less(_snuba_timestamp, toDateTime('2024-07-30T10:03:10', 'Universal')) GROUP BY _snuba_group_id ORDER BY _snuba_group_id ASC LIMIT 1 OFFSET 0
2024-07-30 10:04:01,068 Block "" send time: 0.000015
2024-07-30 10:04:01,076 Setting result and waking blocked clients...
10.139.175.190 - - [30/Jul/2024:10:04:01 +0000] "POST /events/snql HTTP/1.1" 200 2302 "tsdb-modelid:300" "python-urllib3/2.2.2"
2024-07-30 10:04:01,429 Executing task ('0b88eeb04e5b11efaf350a9ba7313b72') with 30 second timeout...
2024-07-30 10:04:01,430 Query: SELECT (replaceAll(toString(event_id), '-', '') AS _snuba_event_id), (group_id AS _snuba_group_id), (project_id AS _snuba_project_id), (timestamp AS _snuba_timestamp) FROM errors_local PREWHERE in(_snuba_group_id, tuple(7996)) WHERE equals(deleted, 0) AND greaterOrEquals(_snuba_timestamp, toDateTime('2024-07-30T09:58:00', 'Universal')) AND less(_snuba_timestamp, toDateTime('2024-07-30T10:04:02', 'Universal')) AND in(_snuba_project_id, tuple(2)) AND in(_snuba_project_id, tuple(2)) ORDER BY _snuba_timestamp DESC, _snuba_event_id DESC LIMIT 1 OFFSET 0
2024-07-30 10:04:01,430 Block "" send time: 0.000017
2024-07-30 10:04:01,440 Setting result and waking blocked clients...
xuweixi10 commented 1 month ago

> Are there any logs in your web container that may be showing errors occurring? Otherwise, I wonder if there might be some kafka offset lag going on here. Can you try triggering another alert to see if this goes through?

I think so, but there are too many topics and consumers. I checked some that may be related to metrics: the topic metrics-subscription-results has no producer and no consumer, and the topic generic-metrics-subscription-results has no producer and no consumer traffic, but it shows a consumer count?
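One way to confirm whether anything is ever written to those results topics is to tail them directly while an alert should be firing. A sketch, assuming the Kafka service is named kafka:

# Wait up to 60s for messages; exiting with no output suggests the Snuba
# scheduler/executor side is not producing results.
docker compose exec kafka kafka-console-consumer \
  --bootstrap-server localhost:9092 \
  --topic metrics-subscription-results \
  --from-beginning --timeout-ms 60000 --max-messages 5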

I also find that the snuba-generic-metrics topic has a very large number of messages. Is this normal?

xuweixi10 commented 1 month ago

After checking the alert code in Sentry, maybe the problem is that the topic metrics-subscription-results never gets any messages. It is produced from Snuba by:

snuba rust-consumer --storage metrics_raw --consumer-group snuba-metrics-consumers --auto-offset-reset=latest --max-batch-time-ms 750 --no-strict-offset-reset

Maybe I need to start more Snuba processes?
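As far as I can tell, the rust-consumer above only writes raw metrics into ClickHouse; the *-subscription-results topics are produced by Snuba's subscriptions scheduler/executor processes. A sketch of how to check that one is configured for the metrics dataset:

# Show which subscriptions-scheduler-executor commands the resolved compose
# configuration actually runs.
docker compose config | grep -i -B 2 -A 4 "subscriptions-scheduler"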

xuweixi10 commented 1 month ago

After checking the code, I find that in scheduler_consumer.py the __synchronization_timestamp uses 'received_p99', but the messages in the topic snuba-transactions-commit-log only contain an orig_message_ts value, for example {"offset":30338,"orig_message_ts":1722783407.593,"received_p99":null}, so the synchronization timestamp is always invalid. After I changed the config, it works.

So I'm confused why received_p99 is null.
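To double-check which timestamp fields the commit log actually carries, the topic can be inspected directly. A sketch, assuming the Kafka service is named kafka:

# Peek at a few commit-log messages and check whether received_p99 is null
# while orig_message_ts is set.
docker compose exec kafka kafka-console-consumer \
  --bootstrap-server localhost:9092 \
  --topic snuba-transactions-commit-log \
  --from-beginning --max-messages 3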

hubertdeng123 commented 1 month ago

@getsentry/owners-snuba Do you have more context on why this is happening?

nomarek commented 4 weeks ago

I ran into the same problem when defining alert rules. There are no errors in the snuba container logs and Kafka events are being processed correctly. Do you have any idea what could be causing the problem and how to fix it? Interestingly, the alert rules defined on "Number of Errors" seem to work fine for me, but I am not able to use "Transaction duration" and "Apdex" alerts.

Kitsunees commented 3 weeks ago

@nomarek The same happened to us. Did you try replacing all occurrences of rust-consumer in the docker-compose.yml file with just consumer, and then re-running the ./install.sh script?

You can use this as a temporary fix to get alerts. We found this on another GitHub issue and it worked.
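For anyone else applying this, a minimal sketch of the workaround (back up the compose file first; exact steps may differ by release):

# Swap the Rust consumers for the Python ones, then re-run the installer
# and bring the stack back up.
cp docker-compose.yml docker-compose.yml.bak
sed -i 's/rust-consumer/consumer/g' docker-compose.yml
./install.sh
docker compose up -d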

nomarek commented 3 weeks ago

Thank you! This workaround worked.