Closed: avongluck-r1soft closed this issue 4 years ago.
Testing:
While it isn't Graylog's fault, since this is a 3rd-party plugin, the ability of a 3rd-party plugin to potentially break all alerting is a serious issue: "backup" notification methods will silently fail.
@avongluck-r1soft Are there any alerting or OpsGenie related exceptions in the logs of your Graylog node(s)?
So I tried to narrow the issue down, but the behavior seems erratic. When does an alert "resolve"? I noticed that editing the conditions and unchecking the "Repeat notifications (optional)" box seems to "resolve" the alert. However, the alert then refuses to trigger any additional notifications after this point.
(I can re-check repeat notifications and still can't trigger any notifications 30 minutes after the last successful one.)
If I restart graylog-server (leaving repeat notifications checked), I get a flood of them after startup; then an alert is open and doesn't ever seem to resolve.
Here is the alert log:
2017-03-31 17:31:29.108
Graylog sent Full filesystem (Slack alarm callback) notification
2017-03-31 17:31:41.049
Graylog sent Email On-call about full filesystems (Email Alert Callback) notification
2017-03-31 17:44:48.391
Condition is configured to repeat notifications, Graylog will send notifications when evaluating the condition until it is no longer satisfied
2017-03-31 17:44:48.392
Condition is still satisfied, alert is unresolved
So 12 minutes later it is still considered open, even though the threshold window is 5 minutes.
Full filesystem (Message Count Alert Condition)
Alerting on stream Full filesystems
Configuration: Alert is triggered when there are more than 0 messages in the last 5 minutes. Grace period: 0 minutes. Including last 5 messages in alert notification. Configured to repeat notifications.
20 minutes later, still unresolved. So:
Maybe the OpsGenie notification is at fault somehow and "holds" the alerts open?
2017-03-31 17:31:41.049
Graylog sent Email On-call about full filesystems (Email Alert Callback) notification
2017-03-31 17:57:33.698
Condition is configured to repeat notifications, Graylog will send notifications when evaluating the condition until it is no longer satisfied
2017-03-31 17:57:33.698
Condition is still satisfied, alert is unresolved
Attaching a server.log which doesn't tell us much.
OK. Here is some solidly documented oddness:
2017-03-31 17:31:41.049
Graylog sent Email On-call about full filesystems (Email Alert Callback) notification
2017-03-31 18:12:01.520
Condition is configured to repeat notifications, Graylog will send notifications when evaluating the condition until it is no longer satisfied
2017-03-31 18:12:01.520
Condition is still satisfied, alert is unresolved
Notice that as of 18:12 there have been "no notifications since 17:31". However, "messages evaluated" shows:
2017-03-31 17:32:37.081 XXX -- WARNING XXX.net (XXX) DiskUsage high = 98 mount_point = /JUST_A_TEST
2017-03-31 17:34:48.928 XXX -- WARNING XXX.net (XXX) DiskUsage high = 98 mount_point = /JUST_A_TEST
2017-03-31 17:42:12.168 XXX -- WARNING XXX.net (XXX) DiskUsage high = 100 mount_point = /storage06
2017-03-31 18:02:39.432 XXX -- WARNING XXX.net (XXX) DiskUsage high = 100 mount_point = /storage06
2017-03-31 18:11:07.225 XXX -- WARNING XXX.net (XXX) DiskUsage high = 98 mount_point = /JUST_A_TEST
So, to be clear: all of the testing above was with the OpsGenie plugin installed.
I uninstalled the OpsGenie plugin again and removed the notification; now events always seem to fire for open alerts, as expected. No errors seen in server.log up to now.
I painfully realized over the weekend at 3am that "repeat notifications" means a notification every minute while an alert is open, even if only one condition was matched. With that understanding, the behavior described above is still out of line with intended behaviour.
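For anyone else puzzled by this, here is a rough model of the cadence. This is an illustrative sketch only, not Graylog source code; the only things taken from this thread are the AlertScannerThread periodical and its 60-second poll interval, visible in the startup log below.

```java
// Sketch of the repeat-notification cadence (all names here are
// illustrative; only the 60s AlertScannerThread poll interval is
// taken from the startup log in this thread).
class RepeatNotificationSketch {
    // pollsWhileSatisfied = number of 60s scanner polls during which
    // the condition stayed satisfied (i.e. minutes the alert was open).
    static int notificationsSent(boolean repeatEnabled, int pollsWhileSatisfied) {
        if (pollsWhileSatisfied <= 0) {
            return 0; // condition never satisfied: nothing is sent
        }
        // Without repeat: a single notification when the alert opens.
        // With repeat: one notification per poll while it stays open.
        return repeatEnabled ? pollsWhileSatisfied : 1;
    }
}
```

So an alert that stays open for half an hour with repeat enabled means roughly 30 notifications, which matches the 3am flood.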
Hi @avongluck-r1soft,
Sorry to read that 😞. We have an issue with grace period and repeat notifications (https://github.com/Graylog2/graylog2-server/issues/3579), and it will be fixed in an upcoming patch release.
Ok. I think this issue is related to odd alert behavior. The alerts never seem to resolve. Screenshots taken at 9:48am the same day:
Alert begin 1 hour 12 minutes ago:
Alert unresolved still after 1 hour 12 minutes:
Only one message evaluated 1 hour 12 minutes ago.
So at this point in time:
Anyone have any idea what's going on here? When should alerts consider themselves closed? I re-enabled the OpsGenie plugin and set it up after upgrading to 2.2.3. The OpsGenie guys said they couldn't reproduce the issue, however.
The first screenshot shows: "Stream had 1 messages in the last 5 minutes" (which shouldn't be true any longer). Maybe "more than 0 messages" is evaluated incorrectly? Normally "more than 0" means 1 or more... maybe the logic is >= 0 vs. > 0?
Nah. https://github.com/Graylog2/graylog2-server/blob/master/graylog2-server/src/main/java/org/graylog2/alerts/types/MessageCountAlertCondition.java#L184 The logic seems fine there (MORE is >).
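For reference, the check at the linked line boils down to the following. This is a simplified sketch: the ThresholdType enum and its MORE/LESS values follow the linked source, but the surrounding class and method are illustrative, not the actual Graylog code.

```java
// Simplified sketch of the threshold check in MessageCountAlertCondition.
// ThresholdType and its values mirror the linked source; the rest is
// illustrative.
enum ThresholdType { MORE, LESS }

class ThresholdCheck {
    // "more than 0 messages" triggers only when count > 0, i.e. count >= 1,
    // because MORE is a strict greater-than comparison.
    static boolean isTriggered(ThresholdType type, long count, long threshold) {
        switch (type) {
            case MORE:
                return count > threshold; // strictly greater, not >=
            case LESS:
                return count < threshold;
            default:
                return false;
        }
    }
}
```

So with a threshold of 0, a count of 0 does not trigger, which rules out the >= 0 theory.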
@avongluck-r1soft: Is this still an issue with the most recent Graylog version (2.2.3)?
Yeah. Here's an awesome screenshot showing some weird alert state:
Given the configuration, I'd expect the alert to close after 2 minutes. However, it's been open for 15 minutes with no signs of closing.
Ah, OK. I got some great screenshots showing the problem:
Without the OpsGenie plugin installed: working as expected; the alert closes after 2 minutes.
With the OpsGenie plugin installed and configured: not working; the alert gets stuck open and no OpsGenie notification arrives (other callbacks are listed as triggered, but never seem to fire?).
@avongluck-r1soft Anything interesting in the logs of your Graylog nodes at that time?
Nothing at all interesting in the logs (quiet). I'm not crazy, right? That second screenshot shows an alert in a state it shouldn't be in.
2017-05-16T10:33:12.031-05:00 INFO [CmdLineTool] Loaded plugin: OpsGenieAlarmCallback 1.0.0 [com.opsgenie.plugin.graylog.OpsGenieAlarmCallbackPlugin]
2017-05-16T10:33:12.033-05:00 INFO [CmdLineTool] Loaded plugin: Elastic Beats Input 2.2.3 [org.graylog.plugins.beats.BeatsInputPlugin]
2017-05-16T10:33:12.033-05:00 INFO [CmdLineTool] Loaded plugin: Collector 2.2.3 [org.graylog.plugins.collector.CollectorPlugin]
2017-05-16T10:33:12.034-05:00 INFO [CmdLineTool] Loaded plugin: Enterprise Integration Plugin 2.2.3 [org.graylog.plugins.enterprise_integration.EnterpriseIntegrationPlugin]
2017-05-16T10:33:12.034-05:00 INFO [CmdLineTool] Loaded plugin: MapWidgetPlugin 2.2.3 [org.graylog.plugins.map.MapWidgetPlugin]
2017-05-16T10:33:12.040-05:00 INFO [CmdLineTool] Loaded plugin: Pipeline Processor Plugin 2.2.3 [org.graylog.plugins.pipelineprocessor.ProcessorPlugin]
2017-05-16T10:33:12.040-05:00 INFO [CmdLineTool] Loaded plugin: Anonymous Usage Statistics 2.2.3 [org.graylog.plugins.usagestatistics.UsageStatsPlugin]
2017-05-16T10:33:12.041-05:00 INFO [CmdLineTool] Loaded plugin: Slack 2.4.0 [org.graylog2.plugins.slack.callback.SlackAlarmCallback]
2017-05-16T10:33:12.194-05:00 INFO [CmdLineTool] Running with JVM arguments: -Xms10g -Xmx10g -XX:NewRatio=1 -XX:PermSize=128m -XX:MaxPermSize=256m -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow -Dlog4j.configurationFile=file:///etc/graylog/server/log4j2.xml -Djava.library.path=/usr/share/graylog-server/lib/sigar -Dgraylog2.installation_source=rpm
2017-05-16T10:33:12.324-05:00 INFO [Version] HV000001: Hibernate Validator null
2017-05-16T10:33:13.789-05:00 INFO [InputBufferImpl] Message journal is enabled.
2017-05-16T10:33:13.804-05:00 INFO [NodeId] Node ID: XXX-c572-4717-b7e8-f7cada1266c6
2017-05-16T10:33:13.933-05:00 INFO [LogManager] Loading logs.
2017-05-16T10:33:13.969-05:00 INFO [LogManager] Logs loading complete.
2017-05-16T10:33:13.969-05:00 INFO [KafkaJournal] Initialized Kafka based journal at /var/lib/graylog-server/journal
2017-05-16T10:33:14.083-05:00 INFO [InputBufferImpl] Initialized InputBufferImpl with ring size <65536> and wait strategy <BlockingWaitStrategy>, running 1 parallel message handlers.
2017-05-16T10:33:14.102-05:00 INFO [cluster] Cluster created with settings {hosts=[10.XXX.XXX.XXX:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
2017-05-16T10:33:14.143-05:00 INFO [cluster] No server chosen by ReadPreferenceServerSelector{readPreference=primary} from cluster description ClusterDescription{type=UNKNOWN, connectionMode=SINGLE, serverDescriptions=[ServerDescription{address=10.XXX.XXX.XXX:27017, type=UNKNOWN, state=CONNECTING}]}. Waiting for 30000 ms before timing out
2017-05-16T10:33:14.162-05:00 INFO [connection] Opened connection [connectionId{localValue:1, serverValue:157}] to 10.XXX.XXX.XXX:27017
2017-05-16T10:33:14.164-05:00 INFO [cluster] Monitor thread successfully connected to server with description ServerDescription{address=10.XXX.XXX.XXX:27017, type=STANDALONE, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 2, 12]}, minWireVersion=0, maxWireVersion=4, maxDocumentSize=16777216, roundTripTimeNanos=435048}
2017-05-16T10:33:14.173-05:00 INFO [connection] Opened connection [connectionId{localValue:2, serverValue:158}] to 10.XXX.XXX.XXX:27017
2017-05-16T10:33:14.399-05:00 INFO [node] [graylog-d2636a89-c572-4717-b7e8-f7cada1266c6] version[2.4.4], pid[12887], build[fcbb46d/2017-01-03T11:33:16Z]
2017-05-16T10:33:14.399-05:00 INFO [node] [graylog-d2636a89-c572-4717-b7e8-f7cada1266c6] initializing ...
2017-05-16T10:33:14.404-05:00 INFO [plugins] [graylog-d2636a89-c572-4717-b7e8-f7cada1266c6] modules [], plugins [graylog-monitor], sites []
2017-05-16T10:33:15.359-05:00 INFO [node] [graylog-d2636a89-c572-4717-b7e8-f7cada1266c6] initialized
2017-05-16T10:33:15.399-05:00 INFO [ProcessBuffer] Initialized ProcessBuffer with ring size <65536> and wait strategy <BlockingWaitStrategy>.
2017-05-16T10:33:16.497-05:00 INFO [RulesEngineProvider] No static rules file loaded.
2017-05-16T10:33:16.805-05:00 INFO [OutputBuffer] Initialized OutputBuffer with ring size <65536> and wait strategy <BlockingWaitStrategy>.
2017-05-16T10:33:17.544-05:00 INFO [ServerBootstrap] Graylog server 2.2.3+7adc951 starting up
2017-05-16T10:33:17.545-05:00 INFO [ServerBootstrap] JRE: Oracle Corporation 1.8.0_121 on Linux 3.10.0-327.36.3.el7.x86_64
2017-05-16T10:33:17.545-05:00 INFO [ServerBootstrap] Deployment: rpm
2017-05-16T10:33:17.545-05:00 INFO [ServerBootstrap] OS: CentOS Linux 7 (Core) (centos)
2017-05-16T10:33:17.545-05:00 INFO [ServerBootstrap] Arch: amd64
2017-05-16T10:33:17.556-05:00 WARN [DeadEventLoggingListener] Received unhandled event of type <org.graylog2.plugin.lifecycles.Lifecycle> from event bus <AsyncEventBus{graylog-eventbus}>
2017-05-16T10:33:17.567-05:00 INFO [PeriodicalsService] Starting 26 periodicals ...
2017-05-16T10:33:17.568-05:00 INFO [Periodicals] Starting [org.graylog2.periodical.ThroughputCalculator] periodical in [0s], polling every [1s].
2017-05-16T10:33:17.569-05:00 INFO [node] [graylog-d2636a89-c572-4717-b7e8-f7cada1266c6] starting ...
2017-05-16T10:33:17.572-05:00 INFO [Periodicals] Starting [org.graylog2.periodical.AlertScannerThread] periodical in [10s], polling every [60s].
2017-05-16T10:33:17.575-05:00 INFO [Periodicals] Starting [org.graylog2.periodical.BatchedElasticSearchOutputFlushThread] periodical in [0s], polling every [1s].
2017-05-16T10:33:17.576-05:00 INFO [Periodicals] Starting [org.graylog2.periodical.ClusterHealthCheckThread] periodical in [120s], polling every [20s].
2017-05-16T10:33:17.576-05:00 INFO [Periodicals] Starting [org.graylog2.periodical.ContentPackLoaderPeriodical] periodical, running forever.
2017-05-16T10:33:17.576-05:00 INFO [Periodicals] Starting [org.graylog2.periodical.GarbageCollectionWarningThread] periodical, running forever.
2017-05-16T10:33:17.576-05:00 INFO [Periodicals] Starting [org.graylog2.periodical.IndexerClusterCheckerThread] periodical in [0s], polling every [30s].
2017-05-16T10:33:17.577-05:00 INFO [Periodicals] Starting [org.graylog2.periodical.IndexRetentionThread] periodical in [0s], polling every [300s].
2017-05-16T10:33:17.577-05:00 INFO [Periodicals] Starting [org.graylog2.periodical.IndexRotationThread] periodical in [0s], polling every [10s].
2017-05-16T10:33:17.577-05:00 INFO [Periodicals] Starting [org.graylog2.periodical.NodePingThread] periodical in [0s], polling every [1s].
2017-05-16T10:33:17.577-05:00 INFO [IndexRetentionThread] Elasticsearch cluster not available, skipping index retention checks.
2017-05-16T10:33:17.586-05:00 INFO [Periodicals] Starting [org.graylog2.periodical.VersionCheckThread] periodical in [300s], polling every [1800s].
2017-05-16T10:33:17.586-05:00 INFO [Periodicals] Starting [org.graylog2.periodical.ThrottleStateUpdaterThread] periodical in [1s], polling every [1s].
2017-05-16T10:33:17.587-05:00 INFO [Periodicals] Starting [org.graylog2.events.ClusterEventPeriodical] periodical in [0s], polling every [1s].
2017-05-16T10:33:17.587-05:00 INFO [Periodicals] Starting [org.graylog2.events.ClusterEventCleanupPeriodical] periodical in [0s], polling every [86400s].
2017-05-16T10:33:17.588-05:00 INFO [connection] Opened connection [connectionId{localValue:4, serverValue:160}] to 10.XXX.XXX.XXX:27017
2017-05-16T10:33:17.590-05:00 INFO [connection] Opened connection [connectionId{localValue:3, serverValue:159}] to 10.XXX.XXX.XXX:27017
2017-05-16T10:33:17.590-05:00 INFO [Periodicals] Starting [org.graylog2.periodical.ClusterIdGeneratorPeriodical] periodical, running forever.
2017-05-16T10:33:17.591-05:00 INFO [Periodicals] Starting [org.graylog2.periodical.IndexRangesMigrationPeriodical] periodical, running forever.
2017-05-16T10:33:17.592-05:00 INFO [Periodicals] Starting [org.graylog2.periodical.IndexRangesCleanupPeriodical] periodical in [15s], polling every [3600s].
2017-05-16T10:33:17.595-05:00 INFO [connection] Opened connection [connectionId{localValue:6, serverValue:162}] to 10.XXX.XXX.XXX:27017
2017-05-16T10:33:17.595-05:00 INFO [connection] Opened connection [connectionId{localValue:7, serverValue:163}] to 10.XXX.XXX.XXX:27017
2017-05-16T10:33:17.599-05:00 INFO [connection] Opened connection [connectionId{localValue:5, serverValue:161}] to 10.XXX.XXX.XXX:27017
2017-05-16T10:33:17.644-05:00 INFO [PeriodicalsService] Not starting [org.graylog2.periodical.UserPermissionMigrationPeriodical] periodical. Not configured to run on this node.
2017-05-16T10:33:17.644-05:00 INFO [Periodicals] Starting [org.graylog2.periodical.AlarmCallbacksMigrationPeriodical] periodical, running forever.
2017-05-16T10:33:17.644-05:00 INFO [Periodicals] Starting [org.graylog2.periodical.ConfigurationManagementPeriodical] periodical, running forever.
2017-05-16T10:33:17.646-05:00 INFO [Periodicals] Starting [org.graylog2.periodical.LdapGroupMappingMigration] periodical, running forever.
2017-05-16T10:33:17.647-05:00 INFO [Periodicals] Starting [org.graylog2.periodical.IndexFailuresPeriodical] periodical, running forever.
2017-05-16T10:33:17.647-05:00 INFO [Periodicals] Starting [org.graylog.plugins.usagestatistics.UsageStatsNodePeriodical] periodical in [300s], polling every [21600s].
2017-05-16T10:33:17.648-05:00 INFO [Periodicals] Starting [org.graylog.plugins.usagestatistics.UsageStatsClusterPeriodical] periodical in [300s], polling every [21600s].
2017-05-16T10:33:17.650-05:00 INFO [PeriodicalsService] Not starting [org.graylog.plugins.pipelineprocessor.periodical.LegacyDefaultStreamMigration] periodical. Not configured to run on this node.
2017-05-16T10:33:17.650-05:00 INFO [Periodicals] Starting [org.graylog.plugins.collector.periodical.PurgeExpiredCollectorsThread] periodical in [0s], polling every [3600s].
2017-05-16T10:33:17.718-05:00 INFO [IndexerClusterCheckerThread] Indexer not fully initialized yet. Skipping periodic cluster check.
2017-05-16T10:33:17.890-05:00 INFO [V20161130141500_DefaultStreamRecalcIndexRanges] Cluster not connected yet, delaying migration until it is reachable.
2017-05-16T10:33:17.911-05:00 INFO [transport] [graylog-d2636a89-c572-4717-b7e8-f7cada1266c6] publish_address {127.0.0.1:9350}, bound_addresses {[::1]:9350}, {127.0.0.1:9350}
2017-05-16T10:33:17.916-05:00 INFO [discovery] [graylog-d2636a89-c572-4717-b7e8-f7cada1266c6] graylog-prod/HPPt87SoSmewgUKqwfs6UQ
2017-05-16T10:33:18.054-05:00 INFO [JerseyService] Enabling CORS for HTTP endpoint
2017-05-16T10:33:20.919-05:00 WARN [discovery] [graylog-d2636a89-c572-4717-b7e8-f7cada1266c6] waited for 3s and no initial state was set by the discovery
2017-05-16T10:33:20.919-05:00 INFO [node] [graylog-d2636a89-c572-4717-b7e8-f7cada1266c6] started
2017-05-16T10:33:21.059-05:00 INFO [service] [graylog-d2636a89-c572-4717-b7e8-f7cada1266c6] detected_master {Neophyte}{p-qRoaYrS7Gb0yfRMgky9g}{10.XXX.XXX.XXX}{10.XXX.XXX.XXX:9300}, added {{Neophyte}{p-qRoaYrS7Gb0yfRMgky9g}{10.XXX.XXX.XXX}{10.XXX.XXX.XXX:9300},{graylog-bca2dfe8-77c8-4d01-8495-d08f7bdea82f}{TgllDbp2TOOcMijSrWLE1g}{10.XXX.XX.XX}{10.170.70.17:9350}{client=true, data=false, master=false},}, reason: zen-disco-receive(from master [{Neophyte}{p-qRoaYrS7Gb0yfRMgky9g}{10.XXX.XXX.XXX}{10.XXX.XXX.XXX:9300}])
2017-05-16T10:33:25.239-05:00 INFO [NetworkListener] Started listener bound to [10.XXX.XXX.XXX:12900]
2017-05-16T10:33:25.241-05:00 INFO [HttpServer] [HttpServer] Started.
2017-05-16T10:33:25.241-05:00 INFO [JerseyService] Started REST API at <http://10.XXX.XXX.XXX:12900/>
2017-05-16T10:33:27.199-05:00 INFO [NetworkListener] Started listener bound to [127.0.0.1:9000]
2017-05-16T10:33:27.199-05:00 INFO [HttpServer] [HttpServer-1] Started.
2017-05-16T10:33:27.200-05:00 INFO [JerseyService] Started Web Interface at <http://127.0.0.1:9000/>
2017-05-16T10:33:27.201-05:00 INFO [ServiceManagerListener] Services are healthy
2017-05-16T10:33:27.202-05:00 INFO [ServerBootstrap] Services started, startup times in ms: {KafkaJournal [RUNNING]=9, InputSetupService [RUNNING]=11, JournalReader [RUNNING]=19, OutputSetupService [RUNNING]=25, BufferSynchronizerService [RUNNING]=25, ConfigurationEtagService [RUNNING]=47, PeriodicalsService [RUNNING]=99, StreamCacheService [RUNNING]=258, IndexerSetupService [RUNNING]=3514, JerseyService [RUNNING]=9634}
2017-05-16T10:33:27.204-05:00 INFO [ServerBootstrap] Graylog server up and running.
2017-05-16T10:33:27.209-05:00 INFO [InputSetupService] Triggering launching persisted inputs, node transitioned from Uninitialized [LB:DEAD] to Running [LB:ALIVE]
2017-05-16T10:33:27.226-05:00 INFO [InputStateListener] Input [Syslog UDP/58f4bfb00ed1c97c79d8f863] is now STARTING
2017-05-16T10:33:27.264-05:00 WARN [NettyTransport] receiveBufferSize (SO_RCVBUF) for input SyslogUDPInput{title=Syslog UDP Input, type=org.graylog2.inputs.syslog.udp.SyslogUDPInput, nodeId=null} should be 262144 but is 212992.
2017-05-16T10:33:27.270-05:00 INFO [InputStateListener] Input [Syslog UDP/58f4bfb00ed1c97c79d8f863] is now RUNNING
I've finally found an interesting correlation with a repeatable passing scenario.
This really makes no sense to me unless binding Graylog on 127.0.0.1:9000 was somehow messing up outbound networking within the Java server (hanging the notification?). We don't see any HTTP calls failing, but the erratic behavior noticeably stopped once nginx was removed from in front of the web interface (i.e., we bound web_listen_uri to the private interface and killed off nginx).
I've attached the proxy + graylog config we were using.
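Since the attachments aren't inline here, the change amounted to something like the following sketch of server.conf (the addresses are placeholders, not our real config):

```
# Before: web interface on loopback, nginx proxying in front
#web_listen_uri = http://127.0.0.1:9000/

# After: bound directly to the private interface, nginx removed
rest_listen_uri = http://10.0.0.10:12900/
web_listen_uri  = http://10.0.0.10:9000/
```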
@avongluck-r1soft, I have the same issue with OpsGenie, but I have no nginx in front of the Graylog server. Could you please paste your Graylog server config so I can compare it to mine? Thanks.
This issue has been driving us nuts.
The old alerting system has been replaced with the alerts and events system in the meantime. This issue shouldn't happen anymore. Please open a new issue if this is still a problem. Thank you!
We see an issue where the OpsGenie plugin for Graylog prevents all alert conditions from firing in 2.2.2 (with zero error messages).
After uninstalling the OpsGenie plugin and restarting Graylog, the issue went away and alert conditions in streams started working. We've had an issue open with OpsGenie for a week now about it.