Graylog2 / graylog-plugin-threatintel

Graylog Processing Pipeline functions to enrich log messages with IoC information from threat intelligence databases

OTX Processing Stopped #6

Closed urban-moniker closed 7 years ago

urban-moniker commented 8 years ago

Hi,

The plugin is really good, but we have hit an issue with OTX processing. After running fine for about 12 hours it suddenly stopped, with the error logged below.

I suspect that AlienVault applies a rate limit, which we tripped by looking up all inbound IP addresses on our firewall :) The Tor and Spamhaus functions were and still are fine, and the issue persisted after restarting Graylog.

We built up a backlog of 4M messages, but as soon as I disabled OTX processing in my pipeline, processing resumed. I will investigate more today, but wanted to raise this as an FYI.

2016-11-02T07:26:54.440Z ERROR [OTXIPLookupFunction] Could not lookup OTX threat intelligence for IP [198.20.70.114].
java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: Could not load OTX response.
    at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:476) ~[graylog.jar:?]
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:455) ~[graylog.jar:?]
    at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:79) ~[graylog.jar:?]
    at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:143) ~[graylog.jar:?]
    at com.google.common.cache.LocalCache$LoadingValueReference.waitForValue(LocalCache.java:3573) ~[graylog.jar:?]
    at com.google.common.cache.LocalCache$Segment.waitForLoadingValue(LocalCache.java:2306) ~[graylog.jar:?]
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2195) ~[graylog.jar:?]
    at com.google.common.cache.LocalCache.get(LocalCache.java:3953) ~[graylog.jar:?]
    at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3957) ~[graylog.jar:?]
    at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4875) ~[graylog.jar:?]
    at org.graylog.plugins.threatintel.providers.otx.OTXLookupProvider.lookup(OTXLookupProvider.java:101) ~[graylog-plugin-threatintel-0.7.0.jar:?]
    at org.graylog.plugins.threatintel.providers.otx.ip.OTXIPLookupFunction.evaluate(OTXIPLookupFunction.java:55) [graylog-plugin-threatintel-0.7.0.jar:?]
    at org.graylog.plugins.threatintel.providers.otx.ip.OTXIPLookupFunction.evaluate(OTXIPLookupFunction.java:17) [graylog-plugin-threatintel-0.7.0.jar:?]
    at org.graylog.plugins.pipelineprocessor.ast.expressions.FunctionExpression.evaluateUnsafe(FunctionExpression.java:59) [graylog-plugin-pipeline-processor-1.1.1.jar:?]
    at org.graylog.plugins.pipelineprocessor.ast.expressions.Expression.evaluate(Expression.java:36) [graylog-plugin-pipeline-processor-1.1.1.jar:?]
    at org.graylog.plugins.pipelineprocessor.ast.statements.VarAssignStatement.evaluate(VarAssignStatement.java:33) [graylog-plugin-pipeline-processor-1.1.1.jar:?]
    at org.graylog.plugins.pipelineprocessor.ast.statements.VarAssignStatement.evaluate(VarAssignStatement.java:22) [graylog-plugin-pipeline-processor-1.1.1.jar:?]
    at org.graylog.plugins.pipelineprocessor.processors.PipelineInterpreter.processForResolvedPipelines(PipelineInterpreter.java:357) [graylog-plugin-pipeline-processor-1.1.1.jar:?]
    at org.graylog.plugins.pipelineprocessor.processors.PipelineInterpreter.processForPipelines(PipelineInterpreter.java:291) [graylog-plugin-pipeline-processor-1.1.1.jar:?]
    at org.graylog.plugins.pipelineprocessor.processors.PipelineInterpreter.process(PipelineInterpreter.java:248) [graylog-plugin-pipeline-processor-1.1.1.jar:?]
    at org.graylog.plugins.pipelineprocessor.processors.PipelineInterpreter.process(PipelineInterpreter.java:192) [graylog-plugin-pipeline-processor-1.1.1.jar:?]
    at org.graylog2.buffers.processors.ServerProcessBufferProcessor.handleMessage(ServerProcessBufferProcessor.java:56) [graylog.jar:?]
    at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.dispatchMessage(ProcessBufferProcessor.java:82) [graylog.jar:?]
    at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:61) [graylog.jar:?]
    at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:35) [graylog.jar:?]
    at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:143) [graylog.jar:?]
    at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66) [graylog.jar:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_91]
Caused by: java.util.concurrent.ExecutionException: Could not load OTX response.
    at org.graylog.plugins.threatintel.providers.otx.OTXLookupProvider.callOTX(OTXLookupProvider.java:142) ~[?:?]
    at org.graylog.plugins.threatintel.providers.otx.ip.OTXIPLookupProvider.loadIntel(OTXIPLookupProvider.java:73) ~[?:?]
    at org.graylog.plugins.threatintel.providers.otx.OTXLookupProvider$1.load(OTXLookupProvider.java:49) ~[?:?]
    at org.graylog.plugins.threatintel.providers.otx.OTXLookupProvider$1.load(OTXLookupProvider.java:46) ~[?:?]
    at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3542) ~[graylog.jar:?]
    at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2323) ~[graylog.jar:?]
    at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2286) ~[graylog.jar:?]
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201) ~[graylog.jar:?]
    ... 21 more
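
If the cause really is a third-party rate limit, as suspected above, one common mitigation is to throttle outgoing lookups on the client side. The following is a minimal token-bucket sketch in Python, not code from the plugin. The class name `TokenBucket`, the rate of 5 lookups per second, and the injected fake clock are all assumptions chosen for illustration; the actual OTX limits are not stated anywhere in this thread.

```python
import time


class TokenBucket:
    """Allow at most `rate_per_sec` calls per second, with bursts up to `capacity`."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self._rate = rate_per_sec
        self._capacity = capacity
        self._clock = clock          # injectable so the behaviour can be tested
        self._tokens = float(capacity)
        self._last = clock()

    def try_acquire(self):
        """Return True if a lookup may be sent now, False if it should be skipped."""
        now = self._clock()
        # Refill in proportion to elapsed time, never beyond the bucket capacity.
        self._tokens = min(self._capacity,
                           self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


# Hypothetical numbers: 5 lookups/sec, driven by a fake clock for determinism.
fake_now = [0.0]
bucket = TokenBucket(rate_per_sec=5, capacity=5, clock=lambda: fake_now[0])

allowed_first = sum(bucket.try_acquire() for _ in range(20))
fake_now[0] += 1.0  # one second passes, so five tokens are refilled
allowed_second = sum(bucket.try_acquire() for _ in range(20))
print(allowed_first, allowed_second)
```

Skipping a lookup when no token is available (rather than blocking) keeps message processing moving, at the cost of leaving some messages unenriched.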

omercnet commented 8 years ago

Same issue here. This caused the pipeline to get stuck and fill up the entire journal, so I had to turn everything off.

lennartkoopmann commented 8 years ago

How many lookups/minute was that?

omercnet commented 8 years ago

Roughly 20/sec, so about 1,200/minute.

lennartkoopmann commented 8 years ago

It heavily depends on how many different IPs you look at per second, because the responses are cached. It might be that we are simply hitting the OTX limits here; for those use cases you have to go with pre-loaded lists like Spamhaus.
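
To illustrate why the number of distinct IPs matters more than the raw message rate, here is a minimal Python sketch of a time-based cache. It is not the plugin's implementation (the stack trace above shows the plugin goes through a Guava cache); `TTLCache`, `fake_otx_lookup`, and the 300-second TTL are hypothetical names and values used only for this example.

```python
import time


class TTLCache:
    """Minimal cache: each key is fetched upstream at most once per TTL window."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}           # key -> (expiry_time, value)
        self.upstream_calls = 0      # how many times the loader was invoked

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # cache hit: no upstream request
        self.upstream_calls += 1     # cache miss or expired entry
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value


def fake_otx_lookup(ip):
    # Hypothetical stand-in for a remote threat-intel request.
    return {"ip": ip, "threat_indicated": False}


cache = TTLCache(fake_otx_lookup, ttl_seconds=300)

# 1,000 messages, but only 10 distinct source IPs.
for i in range(1000):
    cache.get(f"198.51.100.{i % 10}")

print(cache.upstream_calls)  # 10: one upstream request per distinct IP
```

With firewall traffic from many unique addresses, the hit rate drops and the upstream request rate approaches the message rate.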

urban-moniker commented 8 years ago

Thanks.

We'll probably be doing more than 10x omercnet's volume (not sure of the cache hit rate, though). I just directed all firewall logs at it to see how it handled the volume. I know that free API access is limited, so we obviously found their limits!

I agree with not leveraging interactive third-party lookups, which is why we'd be looking to collate and present the data internally using something like MineMeld. Out of interest, we have about 100,000 IoCs of various levels in our curated list (pulled down from multiple open-source feeds). Do you think that would be something the plugin could handle in terms of overhead?

lennartkoopmann commented 8 years ago

That should work, yes.
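
For a rough sense of scale, the Python sketch below loads 100,000 synthetic IP indicators into an in-memory set and checks membership. It is only an illustration of why a pre-loaded list of that size is cheap to query; it does not show how the plugin stores its lists. `load_iocs`, `is_listed`, and the generated `10.x` addresses are assumptions for this example, and it covers IP indicators only, not domains or hashes.

```python
import ipaddress


def load_iocs(lines):
    """Parse one IP indicator per line, skipping blank lines and '#' comments."""
    iocs = set()
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        iocs.add(ipaddress.ip_address(entry))
    return iocs


# Synthetic stand-in for a curated feed: 100,000 consecutive addresses.
base = int(ipaddress.ip_address("10.0.0.0"))
synthetic_feed = [str(ipaddress.ip_address(base + i)) for i in range(100_000)]

blocklist = load_iocs(synthetic_feed)


def is_listed(ip):
    """Set membership is a hash lookup, so cost does not grow with list size."""
    return ipaddress.ip_address(ip) in blocklist


print(len(blocklist))          # 100000
print(is_listed("10.0.0.5"))   # True
print(is_listed("192.0.2.1"))  # False
```

Because each check is a local hash lookup, no network call or rate limit is involved per message.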

lennartkoopmann commented 7 years ago

I'll close this issue for now. Thanks!