Graylog2 / graylog2-server

Free and open log management
https://www.graylog.org

Improve Lookup Table Lifecycle #4524

Open bernd opened 6 years ago

bernd commented 6 years ago

Lifecycle States

Expected Behavior

The lookup table system needs more lifecycle states to allow better management of tables, caches and adapters.

This needs some more thinking and discussion to make sure we cover all needed lifecycle states.

We should also think about implementing a more generic lifecycle system that can be reused in other systems, to avoid creating new solutions for the same problem over and over again.

Current Behavior

For the threatintel plugin in 2.4 we needed some way to disable lookup data adapters, so that the adapters don't consume resources or make remote requests by default. To avoid any more server core changes, we modified the affected data adapters to throw an exception when the adapter should be disabled. This is why we see exceptions like this when Graylog starts with disabled threatintel data adapters:

2018-01-24T22:34:57.229Z ERROR [LookupDataAdapter] Couldn't start data adapter <tor-exit-node/5a342bdf2c1e3e4f8a4fd826/@7da63b87>
org.graylog.plugins.threatintel.tools.AdapterDisabledException: TOR service is disabled, not starting TOR exit addresses adapter. To enable it please go to System / Configurations.
        at org.graylog.plugins.threatintel.adapters.tor.TorExitNodeDataAdapter.doStart(TorExitNodeDataAdapter.java:73) ~[?:?]
        at org.graylog2.plugin.lookup.LookupDataAdapter.startUp(LookupDataAdapter.java:59) [graylog.jar:?]
        at com.google.common.util.concurrent.AbstractIdleService$DelegateService$1.run(AbstractIdleService.java:62) [graylog.jar:?]
        at com.google.common.util.concurrent.Callables$4.run(Callables.java:122) [graylog.jar:?]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_65]

We knew that we would need something better than this in the future, but for 2.4 we decided to do it like this.
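
As a starting point for the discussion, here is a minimal sketch of what an explicit set of lifecycle states could look like, including a state for components that are switched off in the configuration. All names here (`LifecycleState`, `Lifecycle`, and the individual states and transitions) are hypothetical and not part of the Graylog code base; the right set of states is exactly what still needs to be decided.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical lifecycle states; none of these names exist in Graylog.
enum LifecycleState {
    CREATED, DISABLED, STARTING, RUNNING, FAILED, STOPPED;

    // Returns the states that may legally follow this one (assumed transition table).
    Set<LifecycleState> allowedNext() {
        switch (this) {
            case CREATED:  return EnumSet.of(DISABLED, STARTING);
            case DISABLED: return EnumSet.of(STARTING);
            case STARTING: return EnumSet.of(RUNNING, FAILED);
            case RUNNING:  return EnumSet.of(STOPPED, FAILED);
            case FAILED:   return EnumSet.of(STARTING, STOPPED);
            default:       return EnumSet.noneOf(LifecycleState.class); // STOPPED is terminal
        }
    }
}

// Reusable holder that enforces the transition table above.
final class Lifecycle {
    private LifecycleState state = LifecycleState.CREATED;

    synchronized LifecycleState state() {
        return state;
    }

    synchronized void transitionTo(LifecycleState next) {
        if (!state.allowedNext().contains(next)) {
            throw new IllegalStateException(state + " -> " + next + " is not allowed");
        }
        state = next;
    }
}
```

With something along these lines, a disabled threatintel adapter could move from `CREATED` to `DISABLED` instead of throwing an exception from its start method, so a disabled adapter would no longer show up as an `ERROR` in the log.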

Lifecycle Dependencies

Expected Behavior

When a data adapter or cache blocks during startup, it shouldn't prevent all other lookup tables from starting. A blocking or failing data adapter or cache should only affect the lookup tables that use it.

Current Behavior

In https://github.com/Graylog2/graylog2-server/issues/4748 a single data adapter was blocking because it was downloading a very large CSV file from an HTTP server. This prevented all lookup tables from starting.

The problem is that LookupTableService#startUp() only starts the lookup tables once all caches and data adapters have either started successfully or failed to start. When a data adapter or cache is blocking, no lookup table gets started until the adapter setup unblocks.

https://github.com/Graylog2/graylog2-server/blob/a61d597c837d8a58581ce75e4d5b1f1cf70b74a3/graylog2-server/src/main/java/org/graylog2/lookup/LookupTableService.java#L113-L128
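
One way to get the expected behavior would be to let each lookup table wait only on its own data adapter and cache. The following is a rough sketch of that idea using `CompletableFuture`. It is not the actual `LookupTableService` code; `DependencyAwareStarter`, its methods, and the string keys are made up for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executor;

// Hypothetical helper, not the actual LookupTableService implementation.
final class DependencyAwareStarter {
    private final ConcurrentMap<String, CompletableFuture<Void>> components = new ConcurrentHashMap<>();
    private final Executor executor;

    DependencyAwareStarter(Executor executor) {
        this.executor = executor;
    }

    // Starts a cache or data adapter asynchronously; each name is started at most once.
    CompletableFuture<Void> startComponent(String name, Runnable start) {
        return components.computeIfAbsent(name, n -> CompletableFuture.runAsync(start, executor));
    }

    // Runs the table start as soon as its own adapter and cache have started.
    // If either dependency fails, only this table's future completes exceptionally.
    CompletableFuture<Void> startTable(String adapterName, String cacheName, Runnable startTable) {
        CompletableFuture<Void> adapter = components.get(adapterName);
        CompletableFuture<Void> cache = components.get(cacheName);
        if (adapter == null || cache == null) {
            throw new IllegalArgumentException("Unknown adapter or cache");
        }
        return CompletableFuture.allOf(adapter, cache).thenRun(startTable);
    }
}
```

In this sketch, a table whose adapter is still downloading a large CSV file simply stays pending, while tables backed by other adapters start as soon as their own dependencies are ready. The executor needs enough threads that one blocking adapter can't starve the others.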

joschi commented 6 years ago

Refs Graylog2/graylog-plugin-threatintel#55