Graylog2 / graylog2-server

Free and open log management
https://www.graylog.org

Content pack installation does not rollback on failure #5584

Open edmundoa opened 5 years ago

edmundoa commented 5 years ago

Expected Behavior

After a failure during content pack installation, the installation should roll back all created entities so that the system is exactly as it was before the content pack installation was attempted.
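A minimal all-or-nothing sketch in Java makes the expected behavior concrete. It assumes a hypothetical `SketchFacade` interface with `create`/`delete` methods and an in-memory `Map` standing in for the database; none of these names are Graylog's actual content pack API. The installer records the id of every entity it creates and, if any creation throws, deletes them in reverse order before rethrowing.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical facade contract: not Graylog's real EntityFacade API.
interface SketchFacade {
    // Creates an entity in the store and returns its id; throws on failure.
    String create(String title, Map<String, String> store);

    // Removes an entity whose id create() previously returned.
    void delete(String id, Map<String, String> store);
}

// Toy facade: rejects empty titles to simulate an entity that fails to install.
final class TitleFacade implements SketchFacade {
    private int counter = 0;

    @Override
    public String create(String title, Map<String, String> store) {
        if (title.isEmpty()) {
            throw new IllegalArgumentException("Cannot create entity without a title");
        }
        String id = "entity-" + (++counter);
        store.put(id, title);
        return id;
    }

    @Override
    public void delete(String id, Map<String, String> store) {
        store.remove(id);
    }
}

// All-or-nothing installer: on any failure, deletes what it created, newest first.
final class AllOrNothingInstaller {
    private final SketchFacade facade;

    AllOrNothingInstaller(SketchFacade facade) {
        this.facade = facade;
    }

    void install(List<String> titles, Map<String, String> store) {
        Deque<String> created = new ArrayDeque<>();
        try {
            for (String title : titles) {
                created.push(facade.create(title, store));
            }
        } catch (RuntimeException e) {
            // push/pop use the deque as a stack, so entities are removed in reverse creation order.
            while (!created.isEmpty()) {
                facade.delete(created.pop(), store);
            }
            throw e;
        }
    }
}
```

For example, installing the titles `"Stream"`, `""`, `"Dashboard"` fails on the empty title, and the rollback deletes the already created `"Stream"` entity, leaving the store empty. Reverse order matters in a real system because later entities may reference earlier ones.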

Current Behavior

A failure during content pack installation leaves some entities behind; in this case, a dashboard.

Server stack trace

```
2019-01-23 17:35:04,383 ERROR: org.graylog2.shared.rest.exceptionmappers.AnyExceptionClassMapper - Unhandled exception in REST resource
org.graylog2.contentpacks.exceptions.ContentPackException: Failed to install content pack <9136f1bb-5d5f-49cb-bc49-9551f28c143a/1>
	at org.graylog2.contentpacks.ContentPackService.installContentPack(ContentPackService.java:158) ~[graylog.jar:?]
	at org.graylog2.contentpacks.ContentPackService.installContentPack(ContentPackService.java:99) ~[graylog.jar:?]
	at org.graylog2.rest.resources.system.contentpacks.ContentPackResource.installContentPack(ContentPackResource.java:294) ~[graylog.jar:?]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_191]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_191]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_191]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_191]
	at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81) ~[graylog.jar:?]
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144) ~[graylog.jar:?]
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161) ~[graylog.jar:?]
	at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:205) ~[graylog.jar:?]
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99) ~[graylog.jar:?]
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389) ~[graylog.jar:?]
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347) ~[graylog.jar:?]
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102) ~[graylog.jar:?]
	at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326) [graylog.jar:?]
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) [graylog.jar:?]
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267) [graylog.jar:?]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:315) [graylog.jar:?]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:297) [graylog.jar:?]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:267) [graylog.jar:?]
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317) [graylog.jar:?]
	at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305) [graylog.jar:?]
	at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154) [graylog.jar:?]
	at org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpContainer.service(GrizzlyHttpContainer.java:384) [graylog.jar:?]
	at org.glassfish.grizzly.http.server.HttpHandler$1.run(HttpHandler.java:224) [graylog.jar:?]
	at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181) [graylog.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
Caused by: org.graylog2.contentpacks.exceptions.ContentPackException: Couldn't create dashboard
	at org.graylog2.contentpacks.facades.DashboardFacade.decode(DashboardFacade.java:163) ~[graylog.jar:?]
	at org.graylog2.contentpacks.facades.DashboardFacade.createNativeEntity(DashboardFacade.java:141) ~[graylog.jar:?]
	at org.graylog2.contentpacks.ContentPackService.installContentPack(ContentPackService.java:149) ~[graylog.jar:?]
	... 29 more
Caused by: org.graylog2.contentpacks.exceptions.ContentPackException: Missing stream for dashboard widget "Security exceptions": EntityDescriptor{id=5c474805e0015c16e50bc32e, type=ModelType{name=stream, version=1}}
	at org.graylog2.contentpacks.facades.DashboardFacade.createDashboardWidget(DashboardFacade.java:241) ~[graylog.jar:?]
	at org.graylog2.contentpacks.facades.DashboardFacade.createDashboard(DashboardFacade.java:189) ~[graylog.jar:?]
	at org.graylog2.contentpacks.facades.DashboardFacade.decode(DashboardFacade.java:154) ~[graylog.jar:?]
	at org.graylog2.contentpacks.facades.DashboardFacade.createNativeEntity(DashboardFacade.java:141) ~[graylog.jar:?]
	at org.graylog2.contentpacks.ContentPackService.installContentPack(ContentPackService.java:149) ~[graylog.jar:?]
	(...)
```
Half-created dashboard left behind

```
{
  "_id" : ObjectId("5c4897b8e0015c3b11556db5"),
  "creator_user_id" : "hans",
  "description" : "test",
  "created_at" : ISODate("2019-01-23T16:35:04.380Z"),
  "title" : "Test dashboard",
  "widgets" : [
    {
      "creator_user_id" : "hans",
      "cache_time" : 10,
      "description" : "Map",
      "id" : "933a5f44-bca8-489a-8740-c45db3edb236",
      "type" : "org.graylog.plugins.map.widget.strategy.MapWidgetStrategy",
      "config" : {
        "timerange" : { "type" : "relative", "range" : 300 },
        "field" : "src_ip_geolocation",
        "query" : ""
      }
    },
    {
      "creator_user_id" : "hans",
      "cache_time" : 10,
      "description" : "Top programs",
      "id" : "373f2c7c-6c1c-4610-8090-04e93e40019b",
      "type" : "QUICKVALUES",
      "config" : {
        "timerange" : { "type" : "relative", "range" : 300 },
        "field" : "program",
        "query" : "",
        "show_data_table" : true,
        "limit" : 5,
        "show_pie_chart" : true,
        "sort_order" : "desc",
        "stacked_fields" : "",
        "data_table_limit" : 10
      }
    },
    {
      "creator_user_id" : "hans",
      "cache_time" : 10,
      "description" : "Quick values histogram",
      "id" : "58f1ce59-777f-42b7-8ec1-1b13cc933839",
      "type" : "QUICKVALUES_HISTOGRAM",
      "config" : {
        "timerange" : { "type" : "relative", "range" : 300 },
        "field" : "system",
        "query" : "",
        "limit" : 5,
        "sort_order" : "desc",
        "stacked_fields" : ""
      }
    }
  ]
}
```

Affected content pack, attached as download.log (I had to use .log as the extension so GitHub would accept the upload). It was installed on a system where the original entities still existed.

Steps to Reproduce (for bugs)

  1. Install the provided content pack
  2. An error occurs during installation
  3. Go to the dashboards page and see the duplicated entity
  4. The content pack installation shows "No data available"

Your Environment

kmerz commented 5 years ago

@bernd and I decided to remove the blocker label, since the problem should not occur too often when the software is stable. The proper solution, described below, will take time that is needed to address more urgent issues.

Problem

The bug is that, during an installation, only finished entities are currently uninstalled during a rollback. However, some entities, such as dashboards, contain other entities that are not content pack entities themselves, such as DashboardWidget. When a dashboard is installed, the dashboard is saved first so that its widgets can be installed. If an error occurs during the widget installation, the dashboard does not get cleaned up, because it was not finished yet.
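The failure mode can be reproduced with a small Java model. This is a hypothetical sketch, not Graylog's `DashboardFacade`: a dashboard is a map entry holding its widgets' stream ids, the facade saves the dashboard before adding widgets, and the installer only knows the ids of entities whose creation returned.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a dashboard: a title plus the stream id each widget needs.
final class DashboardSpec {
    final String title;
    final List<String> widgetStreamIds;

    DashboardSpec(String title, List<String> widgetStreamIds) {
        this.title = title;
        this.widgetStreamIds = widgetStreamIds;
    }
}

// Saves the dashboard first, then its widgets, with no cleanup on failure.
final class NaiveDashboardFacade {
    private final Set<String> existingStreams;

    NaiveDashboardFacade(Set<String> existingStreams) {
        this.existingStreams = existingStreams;
    }

    String create(DashboardSpec spec, Map<String, List<String>> db) {
        String id = "dashboard:" + spec.title;
        List<String> widgets = new ArrayList<>();
        db.put(id, widgets); // persisted before any widget exists
        for (String streamId : spec.widgetStreamIds) {
            if (!existingStreams.contains(streamId)) {
                // The already-saved dashboard (and earlier widgets) stay in db.
                throw new IllegalStateException("Missing stream for dashboard widget: " + streamId);
            }
            widgets.add(streamId);
        }
        return id; // only a finished dashboard's id ever reaches the installer
    }

    void delete(String id, Map<String, List<String>> db) {
        db.remove(id);
    }
}

// Rolls back only the entities whose create() returned, i.e. finished ones.
final class NaiveInstaller {
    static void install(NaiveDashboardFacade facade, List<DashboardSpec> specs,
                        Map<String, List<String>> db) {
        Deque<String> finished = new ArrayDeque<>();
        try {
            for (DashboardSpec spec : specs) {
                finished.push(facade.create(spec, db));
            }
        } catch (RuntimeException e) {
            while (!finished.isEmpty()) {
                facade.delete(finished.pop(), db);
            }
            throw e;
        }
    }
}
```

Installing dashboard `A` (one widget on stream `s1`) and then dashboard `B` (widgets on `s1` and on a missing stream) throws. The rollback removes `A`, but `B` stays in the map with its first widget, much like the half-created dashboard in the report.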

Solution

The solution to the described problem is that every facade needs to implement its own rollback mechanism, since only the facade knows which (sub)entities it installs and how to uninstall them.
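A minimal sketch of that idea, using a hypothetical in-memory model (these are not Graylog classes): the facade wraps its sub-entity creation in a `try`/`catch` and deletes the dashboard it already saved before rethrowing, so an installer that only rolls back finished entities is enough to restore the previous state. A facade that stored sub-entities separately would delete those in the same `catch` block.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: a dashboard is a title plus the stream id each widget needs.
final class DashboardSpec {
    final String title;
    final List<String> widgetStreamIds;

    DashboardSpec(String title, List<String> widgetStreamIds) {
        this.title = title;
        this.widgetStreamIds = widgetStreamIds;
    }
}

// Hypothetical facade that cleans up its own partially created entity on failure.
final class SelfCleaningDashboardFacade {
    private final Set<String> existingStreams;

    SelfCleaningDashboardFacade(Set<String> existingStreams) {
        this.existingStreams = existingStreams;
    }

    String create(DashboardSpec spec, Map<String, List<String>> db) {
        String id = "dashboard:" + spec.title;
        List<String> widgets = new ArrayList<>();
        db.put(id, widgets); // still saved first so widgets can be attached to it
        try {
            for (String streamId : spec.widgetStreamIds) {
                if (!existingStreams.contains(streamId)) {
                    throw new IllegalStateException("Missing stream for dashboard widget: " + streamId);
                }
                widgets.add(streamId);
            }
        } catch (RuntimeException e) {
            // Only this facade knows the dashboard was already saved, so it undoes that here.
            db.remove(id);
            throw e;
        }
        return id;
    }

    void delete(String id, Map<String, List<String>> db) {
        db.remove(id);
    }
}

// Installer that rolls back only the entities whose create() returned (finished ones).
final class Installer {
    static void install(SelfCleaningDashboardFacade facade, List<DashboardSpec> specs,
                        Map<String, List<String>> db) {
        Deque<String> finished = new ArrayDeque<>();
        try {
            for (DashboardSpec spec : specs) {
                finished.push(facade.create(spec, db));
            }
        } catch (RuntimeException e) {
            while (!finished.isEmpty()) {
                facade.delete(finished.pop(), db);
            }
            throw e;
        }
    }
}
```

Installing dashboard `A` (stream `s1`) and then dashboard `B` (streams `s1` and a missing one) still fails, but the map ends up empty: the installer removes the finished dashboard `A`, and the facade has already removed the partial dashboard `B`.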