George3d6 / Inquisitor

An easily extensible, minimal footprint monitoring tool. (Still in the testing phase)
BSD 2-Clause "Simplified" License
30 stars 4 forks source link

Agent Robustness #16

Closed Deedasmi closed 6 years ago

Deedasmi commented 6 years ago

One of the things mentioned in a previous discussion is that plugins should not crash the agent. I'm working on implementing std::panic::catch_unwind, and run into a problem. Mutable function calls are not UnwindSafe, and can't be caught.

This means that plugin.gather() can't be put inside a catch_unwind(). I do believe that gather should be mutable though.

I can catch initialization errors with another change to plugins (https://github.com/Deedasmi/Inquisitor/blob/master/agent/plugins/src/plugin_initialization.rs#L8). Do you think that's enough? Or do you think this is unnecessary?

George3d6 commented 6 years ago

Hmh, I'm not sure I want to catch any errors as much as serve them upstream for handling.

As far as initialization errors go the user could want the thing to crash, however some initialization errors could happen with pre-compiled versions (even assuming we remove crashes from missing configs, you could still have pre-compiled plugins which don't work on all subsets of machines).

For now I'd think adding the try catch and a parameter to the config to use/ignore it is the best option. (E.g. In dev mode or when being cautious I'd want to always crash. On production machine I could see a reason for just setting it to "ignore" in certain case).

As far as handling errors from "gather" those should just be sent in the form of Error(err) and than the messages either logged, ignored or sent to the receptor depending on a config parameter in the agent_config. At least that was my idea. Does that sound reasonable ?

Deedasmi commented 6 years ago

The catch_unwind was mainly an experiment to if we could protect the agent from a poorly programed plugin. Doesn't really seem feasible though. Feel free to correct me if I'm wrong, but it sounds like you're learning more towards 'normal result passing is fine', which works for me.

The last question I have is more related than it sounds.

How do you feel about bad error messages for plugin authors? This is the error output if one of our five plugins uses the wrong plugin::new() return type:

https://pastebin.com/yyyURJm1

It's just about impossible to figure out what the problem is from the error.

I can generate a better error message by explicitly stating the type, but then I have to know the type. This kind of thing would be a requirement: https://github.com/Deedasmi/Inquisitor/blob/catch_unwind2/agent/agent_plugins/alive/src/lib.rs#L11. Which, for us, would be considered an API change.

Deedasmi commented 6 years ago

My master branch is the one with the crappy error and the one I'd probably recommend.

Deedasmi commented 6 years ago

Closing this in favor of #19 and #20