erikh opened 8 years ago
@unclejack this ticket has been updated.
This could be added in a new package called events. The events package would make it possible to have multiple producers and multiple consumers for events.
JSON payload
In addition to the fields mentioned above, the type of daemon from which the event was produced should also be logged.
Events related to event consumers should also be produced. This would make it possible to debug problems around the consumption of events and the disconnection of slow consumers.
Logging events
Clients invoke events.Log() to log an event. This uses the etcd client to set a key in an etcd directory with the JSON payload as value.
A TTL of 5 seconds is used to expire the keys automatically. It should be possible to configure the TTL to accommodate larger clusters which would log more events.
Consuming events
Events can be consumed through an endpoint in the API server. When a consumer connects to this endpoint, it receives logged events as JSON.
The API server creates a new etcd watcher with a context for every consumer. The responses provided by the watcher are stored in a buffer.
If a consumer consumes events more slowly than they are produced, it is disconnected. The context of its watcher is canceled, and the resources associated with the disconnected consumer are freed.
Re: client, I think we need to revisit this in light of the new upcoming db changes. events.Log probably shouldn't exist, but client.LogEvent maybe should (if that makes sense). Let's discuss this offline since there are several tickets involved. Hooking into the client either way would be something worth having a discussion about.
let's leave slow consumers out of the equation for now; that can be a long-term goal.
I really like leveraging the ttl for garbage collection and watches to step around stale data. that's a nice touch.
We'll have to be careful not to write to the same keys, though, because writes get compressed in consul, it seems (TIL). Nanosecond timing or UUIDs or something.
or maybe we could pay close attention to the indexes that are returned. That will require some adjustment in the new DB system. Let me know which one sounds most appealing.
Event logging should expose an endpoint in the API server which is fed events by the API subsystem. The API should be similar to log or fmt, but output JSON payloads and take some additional arguments, such as Reason in the struct. It might be possible to leverage logrus.WithFields and the json logging feature to accomplish this. This patch should come in three parts, independently and in order:
Please let me know if you have additional questions implementing this feature.