Log driver for OpenTelemetry

mikehaller commented 1 year ago

Currently, the available log driver implementations in container-management are json-file and none.

For the integration of OpenTelemetry, it would be great to have a log driver implementation that speaks OTLP and sends container logs to an OpenTelemetry Collector endpoint.

e-grigorov commented 1 year ago

OpenTelemetry will bring a bunch of new dependencies, so the daemon size will be affected. As a first step here, we have to evaluate the impact.
Another issue is that the Go implementation has to be checked. It seems that logs are not yet implemented there.

antoniyatrifonova commented 1 year ago

The OpenTelemetry specification for Logs is not yet stable. Fortunately, a brief look shows that it is under active development: in the OpenTelemetry specification, Logs already have a stable data model and OTLP support, and there is an open and active discussion about the API.

In our opinion, for now, it is better to wait for a stable version before looking for alternative options.

mikehaller commented 1 year ago

even getting the logs as a tcp stream from kanto cm would be helpful.

right now, we have to statically define filenames and watch individual files, which is really cumbersome with random uuids in the filepaths

also, there is no api to get the filename of the logfile. if that would be there, we could at least get the proper filepaths for new containers.

k-gostev commented 1 year ago

@mikehaller Isn't that what we did in https://github.com/eclipse-kanto/container-management/issues/98, or am I getting it wrong? Basically you can use kanto-cm logs <container-id> (or -n <container-name>) and it will fetch the logs using gRPC streaming. You don't need to know the file paths to get the logs of a container anymore.

mikehaller commented 1 year ago

kanto-cm logs is intended for a human user.

how would you do that on a production system where you do not even want to install a cli tool?

how do you want to monitor 50+ containers? spawning 50 separate cli processes just to get the logs? that sounds like an extreme overhead to me.

dimitar-dimitrow commented 1 year ago

The CLI could be used remotely using the --host flag, so during development it could be used without installing it on the actual device. Also the logs of a container could be fetched directly from the gRPC Logs command without using the CLI.

To get the full logfile path of a container, you need the container manager home directory, the container identifier and the logfile name.

The container manager homedir is /var/lib/container-management by default and is configurable at daemon startup.
Container identifiers can be fetched using the gRPC List command.
The logfile name for the json driver is defined in github.com/eclipse-kanto/container-management/containerm/logger/jsonfile.JSONFileLogDriverName.

By default the logfile is located under the home directory in /containers/ (e.g. /var/lib/container-management/containers/1e15be49-b587-47fa-aad7-cad16acd1859). The logfile path can be configured per container during creation; check Logging/rootdir in the doc.

Defining a log driver that streams the logs over tcp to a remote endpoint is possible. However, there are some drawbacks that should be taken into account.

When this remote driver is used, the logs would not be preserved locally, so no logs would be available through the CLI or gRPC.
Logs would be lost if there is no connection to the remote endpoint.
Significant traffic (e.g. many log entries emitted from those 50+ containers) may become a problem for a resource-limited edge device.
The increased traffic may lead to major expenses or to running out of data quota.

Fetching the logs after an issue is detected would use less traffic and fewer resources. Do you think that this approach is applicable in your use case? If we stick to streaming the logs over tcp, could you provide some requirements (log entry format, tcp or tcp+tls, etc.)?

mikehaller commented 1 year ago

Interesting discussion, i like where this is going.

So, we have already two high level requirements:

Minimize disk I/O for production, so logging to a json.log file on disk is a no-go for automotive, and i would assume for IoT devices as well.
Minimize traffic / bandwidth to reduce costs

To add:

Minimize maintenance and integration effort by using a standard protocol such as OTel
On-demand collection of logs at runtime (no reconfiguration of system or restarting of containers should be necessary)
Streaming of logs (no zipping and uploading, devs need "realtime" streaming during app development for example)

So, how about always attaching stdout and stderr but piping them into a ringbuffer. Then, when a remote controller activates streaming of logs, the log stream is sent to a collector endpoint. The collector does compression and filtering and then streams to the remote endpoint.

You specify a fixed size for the ringbuffer (say 25MB).

What do you think about such an approach?

eclipse-kanto / container-management

Log driver for OpenTelemetry #99