MatrixAI / js-logger

TypeScript/JavaScript Logger Library
https://polykey.com
Apache License 2.0

Structured JSON Logging #14

Closed CMCDragonkai closed 2 years ago

CMCDragonkai commented 2 years ago

Description

This introduces structured JSON logging as an intermediate step towards tracing and machine-parseable data that can be used to construct contexts and traces.

We already have a LogFormatter type. Anything that produces a string given a LogRecord is sufficient.

Usually we construct formatters with the formatting.format function, which takes template strings and produces a string.

We can provide something like:

formatting.format`${formatting.json}`

That produces a formatter function. The json symbol simply means the entire data structure of the LogRecord is captured.
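
For illustration, the proposed usage might look like this (formatting.json does not exist yet, and the import path and exports are assumptions):

// Sketch of the proposal above; formatting.json is hypothetical.
// A LogFormatter is any function (record: LogRecord) => string.
import { formatting } from '@matrixai/logger';
import type { LogFormatter } from '@matrixai/logger';

const jsonOutput: LogFormatter = formatting.format`${formatting.json}`;
// Conceptually, jsonOutput(record) would yield JSON.stringify(record).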

The other issue is the schema of the keys of the JSON record; we should align it with some standard, possibly close to the OpenTracing standard or the structlog standard.

Issues Fixed

Tasks

Final checklist

CMCDragonkai commented 2 years ago

So the issue is that log data is actually allowed to be an arbitrary object, whereas here info, etc., all take ToString as the type.

It converts them to strings first.

Structured logging benefits from potentially arbitrary keys and nested structure, not just strings.

This means the Logger.info and related functions should take a structured object as data instead of just ToString. They can still take a string, of course, if that's what's needed, but anything else will become a structured object.
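
A rough sketch of the widened signature (types are illustrative, not the library's actual definitions):

// Illustrative sketch only; the actual js-logger types may differ.
type ToString = string | { toString(): string };
type LogData = Record<string, unknown>;

interface LoggerLike {
  // Before: info(msg: ToString): void;
  // After: a structured payload is also accepted,
  // and plain strings keep working as before.
  info(data: ToString | LogData): void;
}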

CMCDragonkai commented 2 years ago

OK, let's not do formatting.json; it should be possible to just use JSON.stringify as the formatter. We can special-case it.

However, as for the schema, GELF could work: https://docs.graylog.org/docs/gelf#gelf-payload-specification. Not sure about other ones.

Now the only thing left is the fact that data structures should be usable with logger.info as well.

structlog takes a string as the message, plus additional arbitrary fields.

With this concept, it's also possible to build a context for further logging, which just means a prefilled dictionary, somewhat similar to creating child loggers as we do right now.

https://github.com/bitc/structured-logging-schema#message-and-payload

So it does seem all we need to do is to enable the usage of a payload structure.

logger.info('some message', { some: 'structure' }, JSON.stringify)

Then we have a default structure associated to the log... think of it as "global" context, like time and such. But then further context can be created for a subset of loggers...

And "canonical logging" means that the log gets built up and then output at the end. However that doesn't cover a span. You want to know when it started and stopped too.

It's possible that we combine the structure with the message too, but that might be a bit messy. It really depends on whether a log always requires a message or if it can be optional. For example:

logger.info({ message: '...' }, JSON.stringify);

Then a default key can be mandated.

CMCDragonkai commented 2 years ago

Another example: https://www.elastic.co/guide/en/ecs/current/ecs-reference.html. In this case, it does seem that special-casing a message is the wrong way to go; if an object is used, you just need to specify your own schema, like msg: 'hello world'.

CMCDragonkai commented 2 years ago

We won't build in any special schemas; these can be dictated by users. The PK application can choose something custom, both for the formatter and for the structured data.

Contexts for spans and traces can be done through a prefilled object context, which can also be added to a child logger during construction.
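
A minimal sketch of the prefilled-context idea, using a hypothetical ContextLogger rather than the library's actual API:

// Hypothetical sketch: a child logger carrying a prefilled context
// object that is merged into every structured record it emits.
type LogData = Record<string, unknown>;

class ContextLogger {
  constructor(
    protected emit: (data: LogData) => void,
    protected context: LogData = {},
  ) {}

  // Create a child with additional bound context (span/trace ids, etc.)
  public child(extra: LogData): ContextLogger {
    return new ContextLogger(this.emit, { ...this.context, ...extra });
  }

  public info(data: LogData): void {
    this.emit({ ...this.context, ...data });
  }
}

// Usage: bind a traceId once; every record from the child inherits it.
const root = new ContextLogger((d) => process.stdout.write(JSON.stringify(d) + '\n'));
const span = root.child({ traceId: 'abc123', spanId: 'def456' });
span.info({ msg: 'operation started' });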

CMCDragonkai commented 2 years ago

Well, this was quite easy. All we needed to do was add:

// A LogFormatter that serialises the whole LogRecord as JSON
const jsonFormatter: LogFormatter = (record: LogRecord) => {
  return JSON.stringify({
    key: record.key,
    date: record.date,
    msg: record.msg,
    level: record.level
  });
};

Then when constructing logger:

    const logger = new Logger('root', LogLevel.NOTSET, [
      new ConsoleOutHandler(
        formatting.jsonFormatter
      ),
    ]);

    logger.debug('DEBUG MESSAGE');
    logger.info('INFO MESSAGE');
    logger.warn('WARN MESSAGE');
    logger.error('ERROR MESSAGE');

And that's it, you get JSON records being output now.
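
Each call emits one JSON object per line. With JSON.stringify defaults, the date serialises as an ISO string and the level as the enum's numeric value; the exact values below are illustrative:

{"key":"root","date":"2022-01-01T00:00:00.000Z","msg":"INFO MESSAGE","level":2}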

CMCDragonkai commented 2 years ago

The main issue now is schema standardisation. There's no single standard for logging; common ones include Graylog's GELF and the ECS fields above. Libraries like structlog don't really specify any standard for the payload. OpenTelemetry does have a standard for their properties.

There are some "base"/core properties that these libraries all tend to use: a timestamp, a log level, and a message.

We have these pieces of information in the LogRecord as well.

To set a base schema, we can follow ECS's core properties:

https://www.elastic.co/guide/en/ecs/current/ecs-guidelines.html

https://www.elastic.co/guide/en/ecs/current/ecs-base.html

https://github.com/elastic/ecs/blob/main/generated/csv/fields.csv


These are the base fields for ECS: @timestamp, labels, message, and tags.

In ECS, the log level is placed in a subobject (https://www.elastic.co/guide/en/ecs/current/ecs-log.html) and is intended to be a keyword, not a number.
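
A sketch of what an ECS-leaning formatter could look like, mapping our LogRecord fields onto @timestamp, message, labels, and a log.level keyword (the import path and the shape of LogRecord are assumed from the examples above):

// Sketch only: an ECS-style variant of the jsonFormatter above.
// Assumes record.date is a Date and record.level a numeric LogLevel enum.
import { LogLevel } from '@matrixai/logger';
import type { LogFormatter, LogRecord } from '@matrixai/logger';

const ecsFormatter: LogFormatter = (record: LogRecord) => {
  return JSON.stringify({
    '@timestamp': record.date.toISOString(),
    message: record.msg,
    log: { level: LogLevel[record.level] }, // keyword like 'INFO', not a number
    labels: { key: record.key },
  });
};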

CMCDragonkai commented 2 years ago

Hmm, to make this elegant, we need to make some changes:

  1. makeRecord should create a lazy dictionary, i.e. properties are functions that can be called on demand when they are required.

  2. This enables a formatter with special symbols to call the relevant function to acquire the value. So the formatter dictates the control flow, and this is set on the handlers.

  3. I noticed that callHandlers propagates to the parent's callHandlers, but the level check isn't done there; it's done in the Logger.log method, which means parent log levels don't filter out any child log records.

  4. Notice that makeRecord is called by log, which means we should not recreate the log record when propagating to the parent; the same record should be kept.

  5. Suppose the level check occurs in callHandlers, while log pre-creates the LogRecord, and the LogRecord is just a bunch of functions. Then it should work well.

  6. The format function, which takes symbols, can then call functions against the LogRecord; a sketch of this lazy-record idea follows.
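
Here is a sketch of points 1, 2, and 6 together (hypothetical shapes, not the current implementation):

// Hypothetical sketch: the record is a bundle of thunks, so values are
// only computed when a formatter actually asks for them.
type LazyLogRecord = {
  key: () => string;
  date: () => Date;
  msg: () => string;
  level: () => number;
};

function makeRecord(key: string, msg: string, level: number): LazyLogRecord {
  return {
    key: () => key,
    msg: () => msg,
    level: () => level,
    date: () => new Date(), // materialised on demand, when formatting
  };
}

// The formatter dictates control flow: it decides which thunks to force.
const lazyJsonFormatter = (record: LazyLogRecord): string =>
  JSON.stringify({
    key: record.key(),
    date: record.date(),
    msg: record.msg(),
    level: record.level(),
  });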

CMCDragonkai commented 2 years ago

It appears OpenTelemetry considers logging to stderr as the telemetry/trace exporter to be only for "debugging". Production exporters all use TCP, gRPC, or HTTP/1.1 to send traces to a centralised trace collector like Zipkin or whatever.

However, this seems to go against how traditional orchestration is supposed to work, where this data goes to STDERR and is then collected by the orchestrator to be sent to centralised loggers. Why did the OpenTelemetry people decide to create their own exporters? Was it to get around STDERR, or is STDERR logging considered orthogonal?

It's all done via HTTP or gRPC, as if networked sends were better than stderr. Maybe interleaving with stderr isn't right. However, for decentralised systems, does it make sense to assume that there is a collector somewhere else? Obviously we would say that the CLI/GUI is a viable collector in order to debug the system, whereas our hosted testnet may be pointed somewhere else.

It does seem that stderr/stdout is being deprecated for more "network-centric" observability. I guess it provides more flexibility.

CMCDragonkai commented 2 years ago

Otel is way too complex. Here's apparently the "basic" example of using Otel and exporting directly to the console: https://github.com/open-telemetry/opentelemetry-js/blob/main/examples/basic-tracer-node/index.js.

Why does it require so many different packages?

const opentelemetry = require('@opentelemetry/api');
const { Resource } = require('@opentelemetry/resources');
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions');
const { BasicTracerProvider, ConsoleSpanExporter, SimpleSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { JaegerExporter } = require('@opentelemetry/exporter-jaeger');

It seems over-engineered. The spec however might be useful.

CMCDragonkai commented 2 years ago

For most language specific instrumentation libraries you have exporters for popular backends and OTLP. You might wonder,

under what circumstances does one use a collector to send data, as opposed to having each service send directly to the backend?

For trying out and getting started with OpenTelemetry, sending your data directly to a backend is a great way to get value quickly. Also, in a development or small-scale environment you can get decent results without a collector.

However, in general we recommend using a collector alongside your service, since it allows your service to offload data quickly and the collector can take care of additional handling like retries, batching, encryption or even sensitive data filtering.

So in-process sending is only for adoption, as I suspected.

However, the community has evidently decided against relying on STDERR, because they view traces not as human-readable information but as information intended for machines to parse, and thus did not bother writing to STDERR except as an afterthought. This makes sense: for them, once the span data is written to stderr, what happens to it afterwards? It's sort of useless as textual information (not really, but one can understand). So there's nothing built to "forward" the stderr logs to a trace receiver or OTLP collector; even with the OTLP collector, it's all designed to send logs over a network connection rather than relying on simple STDERR.

I think what I'll do is examine the OpenTracing spec with respect to the log data itself, and then just add it to our existing structured logging, which continues to be sent to STDERR. Future sending to other places could be done via custom handlers.

For viewing and debugging the traces, if we maintain compatibility with the spec, it should be possible to send the stderr logs directly into a trace collector via HTTP 1.1: https://opentelemetry.io/docs/concepts/sdk-configuration/otlp-exporter-configuration/. Also see https://github.com/opentracing-contrib/java-span-reporter

CMCDragonkai commented 2 years ago

An example of manually sending a trace:

Get a Zipkin payload to test. For example, create a file called trace.json that contains:

[
  {
    "traceId": "5982fe77008310cc80f1da5e10147519",
    "parentId": "90394f6bcffb5d13",
    "id": "67fae42571535f60",
    "kind": "SERVER",
    "name": "/m/n/2.6.1",
    "timestamp": 1516781775726000,
    "duration": 26000,
    "localEndpoint": {
      "serviceName": "api"
    },
    "remoteEndpoint": {
      "serviceName": "apip"
    },
    "tags": {
      "data.http_response_code": "201"
    }
  }
]

With the Collector running, send this payload to the Collector. For example:

$ curl -X POST localhost:9411/api/v2/spans -H'Content-Type: application/json' -d @trace.json

CMCDragonkai commented 2 years ago

OpenTelemetry details have been moved into the new issue #15.

Because the telemetry schema doesn't actually require a mandatory message, it makes sense to change our logging methods to take a string or an object. If it's a string, that's assumed to be a regular msg; if it's an object, a message is not necessary and can be empty.

It's up to users to provide the necessary message with msg and an appropriate logger formatter for their application use case.
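
Usage could then look something like this (illustrative only; logger is assumed to be constructed with a JSON formatter as in the earlier example):

// Illustrative only: string and object forms side by side.
logger.info('plain message');                      // regular msg
logger.info({ msg: 'request done', status: 201 }); // structured; msg key chosen by the user
logger.info({ status: 500 });                      // structured; no message at all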

I imagine tracing would be built on top of js-logger, so this library is kept simple. Tracing would be PK-specific.

CMCDragonkai commented 2 years ago

This is now done. Going to update the README.md with some example usages.