coiled / feedback

A place to provide Coiled feedback

Log format makes it hard to read logs #215

Closed fjetter closed 1 year ago

fjetter commented 1 year ago

The currently configured log format is incredibly difficult to read.

This is an example of a log message as retrieved using the coiled clusters log:

2022-11-09T22:17:09.692000+00:00 test_dataframe-86e8569e-scheduler-i-07fa12df6628cf4e1 Nov  9 22:17:09 ip-10-0-5-46 cloud-init[881]: 2022-11-09 22:17:09,547 - distributed.scheduler - INFO - Remove worker <WorkerState 'tls://10.0.13.147:43965', name: test_dataframe-86e8569e-worker-7862a1560f, status: closing, memory: 0, processing: 0>

This is something like TS instance_name Date+TS process_name(??): <dask.distributed_log_format>

There are 190 characters before the actual message starts. Some of this is surely valuable information, but it should not take 190 chars. If the --short flag is provided, this drops to 123 chars for the same log message, which looks like:

2022-11-09T22:17:09 Nov  9 22:17:09 ip-10-0-5-46 cloud-init[881]: 2022-11-09 22:17:09,547 - distributed.scheduler - INFO - Remove worker <WorkerState 'tls://10.0.13.147:43965', name: test_dataframe-86e8569e-worker-7862a1560f, status: closing, memory: 0, processing: 0>

The difference between short and long is only the leading timestamp formatting and the missing instance name.

We receive timestamps three times in different formats. It looks as if a log message is emitted, formatted, ingested by another logger, formatted again, and so on: three times, with three different log formats.

  1. CloudWatch which wraps
  2. ??? which wraps
  3. distributed
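To make the three layers concrete, here is a minimal Python sketch that splits the long-format example line into its parts. The pattern and the group names are hypothetical, inferred only from the single example line in this issue; the middle layer is treated as syslog-like, which is a guess since its origin is not identified here.

```python
import re

# Hypothetical pattern, inferred from the one example line in this issue.
LAYERED = re.compile(
    r"(?P<outer_ts>\S+)\s+"                  # layer 1: outer timestamp
    r"(?P<instance>\S+)\s+"                  # layer 1: instance name
    r"(?P<syslog_ts>\w{3}\s+\d+\s[\d:]+)\s"  # layer 2: date and time (assumed syslog-like)
    r"(?P<host>\S+)\s"                       # layer 2: host name
    r"(?P<process>[^:\s]+):\s"               # layer 2: process name and pid
    r"(?P<dask_ts>[\d-]+\s[\d:,]+)\s-\s"     # layer 3: distributed timestamp
    r"(?P<logger>\S+)\s-\s"                  # layer 3: logger name
    r"(?P<level>\w+)\s-\s"                   # layer 3: log level
    r"(?P<message>.*)"                       # the actual message
)

# The long-format example line from this issue, copied verbatim.
SAMPLE = (
    "2022-11-09T22:17:09.692000+00:00 "
    "test_dataframe-86e8569e-scheduler-i-07fa12df6628cf4e1 "
    "Nov  9 22:17:09 ip-10-0-5-46 cloud-init[881]: "
    "2022-11-09 22:17:09,547 - distributed.scheduler - INFO - "
    "Remove worker <WorkerState 'tls://10.0.13.147:43965', "
    "name: test_dataframe-86e8569e-worker-7862a1560f, "
    "status: closing, memory: 0, processing: 0>"
)


def parse_line(line):
    """Return a match object with one named group per field, or None."""
    return LAYERED.match(line)


if __name__ == "__main__":
    m = parse_line(SAMPLE)
    print("prefix length:", m.start("message"))
    print("timestamps:", m["outer_ts"], "|", m["syslog_ts"], "|", m["dask_ts"])
```

On the sample line the message group starts at offset 190, which matches the character count quoted above, and the three timestamp groups show the same instant formatted three different ways.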

Typically, when reading logs, I write them to a file and semi-manually remove the leading formatter foo before I can work with them.

As a Coiled user I would like to have the possibility to specify a log format and have a much less verbose default.

fjetter commented 1 year ago

FWIW I'm using the below regex to process my logs (don't judge me, it's awful, I'm not a regex person)

Search: (\d{4}-\d{2}-\d{2}T((\d{2}:?)+))\s\w+\s+\d+\s(\d{2}:?)+\s([\w\-\d]+)\s([\w\-\d\[\]]+):\s(\d{4}-\d{2}-\d{2}\s(\d{2}:?)+,\d+\s-\s)?
Replace: $2 $5\t-

This gives me a somewhat more concise log:

Time IP - distributed_msg

22:17:09 ip-10-0-5-46 - distributed.scheduler - INFO - Remove worker <WorkerState 'tls://10.0.13.147:43965', name: test_dataframe-86e8569e-worker-7862a1560f, status: closing, memory: 0, processing: 0>
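For anyone who wants to script this rather than run it in an editor, here is a rough Python sketch of the same search-and-replace. The search pattern is copied unchanged from the comment above; the function name `shorten` is made up for illustration, and the replacement is adapted from `$2 $5\t-` to `\2 \5 - ` (Python group syntax, spaces instead of the tab) so that the result matches the example output shown. As far as I can tell, the pattern only matches the --short output: in the long format the leading timestamp is followed by fractional seconds, so the line is left untouched.

```python
import re

# Search pattern from the comment above, copied unchanged.
PREFIX = re.compile(
    r"(\d{4}-\d{2}-\d{2}T((\d{2}:?)+))\s\w+\s+\d+\s(\d{2}:?)+\s"
    r"([\w\-\d]+)\s([\w\-\d\[\]]+):\s(\d{4}-\d{2}-\d{2}\s(\d{2}:?)+,\d+\s-\s)?"
)


def shorten(line):
    """Replace the stacked prefixes with '<time> <host> - ' (hypothetical helper)."""
    # Group 2 is the time of day, group 5 the host name.
    return PREFIX.sub(r"\2 \5 - ", line, count=1)


if __name__ == "__main__":
    # A --short style line, abbreviated from the example in this issue.
    short_line = (
        "2022-11-09T22:17:09 Nov  9 22:17:09 ip-10-0-5-46 cloud-init[881]: "
        "2022-11-09 22:17:09,547 - distributed.scheduler - INFO - Remove worker"
    )
    print(shorten(short_line))
```

To clean a saved log file, one could apply `shorten` to each line while copying it to a new file.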

shughes-uk commented 1 year ago

@ntabris made the logs look nicer, so I'm assuming you're happy @fjetter; reopen if not