OpenSourceCommerce / log-explorer


Create syslog receiver to run as daemon process #238

Open ThomasLohner opened 2 years ago

ThomasLohner commented 2 years ago

Logexplorer stores data in a ClickHouse database. Sending messages to Logexplorer via the REST API is very easy, but this does not scale well. The syslog protocol scales much better and is non-blocking when sent over UDP instead of TCP. This is ideal for applications, because they won't suffer from an outage of Logexplorer.
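To illustrate why UDP keeps the sending application unaffected, here is a minimal Python sketch of a client that formats an RFC 5424 line and sends it as a single datagram. The function names, the default PRI of 14 (facility user, severity informational), the hostname `localhost` and port 9506 (taken from the reproduce command later in this thread) are assumptions for illustration, not part of Logexplorer.

```python
import socket
from datetime import datetime, timezone


def build_rfc5424(app_name: str, msg: str, pri: int = 14,
                  hostname: str = "localhost",
                  procid: str = "-", msgid: str = "-") -> bytes:
    """Hypothetical helper: build one RFC 5424 syslog line."""
    # PRI 14 = facility user (1) * 8 + severity informational (6).
    ts = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    ts = ts.replace("+00:00", "Z")
    # The "-" before the message is the NILVALUE for STRUCTURED-DATA.
    line = f"<{pri}>1 {ts} {hostname} {app_name} {procid} {msgid} - {msg}"
    return line.encode("utf-8")


def send_udp(payload: bytes, host: str = "127.0.0.1", port: int = 9506) -> None:
    """Fire-and-forget send: sendto() does not wait for any reply,
    so the caller keeps running even if the receiver is down."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

The trade-off is that UDP gives no delivery guarantee: if the receiver is down or overloaded, datagrams are silently lost instead of blocking the application.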

We will create a syslog receiver that buffers messages and writes them to ClickHouse in bulk inserts. See the syslog protocol description here: https://datatracker.ietf.org/doc/html/rfc5424
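The buffering idea can be sketched as a per-table buffer that flushes when it reaches a row limit or when its oldest row gets too old. This is a Python sketch, not Swoole code: the class `BulkBuffer`, the limits `max_rows` / `max_age_s` and the `flush` callback (which would issue one multi-row INSERT) are hypothetical. In the daemon, `flush_due()` would presumably be called from a periodic timer.

```python
import time
from typing import Callable, Dict, List

Row = Dict[str, object]


class BulkBuffer:
    """Hypothetical helper: collect rows per table, flush them in batches."""

    def __init__(self, flush: Callable[[str, List[Row]], None],
                 max_rows: int = 1000, max_age_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._flush = flush          # e.g. runs one INSERT with many rows
        self._max_rows = max_rows
        self._max_age_s = max_age_s
        self._clock = clock
        self._rows: Dict[str, List[Row]] = {}
        self._started: Dict[str, float] = {}  # time of oldest buffered row

    def add(self, table: str, row: Row) -> None:
        if table not in self._rows:
            self._rows[table] = []
            self._started[table] = self._clock()
        self._rows[table].append(row)
        if len(self._rows[table]) >= self._max_rows:
            self.flush_table(table)

    def flush_due(self) -> None:
        # Call periodically so tables with little traffic are still written.
        now = self._clock()
        due = [t for t, s in self._started.items() if now - s >= self._max_age_s]
        for table in due:
            self.flush_table(table)

    def flush_table(self, table: str) -> None:
        rows = self._rows.pop(table, [])
        self._started.pop(table, None)
        if rows:
            self._flush(table, rows)
```

Flushing on both size and age bounds the memory used by busy tables while keeping the write latency of quiet tables short.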

The daemon should be implemented in Swoole (https://openswoole.com) for maximum performance.

The TAG in the syslog message matches the table name in ClickHouse. Messages can be either JSON or a plain string, which is then parsed with a GROK pattern. For this we need to find a PHP implementation of GROK patterns.

Config per TAG / Table:

tag: <name of clickhouse table>

type: json or grok

pattern (only if type=grok): <grok pattern>
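A minimal Python sketch of how the per-TAG config could drive parsing. `TableConfig` and `parse_message` are hypothetical names. Because no grok library has been chosen yet, the grok branch is stubbed with a plain regular expression using named groups; a real implementation would expand grok patterns first.

```python
import json
import re
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TableConfig:
    tag: str                       # name of the ClickHouse table
    type: str                      # "json" or "grok"
    pattern: Optional[str] = None  # only used when type == "grok"


def parse_message(cfg: TableConfig, message: str) -> Dict[str, object]:
    if cfg.type == "json":
        data = json.loads(message)
        if not isinstance(data, dict):
            raise ValueError("JSON message must be an object")
        return data
    if cfg.type == "grok":
        # Stand-in for grok: a regex with named groups (assumption).
        match = re.fullmatch(cfg.pattern or "", message)
        if match is None:
            raise ValueError("message does not match pattern")
        return match.groupdict()
    raise ValueError(f"unknown type: {cfg.type}")
```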
ThomasLohner commented 2 years ago

I have tested this ticket, and I think we need some more improvements before this is production-ready:

Change syslog pattern: RFC 5424 defines STRUCTURED-DATA, so we must assume that some syslog clients will send it. The new pattern should be:

<%{POSINT:pri}>%{POSINT:version} %{TIMESTAMP_ISO8601:timestamp} %{HOSTNAME:hostname} %{USERNAME:table_name} %{USERNAME:proc_id} %{USERNAME:app_name} (\[%{DATA:structured_data}\]|\-) %{GREEDYDATA:message}
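To check what the pattern above captures, here is a rough Python translation. The grok macros are replaced by simplified regexes (an approximation, not the exact grok definitions): POSINT as `[1-9][0-9]*`, TIMESTAMP_ISO8601 as an RFC 3339 timestamp, HOSTNAME as a simplified host name, USERNAME as `[a-zA-Z0-9._-]+`, DATA as `.*?` and GREEDYDATA as `.*`. Per RFC 5424 the three fields after HOSTNAME are APP-NAME, PROCID and MSGID, in that order; the group names below simply follow the pattern as written. Since POSINT does not match `0`, a message with PRI 0 would not match either.

```python
import re
from typing import Dict, Optional

# Simplified translation of the grok pattern (approximation, see above).
SYSLOG_5424 = re.compile(
    r"<(?P<pri>[1-9][0-9]*)>(?P<version>[1-9][0-9]*) "
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?"
    r"(?:Z|[+-]\d{2}:\d{2})) "
    r"(?P<hostname>[0-9A-Za-z][0-9A-Za-z.-]*) "
    r"(?P<table_name>[a-zA-Z0-9._-]+) "
    r"(?P<proc_id>[a-zA-Z0-9._-]+) "
    r"(?P<app_name>[a-zA-Z0-9._-]+) "
    r"(?:\[(?P<structured_data>.*?)\]|-) "  # "-" is the NILVALUE
    r"(?P<message>.*)"
)


def parse_syslog(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the captured fields, or None if the line does not match."""
    match = SYSLOG_5424.match(line)
    return match.groupdict() if match else None
```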

Invalid JSON crashes the server process: if invalid JSON is sent, the server process just dies. To reproduce:

docker-compose exec php logger --udp --server api --port 9506 --tag {table_name} '{"text":"hello", FOO}'
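One way to keep the daemon alive is to treat a decode error as a per-message failure: log it and drop the message. A Python sketch follows; `safe_parse_json` is a hypothetical name. In PHP the analogous approach would presumably be `json_decode()` with `JSON_THROW_ON_ERROR` inside a try/catch for `JsonException`, but the actual cause of the crash in the Swoole worker has not been confirmed here.

```python
import json
import sys
from typing import Dict, Optional


def safe_parse_json(message: str) -> Optional[Dict[str, object]]:
    """Return the decoded object, or None if the message is not a JSON object."""
    try:
        data = json.loads(message)
    except json.JSONDecodeError as exc:
        # Drop the bad message instead of letting the exception kill the process.
        print(f"dropping invalid JSON ({exc}): {message!r}", file=sys.stderr)
        return None
    if not isinstance(data, dict):
        # Valid JSON but not an object (e.g. a list) cannot map to table columns.
        print(f"dropping non-object JSON: {message!r}", file=sys.stderr)
        return None
    return data
```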

Ignore unknown fields in the message: if the message contains fewer fields than the ClickHouse table, it is still written to the table. This is correct behavior. But if the message contains fields that are missing from the table, nothing is written to ClickHouse. To fix this, we need to load the table structure on server start and compare it with the message before executing the ClickHouse query.
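Assuming the column names of each table are loaded once at startup (for example from ClickHouse's `system.columns` table or via `DESCRIBE TABLE`), the comparison itself reduces to a dictionary filter. A Python sketch with the hypothetical helper `split_known_fields`, which also returns the dropped field names so they can be logged:

```python
from typing import Dict, Set, Tuple


def split_known_fields(row: Dict[str, object],
                       columns: Set[str]) -> Tuple[Dict[str, object], Set[str]]:
    """Keep only fields that exist as columns; report the rest."""
    known = {k: v for k, v in row.items() if k in columns}
    unknown = set(row) - columns
    return known, unknown
```

For example, with columns `{"text", "level", "timestamp"}` and the message `{"text": "hi", "level": "info", "foo": 1}`, only `text` and `level` are inserted and `foo` is reported as unknown.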

Make the timestamp optional: if there is no timestamp in the data part of the message, use the timestamp from the syslog header.
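A Python sketch of the fallback, assuming the table column is called `timestamp` (hypothetical; it would be configurable per table). Converting the RFC 5424 timestamp into the format the ClickHouse column expects is left out here.

```python
from typing import Dict


def with_timestamp(row: Dict[str, object], header_timestamp: str,
                   field: str = "timestamp") -> Dict[str, object]:
    """Fill in the syslog header timestamp if the message has none."""
    if not row.get(field):
        return {**row, field: header_timestamp}
    return row
```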

Use tag or app-name for table detection: some applications, like nginx, don't allow setting a custom syslog tag, but they do send a syslog app-name. So we need to check the syslog tag first and, if it is empty, use the app-name to determine the table name.
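A Python sketch of the fallback order, using the group names `table_name` and `app_name` from the grok pattern earlier in this thread. Treating the RFC 5424 NILVALUE `-` as empty is an assumption; `detect_table` is a hypothetical name.

```python
from typing import Dict, Optional

NILVALUE = "-"  # RFC 5424 placeholder for an absent field


def detect_table(fields: Dict[str, Optional[str]]) -> Optional[str]:
    """Prefer the tag; fall back to app-name; None if neither is set."""
    for key in ("table_name", "app_name"):
        value = fields.get(key)
        if value and value != NILVALUE:
            return value
    return None
```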

Verbose logging: add a config option for verbose logging to stdout. It's much easier to debug if you can see which message string was received and whether parsing worked (grok or JSON).
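A Python sketch of a `verbose` switch mapped onto log levels: with verbose on, the raw message and the parse outcome are logged at DEBUG to stdout; otherwise only warnings and errors are shown. The logger name and both helper functions are hypothetical.

```python
import logging
import sys


def make_logger(verbose: bool) -> logging.Logger:
    logger = logging.getLogger("logexplorer.syslog")
    logger.handlers.clear()  # avoid duplicate handlers on reconfiguration
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG if verbose else logging.WARNING)
    logger.propagate = False
    return logger


def log_parse_result(logger: logging.Logger, raw: str, parser: str, ok: bool) -> None:
    # Only emitted when verbose is enabled (DEBUG level).
    logger.debug("received %r", raw)
    logger.debug("%s parsing %s", parser, "succeeded" if ok else "failed")
```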