atviriduomenys / spinta

Spinta is a framework to describe, extract and publish data (a DEP Framework).

Split logs into smaller files #506

Open sirex opened 10 months ago

sirex commented 10 months ago

Currently, Spinta logs are stored in a single large file. There is a script (#298) that reads log files, but it loads the whole file into memory and ends up consuming all available RAM.

To fix that, we need another script that splits a large log file into smaller files by date.

I think we can split logs daily into accesslog-%Y-%m-%d.jsonl files.

So this script could work like this:

scripts/split-logs.py path/to/logfile.jsonl -f "accesslog-%Y-%m-%d.jsonl"

Where -f is the file name format for the files the logs are split into.
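The command line above could be parsed roughly as follows. This is only a sketch: the `parse_args` helper, the `--format` long option, and using the daily pattern as the default are assumptions for illustration; the issue itself only specifies the positional log file path and `-f`.

```python
import argparse


def parse_args(argv=None):
    """Parse arguments for a hypothetical scripts/split-logs.py."""
    parser = argparse.ArgumentParser(
        description="Split a JSON Lines log file into smaller files by date.",
    )
    # Positional argument: the large source log file.
    parser.add_argument("logfile", help="path to the source .jsonl log file")
    # -f comes from the issue; --format and the default value are assumptions.
    parser.add_argument(
        "-f",
        "--format",
        default="accesslog-%Y-%m-%d.jsonl",
        help="strftime pattern used to name the output files",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(args.logfile, args.format)
```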

In addition to this script, we also need to fix Spinta so that it automatically saves log entries into separate files, using a formatted file name string given in the configuration. When a new log entry maps to a different file name, a new log file should be created and subsequent entries should be stored in that new file.

The log file swap should happen online, without restarting the server. That means that when logging, the previous log file name should be stored and compared with the new log file name, and if the file name has changed, the new file should be opened.

For file name formatting, use the strftime() function.
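A minimal sketch of the online file swap described above, under some assumptions: `DatedLogWriter` is a hypothetical class and not part of Spinta, the file name is derived from the current UTC time (the `now` parameter exists only to make the behaviour testable), and thread safety is not handled.

```python
import json
from datetime import datetime, timezone


class DatedLogWriter:
    """Hypothetical writer that switches files when the formatted name changes."""

    def __init__(self, name_format):
        self.name_format = name_format  # e.g. "accesslog-%Y-%m-%d.jsonl"
        self.current_name = None        # name of the file currently open
        self.file = None

    def write(self, entry, now=None):
        now = now or datetime.now(timezone.utc)
        name = now.strftime(self.name_format)
        # Compare the previous file name with the new one; reopen on change.
        if name != self.current_name:
            if self.file is not None:
                self.file.close()
            self.file = open(name, "a", encoding="utf-8")
            self.current_name = name
        # One JSON object per line, flushed so entries are not lost on a crash.
        self.file.write(json.dumps(entry) + "\n")
        self.file.flush()

    def close(self):
        if self.file is not None:
            self.file.close()
            self.file = None
            self.current_name = None
```

Because the comparison happens on every write, the swap needs no restart and no scheduler: the first entry logged after midnight simply opens the next day's file.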

Example from a log file. Each line in the log file contains the following JSON data. In the example below the JSON is pretty-printed, but in the log file the whole JSON object is stored on a single line.

{
  "agent": "Mozilla/5.0 (compatible)",
  "client": "default",
  "format": "html",
  "action": "getone",
  "method": "GET",
  "url": "http://get.data.gov.lt/datasets/gov/lst/standards/StandardDocument/4294130a-c922-4b49-a8e1-6be61d83abc5",
  "time": "2021-11-11T13:50:23.866065+00:00",
  "model": "datasets/gov/lst/standards/StandardDocument",
  "id": "4294130a-c922-4b49-a8e1-6be61d83abc5"
}

When splitting logs into multiple files, use the time field.
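The splitting step could look roughly like this. It is a sketch rather than a finished script: `split_log` and its `out_dir` parameter are hypothetical names, and it assumes every line is a JSON object with an ISO 8601 `time` value like the one in the example above. The source file is read line by line, so memory use stays small regardless of file size.

```python
import json
import os
from datetime import datetime


def split_log(source_path, name_format, out_dir="."):
    """Stream a .jsonl log and append each entry to a file named by its time."""
    current_name = None
    out = None
    try:
        with open(source_path, encoding="utf-8") as src:
            for line in src:  # iterates lazily, one line at a time
                if not line.strip():
                    continue  # skip blank lines
                entry = json.loads(line)
                when = datetime.fromisoformat(entry["time"])
                name = when.strftime(name_format)
                # Switch output files only when the formatted name changes.
                if name != current_name:
                    if out is not None:
                        out.close()
                    path = os.path.join(out_dir, name)
                    out = open(path, "a", encoding="utf-8")
                    current_name = name
                out.write(line if line.endswith("\n") else line + "\n")
    finally:
        if out is not None:
            out.close()
```

Output files are opened in append mode, so out-of-order entries still land in the right file, but running the function twice on the same input would duplicate entries. The date is taken in the timestamp's own UTC offset, with no conversion to local time.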

In addition, it would be nice to upload log files into the ELK Stack, but this is probably a separate task.
