datalust / seq-forwarder

Local collection and reliable forwarding of log data to Seq
Apache License 2.0

Ingesting local text log files in real time #58

Closed: MattMofDoom closed this issue 6 months ago

MattMofDoom commented 3 years ago

Hi @nblumhardt,

Is there any chance that seq-forwarder could be extended to include monitoring a folder for text log files being created or updated?

I note that the v2.0.103 release notes say "Removed the file-based import command - seqcli does this better". However, running seq-forwarder as a service with a local internal store seems like an ideal way to continuously import logs from apps that can't otherwise be updated to send to Seq, without setting up a batched ingest with seqcli.

One use case for us is ingesting application text log files in real time. These are typically rolling files written by closed-source logging libraries. We would then either detect errors for alerting via Opsgenie, or watch for matching log events between specific times (using my Seq.App.EventTimeout application of course! :-) ) so that a time-sensitive alert can be sent. Ideally, the local storage would allow events to be de-duplicated. It would also let changes made while the service wasn't running be picked up once it starts again.

I have been looking into doing this with file/directory watchers, but I keep coming back to "if only seq-forwarder and seqcli ingest capabilities were combined to handle this" - hence I thought it worth asking the question.

Cheers,

Matt

nblumhardt commented 3 years ago

Hi Matt, thanks for the note!

It definitely feels like there's a feature gap here, and we'd very much like to tackle it. The problem space is quite big, though, so it's tricky to find a quick "minimum" implementation. Unfortunately, I think it would take longer than your current needs would allow.

Here is as good a place as any to start sketching it out, though!

It seems like Forwarder and seqcli are destined to grow closer together. Although we're expecting Forwarder to keep evolving independently for now, merging its features into seqcli around the common core of configuration and log-shipping code seems worth investigating.

Here are the gaps in seqcli I can think of, in no particular order, just to get the ball rolling.....

Not sure when there will be a chance to write a spec or map out next steps. Good to start letting the ideas brew, though :-)

MattMofDoom commented 3 years ago

Thanks Nick. This is very much along the lines of my own thinking. I was particularly cautious about the file/set fingerprinting and bookmarking. I appreciate it's quite big, hence reaching out; I'm not sure it's one I could tackle alone from scratch in a reasonable time.

Seq itself is very well defined, and I've really enjoyed creating/updating apps for it to help us achieve our monitoring and alerting goals!

nblumhardt commented 3 years ago

Thanks Matt, that's great to hear! :-) We'll keep you posted on when/how we approach this.

TheVons commented 2 years ago

Every major log management and analysis system supports monitoring a folder and forwarding the contents. Does Seq still not support this?

esp0 commented 1 year ago

For information to others wanting this, I received this reply today when asking about this Issue in a support ticket:

It is not on the roadmap at the moment. We are a small team with lots of important things to work on. Unfortunately, it is not possible to do all the things we would like to.

nblumhardt commented 1 year ago

Hi @esp0 - thanks for adding the note here 👍

Just to add some more information to our quick reply over the weekend -

Most log servers rely on one or more of a handful of client-side collectors: Logstash, Fluentd, Vector, etc., and Seq is flexible enough to work with many of these, too, either via its HTTP API or GELF.

We think it's unlikely that, in the near future, we'll be able to create anything as full-featured as what's already out there.

I'll follow up in a moment with an outline of how to do what you're after using Fluent Bit, which is quick and easy to set up on Windows/macOS/Linux.

nblumhardt commented 1 year ago

(Just realized that I'm off by one, it's Tuesday - must be that post-NDC jet lag 😅 )

The Fluent Bit 1.9 Windows installers are in:

https://docs.fluentbit.io/manual/installation/windows#installation-packages

Fluent Bit runs as a service, but it's easiest to configure/debug from the command line first using the zip distribution.

To tail one or more log files, you need an [INPUT] in fluent-bit.conf:

[INPUT]
    Name         tail
    Parser       simple
    Path         .\logs\*.log
    DB           .\logs\tail.db
    Read_from_Head  on

Note that on Windows, paths with wildcards like * must use backslash separators. The DB parameter keeps track of what data has been sent so far. You can find all the details you need on the Tail plug-in at:
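The DB file is what lets Fluent Bit resume where it left off after a restart, which is the "bookmarking" concern raised earlier in this thread. To illustrate the idea only, here is a minimal Python sketch that remembers a byte offset in a hypothetical JSON bookmark file. It is not how Fluent Bit implements it; Fluent Bit keeps its state in a SQLite database and identifies files more robustly. `read_new_lines` and the bookmark layout are illustrative assumptions, and the only rollover handling is a simple truncation check.

```python
import json
import os


def read_new_lines(log_path, bookmark_path):
    """Return complete lines appended to log_path since the last call.

    The byte offset reached is persisted in bookmark_path (a hypothetical
    JSON bookmark file), so a restarted process resumes where it stopped.
    """
    offset = 0
    if os.path.exists(bookmark_path):
        with open(bookmark_path, "r", encoding="utf-8") as f:
            offset = json.load(f).get("offset", 0)

    # If the file is now shorter than the bookmark, assume it was truncated
    # or replaced, and start again from the beginning.
    if os.path.getsize(log_path) < offset:
        offset = 0

    lines = []
    with open(log_path, "rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            # Only consume complete lines; a partially written line is
            # left in place and re-read on the next call.
            if not line or not line.endswith(b"\n"):
                break
            lines.append(line.decode("utf-8").rstrip("\r\n"))
            offset = f.tell()

    with open(bookmark_path, "w", encoding="utf-8") as f:
        json.dump({"offset": offset}, f)
    return lines
```

Calling it repeatedly returns only new lines. Skipping half-written lines avoids shipping an event before the application has finished writing it.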

https://docs.fluentbit.io/manual/pipeline/inputs/tail

My test log format looks like:

2022-10-18T05:44:41.3830613Z Hello, world
2022-10-18T05:44:42.3830613Z Next

I've defined a parser called simple in parsers.conf:

[PARSER]
    Name        simple
    Format      regex
    Regex       ^(?<time>[^ ]+) (?<message>.+)$
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L

If you check out parsers.conf, there are a stack of examples in there. Any fields you extract (via regex) will flow through as properties on your log events in Seq.
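As a rough cross-check of what the simple parser extracts, here is a Python sketch that applies the same pattern to the sample lines. It assumes, as Fluent Bit does by default (Time_Keep off), that the time field is consumed as the event timestamp rather than kept as a property. `parse_line` is a hypothetical helper. Python's re module spells named groups `(?P<name>...)`, and its `%f` accepts at most six fractional digits, so the seven-digit fraction is trimmed before parsing.

```python
import re
from datetime import datetime, timezone

# Same pattern as the "simple" parser, using Python's (?P<name>...) syntax.
SIMPLE = re.compile(r"^(?P<time>[^ ]+) (?P<message>.+)$")


def parse_line(line):
    """Return (utc_timestamp, remaining_fields), or None if the line doesn't match."""
    match = SIMPLE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    raw = fields.pop("time")  # the time key becomes the timestamp, not a property
    # Drop the trailing "Z" and trim/pad the fraction to six digits for %f.
    whole, _, fraction = raw.rstrip("Z").partition(".")
    micros = fraction[:6].ljust(6, "0")
    stamp = datetime.strptime(f"{whole}.{micros}", "%Y-%m-%dT%H:%M:%S.%f")
    return stamp.replace(tzinfo=timezone.utc), fields
```

For `2022-10-18T05:44:41.3830613Z Hello, world`, this yields a UTC timestamp plus `{"message": "Hello, world"}`. The extracted `message` field is what the [FILTER] block then renames.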

Back in fluent-bit.conf, the remaining configuration is:

[FILTER]
    Name    modify
    Match   *
    Rename  message @m

[OUTPUT]
    Name             http
    Match            *
    Host             localhost
    Port             5341
    URI              /api/events/raw?clef
    header           X-Seq-ApiKey yQDwNvt1KdwM8N6SkgqR
    Format           json_lines
    Json_date_key    @t
    Json_date_format iso8601
    log_response_payload False

This generates events in CLEF (Compact Log Event Format) and posts them as newline-delimited JSON batches to Seq's ingestion endpoint.
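If you want to script against the same endpoint without Fluent Bit, here is a minimal Python sketch of a similar request, assuming the events are already CLEF-shaped dicts. `build_clef_request` is a hypothetical helper. The URL, port and `X-Seq-ApiKey` header come from the config above. The `application/vnd.serilog.clef` content type is an addition of this sketch: the Fluent Bit config relies on the `?clef` query string instead, and sending both is assumed to be harmless. The request is only built here; pass it to `urllib.request.urlopen` to actually send it.

```python
import json
import urllib.request


def build_clef_request(events, base_url="http://localhost:5341", api_key=None):
    """Build (but don't send) a POST of newline-delimited CLEF events to Seq."""
    # One JSON object per line, as with Fluent Bit's json_lines format.
    body = "\n".join(json.dumps(event) for event in events).encode("utf-8")
    headers = {"Content-Type": "application/vnd.serilog.clef"}
    if api_key is not None:
        headers["X-Seq-ApiKey"] = api_key  # optional, like the header arg above
    return urllib.request.Request(
        base_url + "/api/events/raw?clef", data=body, headers=headers, method="POST"
    )
```

Usage: `urllib.request.urlopen(build_clef_request(events, api_key="..."))`.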

The header arg shows how you can attach an API key to the ingestion requests, if you choose.

The only really important field in the CLEF payload is @t, but other fields are defined for messages (@m), error details (@x), and a couple more. I'm using a [FILTER] just ahead of the [OUTPUT] block to rename message (extracted using the regex) to @m.
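To make the field mapping concrete, here is a small Python sketch of what happens to one parsed record on its way to the [OUTPUT] block. It assumes the parser has already split off the timestamp, as Time_Key does. `to_clef` is a hypothetical helper, not part of Fluent Bit. Fluent Bit's iso8601 date format may render the timestamp in a slightly different ISO 8601 form than Python's `isoformat()`.

```python
import json
from datetime import datetime, timezone


def to_clef(timestamp, fields):
    """Shape a parsed record into a CLEF event, mirroring the config above."""
    event = dict(fields)  # any other extracted fields pass through as properties
    # [FILTER] modify: Rename message @m
    if "message" in event:
        event["@m"] = event.pop("message")
    # [OUTPUT] Json_date_key @t, Json_date_format iso8601
    event["@t"] = timestamp.isoformat()
    return event


if __name__ == "__main__":
    print(json.dumps(to_clef(datetime.now(timezone.utc), {"message": "Hello, world"})))
```

An error detail extracted into a field could be renamed to `@x` in the same way.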

Here's the result in Seq:

[Screenshot of the result in Seq]

As more events are written they're shipped to Seq.

There's a lot of info out there showing different scenarios with Fluent Bit so hopefully most formats/setups will be described somewhere already, but if you run into any trouble please drop us a line! It'd be great to hear how you go.

Best regards, Nick

nblumhardt commented 1 year ago

Hey folks! I've now written this up in long form - we'll likely include it in the Seq docs sometime soon. The post is at: https://blog.datalust.co/tailing-a-log-file-or-folder-with-fluent-bit-and-seq/

Is there anywhere that solution falls short? Fluent Bit is a pretty flexible system that seems to have all of the bases covered.

Still interested in exploring a seqcli option down the track, either way.