fluent-plugins-nursery / fluent-plugin-systemd

This is a fluentd input plugin. It reads logs from the systemd journal.
Apache License 2.0

Problem Understanding Filters #56

Closed by Jitsusama 6 years ago

Jitsusama commented 6 years ago

Hello.

First off, thanks for this awesome plugin. It has me quite excited! That said, I'm totally ignorant about almost everything Ruby, so I'm having a hard time understanding how to construct filters.

I would like to filter everything that contains a _SYSTEMD_UNIT value of docker.service, as well as everything that contains any value in CONTAINER_NAME.

I understand how to do the first as per this site's example. I looked at the ruby docs you have a link to in the README, but it didn't help me understand how to chain multiple filters together, nor how to have any filter with a wildcarded value.

Any help would be appreciated.

errm commented 6 years ago

Hi,

Great question, we certainly need to improve the documentation around filtering somewhat...

The interface we have here is just a very thin wrapper over what you would do to filter entries with journalctl...

I made an attempt to explain this better below, by borrowing the examples from the journalctl docs, let me know if they help? And I will work on a properly written page of documentation...

AFAICT systemd does not support wildcards in these filters ... so to get everything that contains any value in CONTAINER_NAME you would probably want to read all the messages from the journal, then filter things down further, e.g. with a grep filter in fluentd: https://docs.fluentd.org/v1.0/articles/filter_grep
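
To make the "read everything, then filter further" idea concrete, here is a minimal sketch in plain Ruby. It is not the plugin's code and not the grep filter's implementation; the records and the `wanted?` helper are hypothetical, and journal entries are assumed to be plain hashes with string keys.

```ruby
# Hypothetical journal records, represented as plain Ruby hashes.
records = [
  { "_SYSTEMD_UNIT" => "docker.service", "MESSAGE" => "daemon started" },
  { "_SYSTEMD_UNIT" => "sshd.service", "MESSAGE" => "accepted key" },
  { "CONTAINER_NAME" => "web", "MESSAGE" => "GET /" }
]

# Keep a record if it comes from docker.service, or if it carries
# a CONTAINER_NAME field with any value at all (the "wildcard" case).
def wanted?(record)
  record["_SYSTEMD_UNIT"] == "docker.service" || record.key?("CONTAINER_NAME")
end

kept = records.select { |record| wanted?(record) }
kept.each { |record| puts record["MESSAGE"] }
```

The presence check (`key?`) is what stands in for the wildcard here; in a real pipeline that step would live in a fluentd filter rather than in the journal match.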


https://www.freedesktop.org/software/systemd/man/journalctl.html

Without arguments, all collected logs are shown unfiltered:

journalctl

This is the default if you don't specify any filters in the config:

filters []

With one match specified, all entries with a field matching the expression are shown:

journalctl _SYSTEMD_UNIT=avahi-daemon.service

filters [{"_SYSTEMD_UNIT": "avahi-daemon.service"}]

If two different fields are matched, only entries matching both expressions at the same time are shown:

journalctl _SYSTEMD_UNIT=avahi-daemon.service _PID=28097

filters [{"_SYSTEMD_UNIT": "avahi-daemon.service", "_PID": 28097}]

If two matches refer to the same field, all entries matching either expression are shown:

journalctl _SYSTEMD_UNIT=avahi-daemon.service _SYSTEMD_UNIT=dbus.service

Fields with arrays as values are treated as an OR statement, since a Ruby hash can only have one value per key.

filters [{"_SYSTEMD_UNIT": ["avahi-daemon.service", "dbus.service"]}]

This could also be expressed as two separate filter hashes...

filters [{"_SYSTEMD_UNIT": "avahi-daemon.service"}, {"_SYSTEMD_UNIT": "dbus.service"}]

The form you choose only matters if you need to filter on multiple fields: fields within one hash are combined with AND, while separate hashes are combined with OR.
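
To see why the form matters once several fields are involved, here is a rough sketch that expands array values into the equivalent list of single-value hashes. The `expand` helper is hypothetical (it is not part of the plugin), and it assumes the AND/OR semantics described in this thread.

```ruby
# Expand one filter hash whose values may be arrays into the equivalent
# list of hashes that each hold a single value per field.
def expand(filter)
  fields = filter.keys
  first, *rest = filter.values.map { |value| Array(value) }
  return [{}] if first.nil? # an empty hash places no constraint

  # Every combination of one value per field becomes its own hash.
  first.product(*rest).map { |combo| fields.zip(combo).to_h }
end

with_array = { "_SYSTEMD_UNIT" => ["avahi-daemon.service", "dbus.service"], "_PID" => 28097 }
expanded = expand(with_array)
expanded.each { |filter| p filter }
```

Under these assumptions the array form keeps the `_PID` condition attached to both units, whereas writing a second hash containing only `_SYSTEMD_UNIT` would drop the `_PID` condition for that unit.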

If the separator "+" is used, two expressions may be combined in a logical OR. The following will show all messages from the Avahi service process with the PID 28097 plus all messages from the D-Bus service (from any of its processes):

journalctl _SYSTEMD_UNIT=avahi-daemon.service _PID=28097 + _SYSTEMD_UNIT=dbus.service

filters [{"_SYSTEMD_UNIT": "avahi-daemon.service", "_PID": 28097}, {"_SYSTEMD_UNIT": "dbus.service"}]
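
The combined rule can be sketched as a small predicate in plain Ruby. This is only an illustration of the semantics described above, not how the plugin or the journal actually evaluates matches; `entry_matches?` is a hypothetical helper and entries are assumed to be hashes with string keys.

```ruby
# True if the entry satisfies at least one filter hash (OR between hashes).
# A hash is satisfied only if every field in it matches (AND within a hash).
# An array value matches if the entry's field equals any element (OR).
def entry_matches?(entry, filters)
  return true if filters.empty? # no filters: everything is shown

  filters.any? do |filter|
    filter.all? do |field, expected|
      entry.key?(field.to_s) &&
        Array(expected).map(&:to_s).include?(entry[field.to_s].to_s)
    end
  end
end

filters = [
  { "_SYSTEMD_UNIT" => "avahi-daemon.service", "_PID" => 28097 },
  { "_SYSTEMD_UNIT" => "dbus.service" }
]

avahi_main  = { "_SYSTEMD_UNIT" => "avahi-daemon.service", "_PID" => "28097" }
avahi_other = { "_SYSTEMD_UNIT" => "avahi-daemon.service", "_PID" => "1" }
dbus_any    = { "_SYSTEMD_UNIT" => "dbus.service", "_PID" => "512" }

puts entry_matches?(avahi_main, filters)
puts entry_matches?(avahi_other, filters)
puts entry_matches?(dbus_any, filters)
```

Values are compared as strings because journal fields are textual, which is why the integer `28097` in the filter lines up with `"28097"` in the entry.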

Show all logs generated by the D-Bus executable:

journalctl /usr/bin/dbus-daemon

filters [{"_EXE": "/usr/bin/dbus-daemon"}]
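
As a way to cross-check a filters value against the journalctl examples above, the following sketch renders a list of filter hashes as journalctl match arguments. `journalctl_args` is a hypothetical helper written for this illustration; it is not provided by the plugin.

```ruby
# Render a list of filter hashes as journalctl match arguments.
# Fields inside one hash are space-separated; hashes are joined with "+".
def journalctl_args(filters)
  groups = filters.map do |filter|
    matches = filter.flat_map do |field, value|
      # An array value yields one FIELD=value match per element.
      Array(value).map { |v| "#{field}=#{v}" }
    end
    matches.join(" ")
  end
  groups.join(" + ")
end

combined = journalctl_args([
  { "_SYSTEMD_UNIT" => "avahi-daemon.service", "_PID" => 28097 },
  { "_SYSTEMD_UNIT" => "dbus.service" }
])
puts "journalctl #{combined}"
```

Running the generated command by hand is a quick way to confirm that a filters value selects the entries you expect before putting it into the fluentd config.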

Jitsusama commented 6 years ago

Thanks so much for the excellent description!

Jitsusama commented 6 years ago

If I was to put in a PR to expand the documentation around this particular topic, which branch should I base it off of? Also, would you like to have documentation branch out into separate files in a /docs directory, or just continue to expand off of README.md?

errm commented 6 years ago

Honestly I haven't thought about it much ... but it is a goal for 1.0 to have better documentation.

/docs on master seems reasonable to start with. I think the README is already a bit too long... so I think we should start to split off some topic pages and index them all in the README...

Honestly though if you want to spend some time on this ... do whatever you feel works best...

Jitsusama commented 6 years ago

I'll submit separate PRs to both.

Jitsusama commented 6 years ago

I'm writing up the documentation now, but I found an edge case that you didn't address. Is there any way to specify an OR condition between two expressions instead of the default AND?

For example:

# journalctl _PID=2345 _SYSTEMD_UNIT=docker.service
... <logical AND result here> ...
# journalctl _PID=2345 + _SYSTEMD_UNIT=docker.service
... <logical OR result here> ...

Jitsusama commented 6 years ago

I think one of your examples might actually hit this question.

You gave an example of:

filters [{"_SYSTEMD_UNIT": "avahi-daemon.service", "_PID": 28097}, {"_SYSTEMD_UNIT": "dbus.service"}]

Would the two separate hashes define a logical OR condition, @errm? I.e., [{"THING1": "value"}, {"THING2": "value"}] would match any logs with THING1=value OR THING2=value?

errm commented 6 years ago

Correct:

Jitsusama commented 6 years ago

There, I've created 2 PRs, one against the master branch and the other cherry-picking commits into the 1.0.0 branch.

errm commented 6 years ago

Sure. I wouldn't worry about the v1.0.0 branch; I am going to merge it into master just before we release v1 anyway ... people will see the docs on master when they look for them ...

Jitsusama commented 6 years ago

Since my issue has been resolved and master documentation has been merged, I'm happy that this issue is fully dealt with. Thanks again for your help!

errm commented 6 years ago

Thanks for your help :)