przemeklal opened this issue 5 months ago
Can you please provide steps for deploying a usable cis_hardened (level2) system with juju?
Hi @przemeklal, I managed to relate grafana agent to a cis hardened system (cis_level2_server) and I did not see this behavior. Could you see if you can reproduce the issue?
Deploying g-agent to a CIS-hardened machine is not enough to reproduce it. You also need to deploy LXD containers on that machine and relate g-agent to the apps running in these containers:
```shell
juju deploy ch:ubuntu inner --to lxd:0 --series focal  # where 0 is your CIS-hardened machine id
juju relate grafana-agent inner
```
Once g-agents inside these LXDs try to access stuff in /var/log, auditd log spam starts one level below, on the LXD "host". FWIW, LXDs should also be hardened but it would be interesting to see what happens if they're not.
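A quick way to watch for the spam on the LXD host (a sketch; assumes auditd's standard tooling is installed and that the audited comm name is grafana-agent):

```shell
# Follow audit.log and filter for the agent:
sudo tail -f /var/log/audit/audit.log | grep grafana-agent

# Or query recent events by command name with ausearch:
sudo ausearch -ts recent -c grafana-agent -i
```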
I am still having trouble reproducing this nested LXD setup. If someone could reach out and schedule some time to walk me through it, that would be very helpful.
The issue seems to be solved by the classically confined snap. Waiting for approval in the snap store: https://forum.snapcraft.io/t/classic-version-for-the-grafana-agent-snap/40378?u=dstathis
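Once the classic snap is published, installing it should look something like this (the channel and timing depend on store approval, so this is a sketch):

```shell
# Classic confinement drops snapd's AppArmor sandboxing, which is what
# appears to trigger the audit records in the first place.
sudo snap install grafana-agent --classic
```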
Bug Description
On a CIS-hardened (level 2) Charmed OpenStack control node hosting 25 LXDs running OpenStack control-plane services, installing and running grafana-agent inside those LXDs caused massive amounts of logs to be written to audit.log at the host level (12G in less than a day, until the host simply ran out of disk space).
Pretty much all of these "new" entries in audit.log are reports of grafana-agent accessing `/var/log/.../*log` files. Typical entries in audit.log look like this one:
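The exact record isn't reproduced here; as an illustration, auditd SYSCALL records generally have this shape (every field value below is a placeholder, not taken from the affected system):

```text
type=SYSCALL msg=audit(1699999999.123:4567): arch=c000003e syscall=257 success=yes exit=3 ... comm="grafana-agent" exe="/snap/grafana-agent/..." key="..."
```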
Logs from `/var/log/aodh/aodh-evaluator.log` (and all the other files flagged in audit.log) are searchable in Loki, and everything else looks fine. There aren't any related errors reported in the logs of the grafana-agent running inside the LXD.

Additionally, not all files accessed by grafana-agent in the LXDs are reported in audit.log at the host level. The main difference seems to be the ownership of the log files and directories. For example, I see many entries reporting `/var/log/aodh/*.log` files, `/var/log/barbican/*.log` files, etc., but nothing about `/var/log/juju/*.log` or `/var/log/syslog`. Their ownership is as follows:
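The original listing isn't preserved here; as an illustration of the pattern described above, ownership on a focal node typically looks something like this (paths and owners inferred from the description, sizes and dates elided):

```text
$ ls -ld /var/log/syslog /var/log/juju /var/log/aodh /var/log/barbican
-rw-r-----  1 syslog   adm      ... /var/log/syslog
drwxr-xr-x  2 syslog   adm      ... /var/log/juju
drwxr-xr-x  2 aodh     aodh     ... /var/log/aodh
drwxr-xr-x  2 barbican barbican ... /var/log/barbican
```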
It seems that as long as files are owned by syslog:adm, grafana-agent's syscalls are not recorded. Accessing files owned by root, barbican (an OpenStack service user), or hacluster results in massive amounts of audit logs.
This may or may not be related to group membership of these user accounts:
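Illustrative again rather than copied from the affected system: on Ubuntu the syslog user is normally a member of the adm group, while OpenStack service users are not (exact uids/gids will differ per system):

```text
$ id syslog
uid=104(syslog) gid=110(syslog) groups=110(syslog),4(adm)
$ id barbican
uid=...(barbican) gid=...(barbican) groups=...(barbican)
```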
This massive audit.log spam can have catastrophic results: for example, if the CIS rule 4.1.2.3 ("Ensure system is disabled when audit logs are full") is in place, in the worst case the system may shut itself down after running out of space on the /var/log/audit partition.
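For context, the usual remediation for CIS 4.1.2.3 sets values along these lines in /etc/audit/auditd.conf (a sketch of the commonly recommended settings, not copied from this deployment):

```text
# /etc/audit/auditd.conf -- CIS 4.1.2.3 "Ensure system is disabled when audit logs are full"
space_left_action = email          # warn when space runs low
action_mail_acct = root
admin_space_left_action = halt     # halt the system when space is critically low
```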
The issue doesn't occur with filebeat, for example, so it might also be related to grafana-agent being a snap.
Is there anything that can be tweaked in grafana-agent snap that could help with this?
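Until a proper fix lands, one possible workaround might be an audit suppression rule keyed on the agent's executable, assuming your auditd supports the exe field (audit >= 2.6; the snap binary path below is an assumption, and suppressing records like this may itself fail CIS compliance scans):

```text
# /etc/audit/rules.d/10-grafana-agent.rules
# Named to sort before the CIS rules; "never" rules suppress matching events
# and should sit at the top of the rule list.
-a never,exit -F arch=b64 -S all -F exe=/snap/grafana-agent/current/grafana-agent
```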
Also, my recommendation is to avoid relating grafana-agent to Loki in any CIS-hardened deployments until this is resolved.
To Reproduce
Deploy grafana-agent in any OpenStack control-plane LXD container running on a CIS-hardened host, relate it to Loki, and watch /var/log/audit/audit.log; a condensed sketch follows.
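Application and endpoint names below are assumptions based on the commands earlier in this thread (machine 0 is the CIS-hardened host):

```shell
juju deploy ch:ubuntu inner --to lxd:0 --series focal   # LXD container on the hardened host
juju deploy grafana-agent                               # subordinate charm
juju relate grafana-agent inner
juju relate grafana-agent loki                          # assumes a reachable loki application or offer
# then, on machine 0:
sudo tail -f /var/log/audit/audit.log
```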
Environment
CIS-hardened Ubuntu 20.04
Charmed OpenStack focal/ussuri
Relevant log output
Additional context
This is a potential blocker for grafana-agent deployments on CIS-hardened clouds.