phvalguima opened 9 months ago
On this, we should have a configurable path. Some snaps disable syslogs, but keep the service logs in some dedicated file.
I think @marcoppenheimer, in this case, we should actually improve https://github.com/sosreport/sos instead.
Each team should contribute its own plugin: https://github.com/sosreport/sos/tree/main/sos/report/plugins
That tool is a standard for support cases (it even integrates natively with support portals via the `--case-id` option).
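For example, a minimal invocation tagging the report with a support case (the case number here is made up):

```shell
# Minimal sketch: attach the report to a support case (case number is made up)
sudo sos report --case-id 01234567 --batch
```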
Right now, we can already enable the `systemd` and `kubernetes` plugins and capture the output of the main host:
```shell
$ sudo snap install sosreport --channel=latest/stable --classic
$ mkdir /home/ubuntu/sosreport
$ sudo sos report \
    --only-plugins kubernetes,systemd \
    --enable-plugins kubernetes \
    -k kubernetes.describe=true -k kubernetes.podlogs=true -k kubernetes.all=true \
    --batch \
    --clean \
    --tmp-dir=./sosreport \
    -z gzip
```
Where each option means:
- `--only-plugins kubernetes,systemd`: these are the only plugins to run
- `--enable-plugins kubernetes`: make sure both are enabled, e.g. k8s is disabled by default
- `-k kubernetes.describe=true -k kubernetes.podlogs=true`: options to capture pod logs and describe k8s namespaces
- `--batch`: no prompting
- `--clean`: obfuscate data
- `--tmp-dir=./sosreport`: where the report output goes
- `-z gzip`: generates a tar.gz output
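After the run completes, the archive ends up under the `--tmp-dir`; the exact filename pattern depends on the sos version and on `--clean`, so a glob is the safest way to locate it:

```shell
# Locate and peek into the generated archive (filename pattern varies by sos version)
ls ./sosreport/sosreport-*.tar.gz
tar -tzf ./sosreport/sosreport-*.tar.gz | head
```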
For LXCs, though, we need something else. The LXD plugin captures the container consoles and LXD logs, but it does not capture the journal logs within each container. Therefore, the best way here would be to run the commands above in each container, capture all the logs, and download them to a local folder on the host, for example as sketched below.
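A minimal sketch of that loop, assuming the `sosreport` snap is already installed inside every container and that the container names come from `lxc list`:

```shell
# Sketch: run sos report inside each LXD container and pull the archives to the host
mkdir -p ./container-reports
for container in $(lxc list --format csv -c n); do
    lxc exec "$container" -- mkdir -p /tmp/sosreport
    lxc exec "$container" -- sos report --only-plugins systemd --batch --clean \
        --tmp-dir=/tmp/sosreport -z gzip
    # pull the whole directory back to the host (one sub-directory per container)
    lxc file pull --recursive "$container/tmp/sosreport" "./container-reports/$container"
done
```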
For kubernetes, you need kubectl installed and `~/.kube/config` set correctly before running that command (a quick sanity check is sketched below). There is no problem in running the `kubernetes` plugin without having k8s at all; it will just not capture any output.
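A minimal sanity check, assuming a MicroK8s host (adjust the first command for your k8s distribution):

```shell
# Make sure kubectl can reach the cluster before invoking sos
mkdir -p ~/.kube
microk8s config > ~/.kube/config   # assumption: MicroK8s; other distributions expose kubeconfig differently
kubectl get nodes                  # should list the cluster nodes without errors
```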
In my opinion, @carlcsaposs-canonical should run the commands above by default, at least whenever a test fails. That should come alongside juju-crashdump.
Then, we can also add more plugins (kafka, opensearch, etc.), as well as extra options. But I'd leave it for each team to extend sosreport and then contribute to dp-workflows here.
Wdyt?
Currently, we collect `juju debug-log` specifics, which provide insights into the charm states. We should also collect logs from the actual services, such as systemd logs for VM charms or k8s pod logs. That will provide insights, e.g. whether we had restarting units at a given time, as discussed in this bug.
One example of how to capture systemd logs: https://github.com/canonical/data-platform-workflows/compare/main...add-more-logs
I think we need an equivalent command for k8s as well.
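A rough sketch of what that k8s equivalent could look like, assuming the Juju model name matches the k8s namespace (the variable names and file layout here are made up):

```shell
# Sketch: dump pod logs and descriptions for one namespace
NAMESPACE=<model-name>   # assumption: the Juju model name matches the k8s namespace
mkdir -p ./k8s-logs
for pod in $(kubectl -n "$NAMESPACE" get pods -o name); do
    name="${pod#pod/}"
    kubectl -n "$NAMESPACE" logs --all-containers --prefix "$pod" > "./k8s-logs/$name.log"
    kubectl -n "$NAMESPACE" describe "$pod" > "./k8s-logs/$name-describe.txt"
done
```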