rbouterige opened this issue 2 years ago
Hi @rbouterige, I agree with the suggestion. Unfortunately, right now the only convenient way to turn those off is to raise the log level (e.g., to warning), but in that case other moderately important log lines will also be disabled.
I was hoping the log level in logging.rs could be simply lowered to debug, or the log level be a settable parameter.
I would very much appreciate the same change.
This is continuing to plague my systems.
Looking at journalctl logs on one particular system that had been running since early August, I see that 93% of my log entries are these useless messages.
They are essentially flushing useful information out of the journal as we hit the 4 GiB limit on journald's automatic rotation/flush.
Please consider getting the log level turned down on these three messages:
```
Dec 10 11:47:14 SOLU2001P00004 iotedged[980]: 2021-12-10T17:47:14Z [INFO] - [mgmt] - - - [2021-12-10 17:47:14.098021626 UTC] "GET /modules?api-version=2020-07-07 HTTP/1.1" 200 OK 2680 "-" "-" auth_id(-)
Dec 10 11:47:17 SOLU2001P00004 iotedged[980]: 2021-12-10T17:47:17Z [INFO] - Checking edge runtime status
Dec 10 11:47:17 SOLU2001P00004 iotedged[980]: 2021-12-10T17:47:17Z [INFO] - Edge runtime is running.
```
The first message is repeated 12 times for each occurrence of the pair of the second and third messages.
This also happens in iotedge 1.2.8.
Setting the log level to Error has no effect; the Info log items are still being written to the log. This fills up /var/log/daemon.log quickly.
Attempting to change the log level:
```
sudo iotedge system set-log-level error
sudo iotedge system restart
```
Workaround:
The only solution I have found to stop the aziot-edged service from flooding daemon.log is to edit /lib/systemd/system/aziot-edged.service and add:
```
[Service]
StandardOutput=null
```
then reload and restart:
```
systemctl daemon-reload
sudo iotedge system restart
```
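For what it's worth, the same silencing should be achievable with a standard systemd drop-in instead of editing the packaged unit file, so a package upgrade won't overwrite it. A minimal sketch using the stock systemctl edit mechanism (the drop-in path is what systemd creates by default; the two config lines are the same as above):
```
# sudo systemctl edit aziot-edged
# opens /etc/systemd/system/aziot-edged.service.d/override.conf; add:
[Service]
StandardOutput=null
```
Then run the same daemon-reload and restart as above.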
aziot-edged log output:
```
Mar 14 12:14:39 ev-30000005 aziot-edged[2137]: 2022-03-14T12:14:39Z [INFO] - [mgmt] - - - [2022-03-14 12:14:39.021107881 UTC] "GET /modules?api-version=2020-07-07 HTTP/1.1" 200 OK 3998 "-" "-" auth_id(-)
Mar 14 12:14:39 ev-30000005 aziot-edged[2137]: 2022-03-14T12:14:39Z [INFO] - [work] - - - [2022-03-14 12:14:39.060137775 UTC] "POST /modules/%24edgeHub/genid/637685897024976896/decrypt?api-version=2019-01-30 HTTP/1.1" 200 OK 856 "-" "-" auth_id(-)
Mar 14 12:14:39 ev-30000005 aziot-edged[2137]: 2022-03-14T12:14:39Z [INFO] - [work] - - - [2022-03-14 12:14:39.134218330 UTC] "POST /modules/%24edgeHub/genid/637685897024976896/encrypt?api-version=2019-01-30 HTTP/1.1" 200 OK 933 "-" "-" auth_id(-)
Mar 14 12:14:39 ev-30000005 aziot-edged[2137]: 2022-03-14T12:14:39Z [INFO] - [work] - - - [2022-03-14 12:14:39.208790055 UTC] "POST /modules/%24edgeHub/genid/637685897024976896/decrypt?api-version=2019-01-30 HTTP/1.1" 200 OK 908 "-" "-" auth_id(-)
Mar 14 12:14:39 ev-30000005 aziot-edged[2137]: 2022-03-14T12:14:39Z [INFO] - [work] - - - [2022-03-14 12:14:39.271971547 UTC] "POST /modules/%24edgeHub/genid/637685897024976896/encrypt?api-version=2019-01-30 HTTP/1.1" 200 OK 933 "-" "-" auth_id(-)
Mar 14 12:14:39 ev-30000005 aziot-edged[2137]: 2022-03-14T12:14:39Z [INFO] - [work] - - - [2022-03-14 12:14:39.337933514 UTC] "POST /modules/%24edgeHub/genid/637685897024976896/decrypt?api-version=2019-01-30 HTTP/1.1" 200 OK 908 "-" "-" auth_id(-)
Mar 14 12:14:39 ev-30000005 aziot-edged[2137]: 2022-03-14T12:14:39Z [INFO] - [work] - - - [2022-03-14 12:14:39.401451008 UTC] "POST /modules/%24edgeHub/genid/637685897024976896/encrypt?api-version=2019-01-30 HTTP/1.1" 200 OK 1049 "-" "-" auth_id(-)
Mar 14 12:14:39 ev-30000005 aziot-edged[2137]: 2022-03-14T12:14:39Z [INFO] - [work] - - - [2022-03-14 12:14:39.519834529 UTC] "POST /modules/%24edgeHub/genid/637685897024976896/decrypt?api-version=2019-01-30 HTTP/1.1" 200 OK 1028 "-" "-" auth_id(-)
```
Is anybody at Microsoft even seeing this issue? It remains my largest consumer of systemd journal log space. These log messages continue to consume well over 90% of my journal logs and they are meaningless to everybody. Likewise, I have found no documentation on turning the log level down using environment variables, etc. Any help, direction, or pointer to documentation for controlling this would be helpful.
@robinmholt see https://github.com/Azure/iotedge/issues/6730. The iotedge system set-log-level command creates an override file for the individual services that sets the AZIOT_LOG environment variable to one of error, warn, info, debug, or trace.
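For illustration, such an override is just a small systemd drop-in along these lines (the exact file name and location are my assumption, not confirmed from the source; the AZIOT_LOG variable is the mechanism the command relies on):
```
# e.g. /etc/systemd/system/aziot-edged.service.d/override.conf
[Service]
Environment=AZIOT_LOG=error
```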
@orjanto I would have expected iotedge system set-log-level error to cut down on the info you see logged. At least that's the behavior I normally see. Maybe something with Armbian?
This remains a problem in 1.4.27. Are there any plans to reduce the messages to a debug level instead of info?
With the following three changes, the log is a bit quieter, but I have not figured out where the last log messages are coming from.
```diff
diff --git a/edgelet/aziot-edged/src/watchdog.rs b/edgelet/aziot-edged/src/watchdog.rs
index 22cf91f61..f8779431f 100644
--- a/edgelet/aziot-edged/src/watchdog.rs
+++ b/edgelet/aziot-edged/src/watchdog.rs
@@ -79,7 +79,7 @@ async fn watchdog(
     runtime: &edgelet_docker::DockerModuleRuntime<http_common::Connector>,
     identity_client: &aziot_identity_client_async::Client,
 ) -> Result<(), EdgedError> {
-    log::info!("Watchdog checking Edge runtime status");
+    log::debug!("Watchdog checking Edge runtime status");

     let agent_name = settings.agent().name();
     if let Ok((_, agent_status)) = runtime.get(agent_name).await {
@@ -87,7 +87,7 @@
     match agent_status {
         edgelet_core::ModuleStatus::Running => {
-            log::info!("Edge runtime is running");
+            log::debug!("Edge runtime is running");
         }

         edgelet_core::ModuleStatus::Stopped | edgelet_core::ModuleStatus::Failed => {
diff --git a/edgelet/edgelet-docker/src/runtime.rs b/edgelet/edgelet-docker/src/runtime.rs
index 8d3ccc5ca..c657a5626 100644
--- a/edgelet/edgelet-docker/src/runtime.rs
+++ b/edgelet/edgelet-docker/src/runtime.rs
@@ -596,7 +596,7 @@ where
     }

     async fn system_resources(&self) -> anyhow::Result<SystemResources> {
-        log::info!("Querying system resources...");
+        log::debug!("Querying system resources...");

         let uptime = nix::sys::sysinfo::sysinfo()?.uptime().as_secs();
```
The log messages I have not figured out yet are:
```
Dec 21 06:11:58 SOLU424200001 aziot-edged[163971]: 2023-12-21T06:11:58Z [INFO] - <-- GET /modules?api-version=2020-07-07 {"accept": "application/json", "host": "mgmt.sock:80", "connection": "close"}
Dec 21 06:11:58 SOLU424200001 aziot-edged[163971]: 2023-12-21T06:11:58Z [INFO] - --> 200 {"content-type": "application/json"}
```
I assume they are coming from some common HTTP server connector printing them out, but I am at a loss to find it so far.
We believe we have found it in the iot-identity-service repo's http-common directory.
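If that is right, and if the log target follows the crate or module name (an unverified guess on my part; the http_common target here is hypothetical), a drop-in directive along these lines might quiet those messages too:
```
[Service]
Environment=AZIOT_LOG=http_common=WARN
```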
@rbouterige Take a look at the now-closed PR I had. In it, there is a very helpful suggestion that lets you significantly reduce the level of logging on aziot-edged.service. You can do the equivalent on aziot-identityd.service as well to really quiet the logs.
```
[Service]
Environment=AZIOT_LOG=edgelet_http_mgmt=WARN,edgelet_http_workload=WARN
```
This is my adaptation of https://github.com/Azure/iot-identity-service/pull/577#issuecomment-1870688922
After some more testing, I came to the following. With the default log level, my logs rotate approximately every 11 hours. With IOTEDGE_LOG=WARN on the old iotedge (version 1.1), as opposed to the current aziot-edge (version 1.4), I was getting a rotation interval of greater than 6 months, which is what our security team had required of our design.
To attempt to make a less restrictive version of the logging, I have created the following files:
```
# File /etc/systemd/system/aziot-certd.service.d/override.conf
[Service]
Environment=AZIOT_LOG=aziot_certd::http=WARN

# File /etc/systemd/system/aziot-edged.service.d/override.conf
[Service]
Environment=AZIOT_LOG=edgelet_http_mgmt=WARN,edgelet_http_workload=WARN

# File /etc/systemd/system/aziot-identityd.service.d/override.conf
[Service]
Environment=AZIOT_LOG=aziot_identityd::http=WARN

# File /etc/systemd/system/aziot-keyd.service.d/override.conf
[Service]
Environment=AZIOT_LOG=aziot_keyd::http=WARN
```
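After creating these files, the same reload-and-restart sequence from earlier applies:
```
sudo systemctl daemon-reload
sudo iotedge system restart
```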
This does get me more information than I had before, which I do find useful. I have not run very long in this configuration, but I will continue to run this way for a while to see where my logs stabilize. I believe that if I can get https://github.com/Azure/iotedge/pull/7181 accepted, my log levels will be nearly identical to what they were before.
I don't know how to translate the Rust code into these config settings; I just ran strings on the executables, looked for interesting patterns, and set those until the log messages went away. This is certainly less than ideal, but it is what I am left with.
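For anyone else trying to map the Rust sources to these settings: the AZIOT_LOG directives appear to follow the common Rust log-filter syntax (comma-separated target=level pairs) popularized by env_logger. A minimal standalone sketch, assuming the aziot services parse the variable the same way (the example program is mine, not from the repo):
```rust
// Cargo.toml: log = "0.4", env_logger = "0.10"
use log::{debug, info, warn};

fn main() {
    // Read filter directives from AZIOT_LOG, defaulting to "info".
    // Directives are comma-separated: a bare level ("info") sets the default,
    // while "target=level" pairs (e.g. "edgelet_http_mgmt=warn") override it
    // for one target. Targets default to the Rust module path, which is why
    // running `strings` on the binaries turns up usable names.
    env_logger::Builder::from_env(env_logger::Env::new().filter_or("AZIOT_LOG", "info")).init();

    info!("shown at the default level");
    debug!("hidden unless the default level is debug or lower");
    warn!(target: "edgelet_http_mgmt", "suppressed by AZIOT_LOG=edgelet_http_mgmt=error");
}
```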
@robinmholt I looked at your PR and your suggested workaround for this issue, and it also covers my needs for our devices using iotedge. Thank you so much for your work and for sticking with it; I can't say I liked Arnav's dry replies on the PR thread. In my apparently narrow-minded mind, an iotedge daemon didn't look like the right place to monitor healthy HTTP traffic, but I guess we learn something new every day!
@rbouterige Just got another response from Microsoft on https://github.com/Azure/iotedge/pull/7187#issuecomment-1874733616, which indicates they are not going to adjust the other log messages. Hopefully I can put together a similar set of log-level config settings to quiet these last three messages.
Expected Behavior
The iotedge daemon trace at the INFO log level should be compact and tidy in 1.1.x.
Current Behavior
All HTTP request traces are logged at the INFO log level; they should be logged at a lower level (e.g., DEBUG).
The following example lines are clogging log files, and in my use case make up around 60% of the lines in the syslog file. I am uploading these logs periodically, and I found no obvious way to silence them without touching the iotedge repository code:
Context (Environment)
Output of iotedge check:
```
Configuration checks
--------------------
√ config.yaml is well-formed - OK
√ config.yaml has well-formed connection string - OK
√ container engine is installed and functional - OK
√ config.yaml has correct hostname - OK
√ config.yaml has correct URIs for daemon mgmt endpoint - OK
‼ latest security daemon - Warning
    Installed IoT Edge daemon has version 1.1.3 but 1.1.7 is the latest stable version available.
    Please see https://aka.ms/iotedge-update-runtime for update instructions.
√ host time is close to real time - OK
√ container time is close to host time - OK
‼ DNS server - Warning
    Container engine is not configured with DNS server setting, which may impact connectivity to IoT Hub.
    Please see https://aka.ms/iotedge-prod-checklist-dns for best practices.
    You can ignore this warning if you are setting DNS server per module in the Edge deployment.
√ IPv6 network configuration - OK
‼ production readiness: certificates - Warning
    The Edge device is using self-signed automatically-generated development certificates.
    They will expire in 88 days (at 2022-01-11 12:49:05 UTC) causing module-to-module and downstream device communication to fail on an active deployment.
    After the certs have expired, restarting the IoT Edge daemon will trigger it to generate new development certs.
    Please consider using production certificates instead. See https://aka.ms/iotedge-prod-checklist-certs for best practices.
‼ production readiness: container engine - Warning
    Device is not using a production-supported container engine (moby-engine).
    Please see https://aka.ms/iotedge-prod-checklist-moby for details.
‼ production readiness: logs policy - Warning
    Container engine is not configured to rotate module logs which may cause it run out of disk space.
    Please see https://aka.ms/iotedge-prod-checklist-logs for best practices.
    You can ignore this warning if you are setting log policy per module in the Edge deployment.
‼ production readiness: Edge Agent's storage directory is persisted on the host filesystem - Warning
    The edgeAgent module is not configured to persist its /tmp/edgeAgent directory on the host filesystem.
    Data might be lost if the module is deleted or updated.
    Please see https://aka.ms/iotedge-storage-host for best practices.
‼ production readiness: Edge Hub's storage directory is persisted on the host filesystem - Warning
    The edgeHub module is not configured to persist its /tmp/edgeHub directory on the host filesystem.
    Data might be lost if the module is deleted or updated.
    Please see https://aka.ms/iotedge-storage-host for best practices.

Connectivity checks
-------------------
√ host can connect to and perform TLS handshake with DPS endpoint - OK
√ host can connect to and perform TLS handshake with IoT Hub AMQP port - OK
√ host can connect to and perform TLS handshake with IoT Hub HTTPS / WebSockets port - OK
√ host can connect to and perform TLS handshake with IoT Hub MQTT port - OK
√ container on the default network can connect to IoT Hub AMQP port - OK
√ container on the default network can connect to IoT Hub HTTPS / WebSockets port - OK
√ container on the default network can connect to IoT Hub MQTT port - OK
√ container on the IoT Edge module network can connect to IoT Hub AMQP port - OK
√ container on the IoT Edge module network can connect to IoT Hub HTTPS / WebSockets port - OK
√ container on the IoT Edge module network can connect to IoT Hub MQTT port - OK

18 check(s) succeeded.
7 check(s) raised warnings.
Re-run with --verbose for more details.
```
Device Information
Runtime Versions
Logs
aziot-edged logs
```
[17:09] root@phycore-stm32mp1-peach:2.14.0:~$ tail -f /var/log/syslog
2021-10-13T17:09:09.061727+00:00 phycore-stm32mp1-peach kernel: [   75.587468] br-1372e2b8beca: port 3(veth39b2a9c) entered disabled state
2021-10-13T17:09:09.061836+00:00 phycore-stm32mp1-peach kernel: [   75.592120] br-1372e2b8beca: port 3(veth39b2a9c) entered blocking state
2021-10-13T17:09:09.321706+00:00 phycore-stm32mp1-peach kernel: [   75.846782] br-1372e2b8beca: port 3(veth39b2a9c) entered disabled state
2021-10-13T17:09:09.432766+00:00 phycore-stm32mp1-peach dockerd[541]: time="2021-10-13T17:09:09.429689535Z" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/eba0f4c788734321087bbeba093fc3a4ba6906115ba06796d14969e9fedfebe6/shim.sock" debug=false pid=1142
2021-10-13T17:09:09.653582+00:00 phycore-stm32mp1-peach systemd[834]: run-docker-runtime\x2drunc-moby-eba0f4c788734321087bbeba093fc3a4ba6906115ba06796d14969e9fedfebe6-runc.saIECF.mount: Succeeded.
2021-10-13T17:09:09.657257+00:00 phycore-stm32mp1-peach systemd[1]: run-docker-runtime\x2drunc-moby-eba0f4c788734321087bbeba093fc3a4ba6906115ba06796d14969e9fedfebe6-runc.saIECF.mount: Succeeded.
2021-10-13T17:09:09+00:00 localhost edgeHub[396]: 2021-10-13 17:09:09.683 +00:00 Edge Hub Main()
2021-10-13T17:09:09.957004+00:00 phycore-stm32mp1-peach avahi-daemon[393]: Joining mDNS multicast group on interface vethaec28b5.IPv6 with address fe80::54be:d2ff:fe32:b47e.
2021-10-13T17:09:09.958099+00:00 phycore-stm32mp1-peach avahi-daemon[393]: New relevant interface vethaec28b5.IPv6 for mDNS.
2021-10-13T17:09:09.959031+00:00 phycore-stm32mp1-peach avahi-daemon[393]: Registering new address record for fe80::54be:d2ff:fe32:b47e on vethaec28b5.*.
2021-10-13T17:09:12.291693+00:00 phycore-stm32mp1-peach kernel: [   78.820311] eth0: renamed from veth7bd6104
2021-10-13T17:09:12.341710+00:00 phycore-stm32mp1-peach kernel: [   78.872797] IPv6: ADDRCONF(NETDEV_CHANGE): veth39b2a9c: link becomes ready
2021-10-13T17:09:12.341808+00:00 phycore-stm32mp1-peach kernel: [   78.873171] br-1372e2b8beca: port 3(veth39b2a9c) entered blocking state
2021-10-13T17:09:12.341829+00:00 phycore-stm32mp1-peach kernel: [   78.873188] br-1372e2b8beca: port 3(veth39b2a9c) entered forwarding state
2021-10-13T17:09:12.360914+00:00 phycore-stm32mp1-peach NetworkManager[472]:
```
Additional Information
I could not find this logging line in the master branch, so I am assuming this is a solved problem for the 1.2.x releases, but it would be great if something could be done about this in the 1.1 branch too. As it is, iotedge is making it harder to read my system trace, and while I understand the need to keep track of HTTP requests, I am not sure they should be recorded at the INFO log level. Can anyone confirm this is the intended behavior?
The relevant lines in the repository sources are here.