Closed PedroBuhigas closed 1 year ago
@nlcamp is on-call - please help take a look. Upon first glance it looks like this old issue https://github.com/Azure/iotedge/issues/4011 but I think that one has to do with permissions on the sockets, which doesn't seem like the case here?
@PedroBuhigas - Can you provide debug-level logs from the IoT Edge daemons (edged, keyd, certd) by following these steps: https://learn.microsoft.com/en-us/azure/iot-edge/troubleshoot?view=iotedge-1.4#check-the-status-of-the-iot-edge-security-manager-and-its-logs
Then please run iotedge system logs
and provide the output for us to analyze.
Also, what version of edgeAgent and edgeHub are you using in your config.toml and deployment?
Also, please share the output of the following commands:
Our deployment manifest uses version 1.4 of the agent and hub.
system_logs.log system_logs_2.log ls1.txt ls2.txt
See attached logs and output of ls commands.
@nlcamp please take a look when you're back
@PedroBuhigas - It looks like the following code is failing on startup of the edgAgent container because the user does not have permissions to change ownership of the socket file, however I'm not sure how it got into that state: https://github.com/Azure/iotedge/blob/b4c7dc64765c1b33aa563ff0e7a317516b9fce00/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Service/agentStart.sh#L77
A few additional questions for you:
What do you mean in your description when you say "Using UID 1000 for edgeAgent and edgeHub."?
Can you run the following commands on the device and share the output?
a. id edgeagentuser
b. docker exec edgeAgent id
c. docker exec edgeAgent id edgeagentuser
What platform (OS and Arch) are you using?
@PedroBuhigas - It looks like the following code is failing on startup of the edgAgent container because the user does not have permissions to change ownership of the socket file, however I'm not sure how it got into that state:
A few additional questions for you:
- What do you mean in your description when you say "Using UID 1000 for edgeAgent and edgeHub."?
- Can you run the following commands on the device and share the output? a.
id edgeagentuser
b.docker exec edgeAgent id
c.docker exec edgeAgent id edgeagentuser
- What platform (OS and Arch) are you using?
The deployment settings sets environment variable EDGEAGENTUSER_ID to 1000
root@eve5044:/opt/NinjaRMMAgent/programfiles# id edgeagentuser uid=13622(edgeagentuser) gid=994(iotedge) groups=994(iotedge)
root@eve5044:/opt/NinjaRMMAgent/programfiles# docker exec edgeAgent id Error response from daemon: Container 9133f7d36ff72f655a723e8cc29ab56cef37bae0ca30eebe0056c4ead22dbeac is not running <= Because of the error the module never starts. Same issue for the next command
We are using ubuntu 20.04 x64.
The deployment settings sets environment variable EDGEAGENTUSER_ID to 1000
Why are you setting this? Have you tried removing it, then removing the existing edgeAgent container via docker rm edgeAgent
and then see if that iotedge system restart
resolves the issue?
I need to investigate some more, but I think in the past setting EDGEAGENTUSER_ID to 1000 in the deployment was needed, but I don't think it is anymore.
Setting the UID due to legacy host storage configuration. Would need to update host storage folders permissions on hundreds of devices.
On Aug 23, 2023, at 3:57 PM, Noel Campbell @.***> wrote:
The deployment settings sets environment variable EDGEAGENTUSER_ID to 1000
Why are you setting this? Have you tried removing it, then removing the existing edgeAgent container via docker rm edgeAgent and then see if that iotedge system restart resolves the issue?
I need to investigate some more, but I think in the past setting EDGEAGENTUSER_ID to 1000 in the deployment was needed, but I don't think it is anymore.
— Reply to this email directly, view it on GitHubhttps://github.com/Azure/iotedge/issues/7078#issuecomment-1690628306, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AD3WJ635M324NMA5DWSOWE3XWZVCPANCNFSM6AAAAAA3IZHHME. You are receiving this because you were mentioned.Message ID: @.***>
2. id edgeagentuser
I've removed the EDGEAGENTUSER_ID setting. Cleared the docker containers and have the same outcome:
root@eve5044:/etc/iotedge# id edgeagentuser uid=13622(edgeagentuser) gid=994(iotedge) groups=994(iotedge)
@PedroBuhigas - I actually think the line number I gave you before is wrong and that instead, the following line is causing the error: https://github.com/Azure/iotedge/blob/b4c7dc64765c1b33aa563ff0e7a317516b9fce00/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Service/agentStart.sh#L93
With that said, it seems that the su
command is for some reason unable to set the group id when attempting to switch from the current user to edgeagentuser
. To help debug this, could you answer a few more questions and run few more commands?
sudo journalctl -u apparmor
to see if any errors show up that might indicate the cause. If you do, you can try temporarily disabling apparmor to see if the iotedge issues resolve.sudo docker run -it --entrypoint /bin/sh mcr.microsoft.com/azureiotedge-agent:1.4.16
. Then from within the container, run the following commands:
b. id
c. ls -l $(which su)
d. ls -l /bin/busybox
@.***:/opt/NinjaRMMAgent/programfiles# journalctl -u apparmor -- Logs begin at Thu 2023-08-03 05:07:21 UTC, end at Mon 2023-08-28 16:00:03 UTC. -- -- No entries –
/app # id uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel),11(floppy),20(dialout),26(tape),27(video) /app # ls -l $(which su) lrwxrwxrwx 1 root root 12 Jun 14 15:03 /bin/su -> /bin/busybox /app # ls -l /bin/busybox -rwxr-xr-x 1 root root 816888 Jun 2 00:42 /bin/busybox /app #
From: Noel Campbell @.> Sent: Monday, August 28, 2023 10:49 AM To: Azure/iotedge @.> Cc: Pedro Buhigas @.>; Mention @.> Subject: Re: [Azure/iotedge] edgeAgent su: can't set groups: Permission denied (Issue #7078)
@PedroBuhigashttps://github.com/PedroBuhigas - I actually think the line number I gave you before is wrong and that instead, the following line is causing the error: https://github.com/Azure/iotedge/blob/b4c7dc64765c1b33aa563ff0e7a317516b9fce00/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Service/agentStart.sh#L93
With that said, it seems that the su command is for some reason unable to set the group id when attempting to switch from the current user to edgeagentuser. To help debug this, could you answer a few more questions and run few more commands?
— Reply to this email directly, view it on GitHubhttps://github.com/Azure/iotedge/issues/7078#issuecomment-1695931692, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AD3WJ62RJDM7KEWJA3FL22DXXS4XRANCNFSM6AAAAAA3IZHHME. You are receiving this because you were mentioned.Message ID: @.**@.>>
Ok, that output is what I expect to see so nothing looks off there. Would you mind sharing the edgeAgent logs from after you removed the EDGEAGENTUSER_ID env var from your setting? I just want to see if they're any different than the original logs you posted. To grab the logs, please run:
sudo docker logs edgeAgent
No changes:
2023-08-28 17:08:55 Changing ownership of storage folder: /tmp/edgeAgent to 13622 2023-08-28 17:08:55 Changing ownership of backup folder: /tmp/edgeAgent_backup to 13622 2023-08-28 17:08:55 Changing ownership of management socket: /var/run/iotedge/mgmt.sock 2023-08-28 17:08:55 Completed necessary setup. Starting Edge Agent. su: can't set groups: Permission denied 2023-08-28 17:09:55 Starting Edge Agent 2023-08-28 17:09:55 Changing ownership of storage folder: /tmp/edgeAgent to 13622 2023-08-28 17:09:55 Changing ownership of backup folder: /tmp/edgeAgent_backup to 13622 2023-08-28 17:09:55 Changing ownership of management socket: /var/run/iotedge/mgmt.sock 2023-08-28 17:09:55 Completed necessary setup. Starting Edge Agent. su: can't set groups: Permission denied 2023-08-28 17:10:55 Starting Edge Agent 2023-08-28 17:10:55 Changing ownership of storage folder: /tmp/edgeAgent to 13622 2023-08-28 17:10:55 Changing ownership of backup folder: /tmp/edgeAgent_backup to 13622 2023-08-28 17:10:55 Changing ownership of management socket: /var/run/iotedge/mgmt.sock 2023-08-28 17:10:55 Completed necessary setup. Starting Edge Agent. su: can't set groups: Permission denied 2023-08-28 17:11:55 Starting Edge Agent 2023-08-28 17:11:55 Changing ownership of storage folder: /tmp/edgeAgent to 13622 2023-08-28 17:11:55 Changing ownership of backup folder: /tmp/edgeAgent_backup to 13622 2023-08-28 17:11:55 Changing ownership of management socket: /var/run/iotedge/mgmt.sock 2023-08-28 17:11:55 Completed necessary setup. Starting Edge Agent. su: can't set groups: Permission denied 2023-08-28 17:12:55 Starting Edge Agent 2023-08-28 17:12:55 Changing ownership of storage folder: /tmp/edgeAgent to 13622 2023-08-28 17:12:55 Changing ownership of backup folder: /tmp/edgeAgent_backup to 13622 2023-08-28 17:12:55 Changing ownership of management socket: /var/run/iotedge/mgmt.sock 2023-08-28 17:12:55 Completed necessary setup. Starting Edge Agent. su: can't set groups: Permission denied
From: Noel Campbell @.> Sent: Monday, August 28, 2023 11:57 AM To: Azure/iotedge @.> Cc: Pedro Buhigas @.>; Mention @.> Subject: Re: [Azure/iotedge] edgeAgent su: can't set groups: Permission denied (Issue #7078)
Ok, that output is what I expect to see so nothing looks off there. Would you mind sharing the edgeAgent logs from after you removed the EDGEAGENTUSER_ID env var from your setting? I just want to see if they're any different than the original logs you posted. To grab the logs, please run:
sudo docker logs edgeAgent
— Reply to this email directly, view it on GitHubhttps://github.com/Azure/iotedge/issues/7078#issuecomment-1696030424, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AD3WJ66IH754HJC3NV7RCI3XXTEU7ANCNFSM6AAAAAA3IZHHME. You are receiving this because you were mentioned.Message ID: @.**@.>>
Hmm. I've tried to reproduce this w/ Ubuntu 20.04 and iotedge v1.4.16, but I'm not seeing the issue. It works for me with the default EDGEAGENTUSER_ID (i.e. when I don't specify it in my deployment) and also when I set it to 1000.
Is it possible you've configured edgeAgent to start with a user other than root? I believe this can be done via the User
field in createOptions. Can you share the createOptions in your config.toml as well as your deployment to see if that's the cause?
I think it’s something that happened on an upgrade. I have a few hundred devices with the same deployment process with no issues – this is the only one having this problem. I’ve tried unsinstalling / reinstalling to no avail
From: Noel Campbell @.> Sent: Monday, August 28, 2023 1:07 PM To: Azure/iotedge @.> Cc: Pedro Buhigas @.>; Mention @.> Subject: Re: [Azure/iotedge] edgeAgent su: can't set groups: Permission denied (Issue #7078)
Hmm. I've tried to reproduce this w/ Ubuntu 20.04 and iotedge v1.4.16, but I'm not seeing the issue. It works for me with the default EDGEAGENTUSER_ID (i.e. when I don't specify it in my deployment) and also when I set it to 1000.
Is it possible you've configured edgeAgent to start with a user other than root? I believe this can be done via the User field in createOptions. Can you share the createOptions in your config.toml as well as your deployment to see if that's the cause?
— Reply to this email directly, view it on GitHubhttps://github.com/Azure/iotedge/issues/7078#issuecomment-1696121615, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AD3WJ67WVGFURL347OCT2FLXXTM3TANCNFSM6AAAAAA3IZHHME. You are receiving this because you were mentioned.Message ID: @.**@.>>
I’ve tried unsinstalling / reinstalling to no avail
@PedroBuhigas - When you uninstall aziot-edge and aziot-identity-service, try using purge
instead of remove
so that all files and dirs associated with the packages are deleted: sudo apt-get purge aziot-edge aziot-identity-service
.
@PedroBuhigas any updates?
Same issue.
I purged deviceupdate-agent, deliveryoptimization-agent, defender-iot-micro-agent-edge, aziot-edge aziot-identity-service. Reinsatlled aziot-edge and getting same results.
@.***:~# iotedge logs edgeAgent 2023-09-01 16:46:28 Starting Edge Agent 2023-09-01 16:46:28 Creating UID 13622 as edgeagentuser 2023-09-01 16:46:28 Creating storage folder: /tmp/edgeAgent 2023-09-01 16:46:28 Creating backup folder: /tmp/edgeAgent_backup 2023-09-01 16:46:28 Changing ownership of management socket: /var/run/iotedge/mgmt.sock 2023-09-01 16:46:28 Completed necessary setup. Starting Edge Agent. su: can't set groups: Permission denied
When removing aziot-edge I get the following log entry: userdel: group iotedge is the primary group of another user and is not removed.
Manually removed group and reinstalled… same outcome.
From: John Lian @.> Sent: Thursday, August 31, 2023 1:17 PM To: Azure/iotedge @.> Cc: Pedro Buhigas @.>; Mention @.> Subject: Re: [Azure/iotedge] edgeAgent su: can't set groups: Permission denied (Issue #7078)
@PedroBuhigashttps://github.com/PedroBuhigas any updates?
— Reply to this email directly, view it on GitHubhttps://github.com/Azure/iotedge/issues/7078#issuecomment-1701543401, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AD3WJ65KCGGO42DHV4CSCL3XYDIKNANCNFSM6AAAAAA3IZHHME. You are receiving this because you were mentioned.Message ID: @.**@.>>
When removing aziot-edge I get the following log entry: userdel: group iotedge is the primary group of another user and is not removed.
Manually removed group and reinstalled… same outcome.
You may need to completely remove the user whose group uses iotedge as the primary group. Then do a reinstall (w/ purge) and see if the results are any different.
That’s not feasible, as the user is the primary admin of the device. Could you tell me which permissions are actually trying to be set and we can try to set those up manually?
From: Noel Campbell @.> Sent: Friday, September 1, 2023 3:59 PM To: Azure/iotedge @.> Cc: Pedro Buhigas @.>; Mention @.> Subject: Re: [Azure/iotedge] edgeAgent su: can't set groups: Permission denied (Issue #7078)
When removing aziot-edge I get the following log entry: userdel: group iotedge is the primary group of another user and is not removed.
Manually removed group and reinstalled… same outcome.
You may need to completely remove the user whose group uses iotedge as the primary group. Then do a reinstall (w/ purge) and see if the results are any different.
— Reply to this email directly, view it on GitHubhttps://github.com/Azure/iotedge/issues/7078#issuecomment-1703313217, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AD3WJ62ANYGJPULKLCWJOWLXYJEBRANCNFSM6AAAAAA3IZHHME. You are receiving this because you were mentioned.Message ID: @.**@.>>
@PedroBuhigas - Sorry for the delay in responding. When you say you manually removed the group and reinstalled, do you mean you ran: sudo groupdel iotedge
? If not, can you try that after the purge and then reinstall?
To answer your question about the necessary permissions, you'll find them here: https://github.com/Azure/iotedge/blob/b4e7b13342c6896464fbdf706f75ac0e963cf889/edgelet/contrib/debian/postinst
However, I think the userdel error message you see on purge may not be related at all to the edgeAgent permissions error you're seeing as I see it in my repro attempts as well, but I don't run into the same edgeAgent permissions error. There seems to be something else on your host that is preventing the exec su
command from succeeding in the container. I thought it might be apparmor, but your apparmor logs are empty.
At this point, I'm not sure what the root cause is because all of the logs and checks look fine and I'm unable to repro the issue. Since this is happening on just one out of several hundred devices, is it feasible for you to re-flash/image the OS and then reinstall iotedge on the problematic device?
Closing this issue because there has been no response. Feel free to reopen if needed.
Expected Behavior
edgeAgent restarts
Current Behavior
edgeAgent fails to start with the following log entries:
2023-08-08 17:10:33 Starting Edge Agent 2023-08-08 17:10:33 Creating UID 13622 as edgeagentuser 2023-08-08 17:10:33 Creating storage folder: /tmp/edgeAgent 2023-08-08 17:10:33 Creating backup folder: /tmp/edgeAgent_backup 2023-08-08 17:10:33 Changing ownership of management socket: /var/run/iotedge/mgmt.sock 2023-08-08 17:10:33 Completed necessary setup. Starting Edge Agent. su: can't set groups: Permission denied
Steps to Reproduce
Provide a detailed set of steps to reproduce the bug.
Output of
iotedge check
Click here
``` Configuration checks (aziot-identity-service) --------------------------------------------- √ keyd configuration is well-formed - OK √ certd configuration is well-formed - OK √ tpmd configuration is well-formed - OK √ identityd configuration is well-formed - OK √ daemon configurations up-to-date with config.toml - OK √ identityd config toml file specifies a valid hostname - OK √ aziot-identity-service package is up-to-date - OK ‼ host time is close to reference time - Warning Could not query NTP server √ preloaded certificates are valid - OK √ keyd is running - OK √ certd is running - OK √ identityd is running - OK √ read all preloaded certificates from the Certificates Service - OK √ read all preloaded key pairs from the Keys Service - OK √ check all EST server URLs utilize HTTPS - OK √ ensure all preloaded certificates match preloaded private keys with the same ID - OK Connectivity checks (aziot-identity-service) -------------------------------------------- ‼ host can connect to and perform TLS handshake with iothub AMQP port - Warning Could not retrieve iothub_hostname from provisioning file. Please specify the backing IoT Hub name using --iothub-hostname switch if you have that information. Since no hostname is provided, all hub connectivity tests will be skipped. ‼ host can connect to and perform TLS handshake with iothub HTTPS / WebSockets port - Warning Could not retrieve iothub_hostname from provisioning file. Please specify the backing IoT Hub name using --iothub-hostname switch if you have that information. Since no hostname is provided, all hub connectivity tests will be skipped. ‼ host can connect to and perform TLS handshake with iothub MQTT port - Warning Could not retrieve iothub_hostname from provisioning file. Please specify the backing IoT Hub name using --iothub-hostname switch if you have that information. Since no hostname is provided, all hub connectivity tests will be skipped. √ host can connect to and perform TLS handshake with DPS endpoint - OK Configuration checks -------------------- √ aziot-edged configuration is well-formed - OK √ configuration up-to-date with config.toml - OK √ container engine is installed and functional - OK × configuration has correct URIs for daemon mgmt endpoint - Error Unable to find image 'mcr.microsoft.com/azureiotedge-diagnostics:1.4.16' locally 1.4.16: Pulling from azureiotedge-diagnostics 7264a8db6415: Already exists 3e50ca6a03ad: Already exists a17595d9604f: Already exists 6f33f8ba42f6: Pulling fs layer 46749a54c878: Pulling fs layer 6f33f8ba42f6: Download complete 6f33f8ba42f6: Pull complete 46749a54c878: Verifying Checksum 46749a54c878: Download complete 46749a54c878: Pull complete Digest: sha256:874026606a4d5f9ca988fad4e279a5b48e62f354aaaf1f3c6f7f0e68c5df2fab Status: Downloaded newer image for mcr.microsoft.com/azureiotedge-diagnostics:1.4.16 One or more errors occurred. (Permission denied) √ aziot-edge package is up-to-date - OK √ container time is close to host time - OK √ DNS server - OK √ production readiness: logs policy - OK ‼ production readiness: Edge Agent's storage directory is persisted on the host filesystem - Warning The edgeAgent module is not configured to persist its /tmp/edgeAgent directory on the host filesystem. Data might be lost if the module is deleted or updated. Please see https://aka.ms/iotedge-storage-host for best practices. × production readiness: Edge Hub's storage directory is persisted on the host filesystem - Error Could not check current state of edgeHub container √ proxy settings are consistent in aziot-edged, aziot-identityd, moby daemon and config.toml - OK Connectivity checks ------------------- 24 check(s) succeeded. 5 check(s) raised warnings. Re-run with --verbose for more details. 2 check(s) raised errors. Re-run with --verbose for more details. 7 check(s) were skipped due to errors from other checks. Re-run with --verbose for more details. ```Additional Information
Using UID 1000 for edgeAgent and edgeHub. Not having this issue on all devices, just a couple