Azure / iot-edge-v1

Azure IoT Edge
http://azure.github.io/iot-edge/
Other

[V2] edgeHub just stopped working? #558

Open Hammatt opened 6 years ago

Hammatt commented 6 years ago

We've been developing an application built on IoT Edge. I just restarted the runtime, and the edgeHub module will no longer start up.

docker logs edgeHub
docker :
    + CategoryInfo          : NotSpecified: (:String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError

Unhandled Exception: System.AggregateException: One or more errors occurred. (Access is denied) --->
Internal.Cryptography.CryptoThrowHelper+WindowsCryptographicException: Access is denied
   at Internal.Cryptography.Pal.CertificatePal.FromBlobOrFile(Byte[] rawData, String fileName, SafePasswordHandle
password, X509KeyStorageFlags keyStorageFlags)
   at System.Security.Cryptography.X509Certificates.X509Certificate..ctor(String fileName, String password,
X509KeyStorageFlags keyStorageFlags)
   at Microsoft.Azure.Devices.Edge.Hub.Service.Hosting.Initialize(String certPath) in
/opt/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/Hosting.cs:line 23
   at Microsoft.Azure.Devices.Edge.Hub.Service.Program.<MainAsync>d__1.MoveNext() in
/opt/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/Program.cs:line 48
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at Microsoft.Azure.Devices.Edge.Hub.Service.Program.Main() in
/opt/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/Program.cs:line 27

Edit: I've tried restarting the device. It's running Windows 10 IoT Core, if that's of any relevance. I've re-run the setup command a few times to see if that changed anything, and it hasn't.

darobs commented 6 years ago

Hello @Hammatt

Thanks for the information.

Do you know which version of the edgeHub you're running? It will be at the top of the starting banner, like this:

2018-03-27 19:52:01.404 +00:00 [INF] - Starting Edge Hub
2018-03-27 19:52:01.406 +00:00 [INF] - Version - 1.0.0-preview022.11567621 (12a8e1bb63e619b17ca685efd470ad3f412034f4)
2018-03-27 19:52:01.407 +00:00 [INF] -
        █████╗ ███████╗██╗   ██╗██████╗ ███████╗
       ██╔══██╗╚══███╔╝██║   ██║██╔══██╗██╔════╝
       ███████║  ███╔╝ ██║   ██║██████╔╝█████╗
       ██╔══██║ ███╔╝  ██║   ██║██╔══██╗██╔══╝
       ██║  ██║███████╗╚██████╔╝██║  ██║███████╗
       ╚═╝  ╚═╝╚══════╝ ╚═════╝ ╚═╝  ╚═╝╚══════╝

 ██╗ ██████╗ ████████╗    ███████╗██████╗  ██████╗ ███████╗
 ██║██╔═══██╗╚══██╔══╝    ██╔════╝██╔══██╗██╔════╝ ██╔════╝
 ██║██║   ██║   ██║       █████╗  ██║  ██║██║  ███╗█████╗
 ██║██║   ██║   ██║       ██╔══╝  ██║  ██║██║   ██║██╔══╝
 ██║╚██████╔╝   ██║       ███████╗██████╔╝╚██████╔╝███████╗
 ╚═╝ ╚═════╝    ╚═╝       ╚══════╝╚═════╝  ╚═════╝ ╚══════╝

I'm also curious about the iotedgectl setup command line (minus the connection string), if you're willing to share.
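For anyone who wants to pull the version out of captured log text automatically, here is a minimal sketch. It assumes the log line format matches the banner shown above (`[INF] - Version - <version> (<commit>)`); the function name `find_version` is made up for this example and is not part of any IoT Edge tooling.

```python
import re
from typing import Optional

# Pattern modelled on the "Version - ..." line in the banner above.
# The trailing commit hash in parentheses is treated as optional.
VERSION_RE = re.compile(r"\[INF\] - Version - (\S+)(?: \(([0-9a-f]+)\))?")


def find_version(log_text: str) -> Optional[str]:
    """Return the first version string found in the log text, or None."""
    for line in log_text.splitlines():
        match = VERSION_RE.search(line)
        if match:
            return match.group(1)
    return None


sample = (
    "2018-03-27 19:52:01.404 +00:00 [INF] - Starting Edge Hub\n"
    "2018-03-27 19:52:01.406 +00:00 [INF] - Version - "
    "1.0.0-preview022.11567621 (12a8e1bb63e619b17ca685efd470ad3f412034f4)\n"
)
print(find_version(sample))
```

If the module crashes before it prints the banner, no version line will be present and the function returns None.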

Hammatt commented 6 years ago

Hi @darobs ,

I'm not able to see any edgeHub logs at all. What I posted in the original post is the entire output of docker logs edgeHub.

This is what we're using for the setup command: iotedgectl setup --connection-string "{connection-string}" --auto-cert-gen-force-no-passwords

Edit: possibly relevant to mention, all other modules start up (but time out because they can't connect to the hub)

Hammatt commented 6 years ago

Just some more information: I've been trying to find reliable ways to reproduce this.

So far, this doesn't seem to happen on Windows 10 Pro; the module starts up and everything works fine.

But on every Windows 10 IoT Core-based device that I've tested this on, the issue occurs.

Hope this is of use.

michael-chi commented 6 years ago

Not sure if this is relevant, but I am also seeing edgeHub repeatedly start and stop. I am running on a Raspberry Pi 3.

Linux raspberrypi 4.9.59-v7+ #1047 SMP Sun Oct 29 12:19:23 GMT 2017 armv7l GNU/Linux

docker logs -f edgeHub results:

Edge Hub Server Certificate File: /mnt/edgehub/edge-hub-server.cert.pfx
Edge Hub CA Server Certificate File: /mnt/edgehub/edge-chain-ca.cert.pem
SSL_CERTIFICATE_PATH=/mnt/edgehub
SSL_CERTIFICATE_NAME=edge-hub-server.cert.pfx
Executing: cp /mnt/edgehub/edge-chain-ca.cert.pem /usr/local/share/ca-certificates/edge-chain-ca.crt
Executing: update-ca-certificates
Updating certificates in /etc/ssl/certs...
1 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.
Certificates installed successfully!
runuser: user  does not exist

I couldn't see version information through docker logs -f edgeHub. Below is the result of docker logs -f edgeAgent:

pi@raspberrypi:~ $ sudo docker logs -f edgeAgent
2018-03-28 04:03:40.721 +00:00 [INF] - Starting module management agent.
2018-03-28 04:03:47.799 +00:00 [INF] - Version - 1.0.0-preview022.11567621 (12a8e1bb63e619b17ca685efd470ad3f412034f4)
2018-03-28 04:03:47.801 +00:00 [INF] -

Result of docker images:

pi@raspberrypi:~ $ sudo docker images
REPOSITORY                     TAG                 IMAGE ID            CREATED             SIZE
kalschi/rpi-camera-module      0.0.1-arm32v7       9f48fee2c123        About an hour ago   176MB
microsoft/azureiotedge-hub     1.0-preview         d40af83309cd        22 hours ago        234MB
microsoft/azureiotedge-agent   1.0-preview         b7d29616e809        22 hours ago        218MB
kalschi/rpi-camera-module      <none>              53858f83eaaa        2 days ago          230MB
microsoft/azureiotedge-hub     <none>              a679d016e9d2        3 weeks ago         235MB
microsoft/azureiotedge-agent   <none>              8c623975bae5        3 weeks ago         218MB

darobs commented 6 years ago

This seems to be ARM-specific; we will investigate.

Hammatt commented 6 years ago

Hey @darobs, I can confirm that this is not limited to ARM, as the Windows 10 IoT Core devices that we have are all x64 architecture. Specifically, we've been able to reproduce it on a MinnowBoard Turbot Dual Ethernet Quad-Core (Intel Atom E3845) model and a number of other x64-based devices. I don't have access to any ARM devices to test this on.

darobs commented 6 years ago

Got my problems of the day mixed up... The problem @michael-chi is seeing on the Raspberry Pi has been fixed and should be pushed out to Docker.

@Hammatt - we're still looking at this problem.

Hammatt commented 6 years ago

Thanks @darobs , is there a way that I could roll back to a working version here? It's blocking me pretty hard at work.

darobs commented 6 years ago

Here's what I would try, in this order:

  1. docker rm $(docker ps -aq) and restart

    • There are issues with Docker/Windows restarting and creating a new network with the same name.
  2. Roll back to preview21

    • docker rm $(docker ps -aq) to clean out the preview22 containers.
    • add --agent microsoft/azureiotedge-agent:1.0.0-preview021 to the iotedgectl setup command, and set the edge hub image in the deployment to microsoft/azureiotedge-hub:1.0.0-preview021

If this is the Windows networking issue we're seeing, the first should fix the problem.

Hammatt commented 6 years ago

That first command isn't working:

PS C:\Data\Users\Administrator\Documents> docker rm $(docker ps -aq)
docker : "docker rm" requires at least 1 argument.
At line:1 char:1
+ docker rm $(docker ps -aq)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: ("docker rm" req...ast 1 argument.:String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError

See 'docker rm --help'.
Usage:  docker rm [OPTIONS] CONTAINER [CONTAINER...] [flags]
Remove one or more containers

Edit: Neither is the 2nd:

iotedgectl setup --connection-string "{connection-string}" --auto-cert-gen-force-no-passwords --agent microsoft/azureiotedge-agent:1.0.0-preview021
iotedgectl : usage: iotedgectl setup [-h] [--config-file] [--connection-string]
    + CategoryInfo          : NotSpecified: (usage: iotedgec...nection-string]:String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError

                        [--edge-config-dir] [--edge-home-dir]
                        [--edge-hostname] [--runtime-log-level] [--image]
                        [--docker-registries  [...]] [--docker-uri]
                        [--upstream-protocol]
                        [--auto-cert-gen-force-no-passwords]
                        [--owner-ca-cert-file] [--device-ca-cert-file]
                        [--device-ca-chain-cert-file]
                        [--device-ca-private-key-file]
                        [--device-ca-passphrase-file] [--device-ca-passphrase]
                        [--agent-ca-passphrase-file] [--agent-ca-passphrase]
                        [-C] [-ST] [-L] [-OR] [-OU] [-CN]
iotedgectl setup: error: ambiguous option: --agent could match --agent-ca-passphrase-file, --agent-ca-passphrase

darobs commented 6 years ago

My bad. That command works as-is on Linux, and I thought the same form worked in PowerShell.

Essentially, you want to run docker rm -f on all existing containers.
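The "requires at least 1 argument" error above is what docker rm prints when it is given no container IDs, so one possible cause is that the inner docker ps -aq returned nothing. A minimal sketch of a shell-independent way to do the same cleanup follows. The helper names `build_rm_command` and `remove_all_containers` are invented for this example; only the `docker ps -aq` and `docker rm -f` commands come from the thread.

```python
import subprocess
from typing import List


def build_rm_command(container_ids: List[str]) -> List[str]:
    """Build a `docker rm -f` command, or an empty list if there is nothing to remove."""
    ids = [cid.strip() for cid in container_ids if cid.strip()]
    return ["docker", "rm", "-f", *ids] if ids else []


def remove_all_containers() -> None:
    """List every container (running or stopped) and force-remove them."""
    listing = subprocess.run(
        ["docker", "ps", "-aq"], capture_output=True, text=True, check=True
    )
    command = build_rm_command(listing.stdout.splitlines())
    if command:
        subprocess.run(command, check=True)
    else:
        # Avoids calling `docker rm` with no arguments, which is an error.
        print("No containers to remove.")
```

Calling `remove_all_containers()` needs a working docker CLI on the PATH. Note that this removes containers only; cached images stay on disk.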

...and the other mistake is it's not --agent but --image

iotedgectl setup --connection-string "{connection-string}" --auto-cert-gen-force-no-passwords --image microsoft/azureiotedge-agent:1.0.0-preview021

Hammatt commented 6 years ago

Alright, so what I've done is stop all the containers, run docker system prune -a, and then double-check that the containers are gone with docker ps -a. Then I ran the setup command without the image argument, and the result was the same:

docker logs edgeHub
docker :
    + CategoryInfo          : NotSpecified: (:String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError

Unhandled Exception: System.AggregateException: One or more errors occurred. (Access is denied) --->
Internal.Cryptography.CryptoThrowHelper+WindowsCryptographicException: Access is denied
   at Internal.Cryptography.Pal.CertificatePal.FromBlobOrFile(Byte[] rawData, String fileName, SafePasswordHandle
password, X509KeyStorageFlags keyStorageFlags)
   at System.Security.Cryptography.X509Certificates.X509Certificate..ctor(String fileName, String password,
X509KeyStorageFlags keyStorageFlags)
   at Microsoft.Azure.Devices.Edge.Hub.Service.Hosting.Initialize(String certPath) in
/opt/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/Hosting.cs:line 23
   at Microsoft.Azure.Devices.Edge.Hub.Service.Program.<MainAsync>d__1.MoveNext() in
/opt/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/Program.cs:line 48
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at Microsoft.Azure.Devices.Edge.Hub.Service.Program.Main() in
/opt/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/Program.cs:line 27

I then pruned again and tried with the --image flag this time. I wasn't quite sure what to put after the image flag, as you said a couple of different things, but I went with microsoft/azureiotedge-agent:1.0.0-preview021 rather than the one for the hub, because the flag seemed to change the Edge Agent Image field that the setup command displayed.

I then checked, and it had the same output again:

docker logs edgeHub
docker :
    + CategoryInfo          : NotSpecified: (:String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError

Unhandled Exception: System.AggregateException: One or more errors occurred. (Access is denied) --->
Internal.Cryptography.CryptoThrowHelper+WindowsCryptographicException: Access is denied
   at Internal.Cryptography.Pal.CertificatePal.FromBlobOrFile(Byte[] rawData, String fileName, SafePasswordHandle
password, X509KeyStorageFlags keyStorageFlags)
   at System.Security.Cryptography.X509Certificates.X509Certificate..ctor(String fileName, String password,
X509KeyStorageFlags keyStorageFlags)
   at Microsoft.Azure.Devices.Edge.Hub.Service.Hosting.Initialize(String certPath) in
/opt/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/Hosting.cs:line 23
   at Microsoft.Azure.Devices.Edge.Hub.Service.Program.<MainAsync>d__1.MoveNext() in
/opt/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/Program.cs:line 48
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at Microsoft.Azure.Devices.Edge.Hub.Service.Program.Main() in
/opt/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Service/Program.cs:line 27

Not really sure what's going on now, if it happens in this version too. Have you been able to reproduce the issue at all?

aribeironovaes commented 6 years ago

You have to roll back to preview21. If you don’t specify the image, you will get preview22.
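As a rough illustration of pinning the agent image, the sketch below assembles the setup command as an argument list. The flags (`--connection-string`, `--auto-cert-gen-force-no-passwords`, `--image`) are the ones shown in the iotedgectl usage output earlier in this thread; the function name `build_setup_command` is hypothetical. The --image flag only affects the edge agent, so the edgeHub image still has to be set separately in the deployment.

```python
from typing import List, Optional


def build_setup_command(
    connection_string: str, agent_image: Optional[str] = None
) -> List[str]:
    """Assemble an `iotedgectl setup` invocation, optionally pinning the agent image."""
    command = [
        "iotedgectl",
        "setup",
        "--connection-string",
        connection_string,
        "--auto-cert-gen-force-no-passwords",
    ]
    if agent_image:
        # Without --image, setup falls back to its default agent image.
        command += ["--image", agent_image]
    return command


pinned = build_setup_command(
    "{connection-string}", "microsoft/azureiotedge-agent:1.0.0-preview021"
)
print(" ".join(pinned))
```

Passing the arguments as a list (for example to `subprocess.run`) sidesteps shell quoting differences between PowerShell and bash.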

Hammatt commented 6 years ago

Sorry, I may not have been clear. I ran iotedgectl setup --connection-string "{connection string}" --auto-cert-gen-force-no-passwords --image microsoft/azureiotedge-agent:1.0.0-preview021 and it still failed with the same error.

Edit: oh, I think I see what you mean now.

Hammatt commented 6 years ago

Sorry for the confusion earlier, I'm up and running now on edgeHub Version - 1.0.0-preview021.10543704

yphuangms commented 6 years ago

Hi, I was curious: is there anyone who can successfully run the preview22 version of edgeAgent and edgeHub using Windows containers?

I have run into the same issue as @Hammatt. I've tried both Windows 10 desktop and Windows 10 IoT Core, and both failed with the same exception.

Can we say that the latest edgeHub Windows container (preview22) has a blocking issue, and that the only way to start IoT Edge on the Windows platform is to roll back to preview21?

And what are the steps to roll back to preview21? Running iotedgectl setup to change the edgeAgent image version doesn't change the edgeHub version; it still uses the latest edgeHub image (preview22)... Any help would be much appreciated, thanks!

Orlando1991 commented 6 years ago

@yphuangms, you have to change it in the Azure portal. Go to where you would set your modules, click on Configure advanced Edge runtime settings, and change the image to: microsoft/azureiotedge-hub:1.0.0-preview021

darobs commented 6 years ago

@yphuangms

I was able to run preview22 error-free with Windows containers on my Windows 10 PC, but it was a completely new deployment.

yphuangms commented 6 years ago

@Orlando1991, Thanks! It helps a lot! And from trial and error, I found that only edgeHub requires rollback.

@darobs, do you know if there will be a quick release to fix edgeHub preview22? Or will it be left as it is, so that those who encounter the same issue have to resolve it on their own?

michael-chi commented 6 years ago

I can confirm that the new version of the edge runtime works fine with the RPi now.

Hammatt commented 6 years ago

Hey @darobs, I can run preview22 with Windows containers on Windows 10 Pro; the problem occurs for me on Windows 10 IoT Core only. When you say that you were able to run it, was that on Windows 10 Pro or Windows 10 IoT Core?

darobs commented 6 years ago

Windows 10 Pro, not IoT Core. I've reached out to our Windows experts for more help.

Hammatt commented 6 years ago

~I take what I said back, on windows 10 pro I'm getting endless timeouts: CONNECT failed: RefusedNotAuthorized, caused by: Microsoft.Azure.Devices.Client~

After running docker system prune -a, preview022 worked for me on Windows 10 Pro.

v-tbert commented 6 years ago

@Hammatt do you by chance use SetMethodHandlerAsync and/or SetMethodDefaultHandlerAsync?

Hammatt commented 6 years ago

@v-tbert no

darobs commented 6 years ago

Hello @Hammatt

According to our Windows experts, this looks like a permissions issue on the certificate file. They suggested possibly a missing read or read/execute permission. Would you please check this?

Hammatt commented 6 years ago

Hey @darobs , I should have permission as I'm running the commands from an admin account.

([Security.Principal.WindowsPrincipal] `
 [Security.Principal.WindowsIdentity]::GetCurrent()
).IsInRole([Security.Principal.WindowsBuiltInRole]::Administrator)

returns true.

If you're able to tell me the location on disk where the certificate could be, I could double check.
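Once the certificate path is known, a read check along these lines could help narrow things down. This is only a sketch: the path is not given anywhere in this thread, so it has to be supplied by the caller, and `check_readable` is a name made up for this example. It tests access for the account running the script, which is possibly not the same identity the edgeHub process uses inside the container, so a "readable" result does not rule out the permissions theory.

```python
from pathlib import Path


def check_readable(path: str) -> str:
    """Classify a file as 'missing', 'access denied', or 'readable'."""
    candidate = Path(path)
    try:
        # Opening the file and reading one byte exercises the read
        # permission that loading a certificate file would need.
        with candidate.open("rb") as handle:
            handle.read(1)
    except FileNotFoundError:
        return "missing"
    except PermissionError:
        return "access denied"
    return "readable"
```

For example, `check_readable(cert_path)` with `cert_path` set to wherever the .pfx file actually lives on the device.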

Hammatt commented 6 years ago

Just updating: we're still seeing this issue here. Edge Hub preview022 won't start up on any of our IoT Core devices, but after downgrading to preview021 without changing anything else, it does start up.