Azure / iotedge

The IoT Edge OSS project
MIT License
1.45k stars 458 forks source link

ModuleClient.OpenAsync throws AuthenticationException #1044

Closed jeffbi closed 5 years ago

jeffbi commented 5 years ago

Expected Behavior

ModuleClient.OpenAsync() should succeed.

Current Behavior

function throws AggregateException, with the innermost exception being System.Security.Authentication.AuthenticationException

Steps to Reproduce

  1. Follow the instructions in the tutorial here: https://docs.microsoft.com/en-us/azure/iot-edge/how-to-vs-code-develop-module. Do not modify the code or project at all.
  2. Set a breakpoint in the Init function.
  3. When the breakpoint hits, execute the ModuleClient.CreateFromEnvironmentAsync function, successfully
  4. Upon return, attempt to execute the OpenAsync method.
  5. See the exception.

Context (Environment)

Device (Host) Operating System

Ubuntu 18.04

Architecture

amd64

Container Operating System

Linux containers

Runtime Versions

Edge runtime not installed, using iotedgehubdev

iotedged

Edge Agent

Edge Hub

Docker

18.09.4

Logs

Additional Information

This is the error output:

Exception has occurred: CLR/System.AggregateException
An unhandled exception of type 'System.AggregateException' occurred in System.Private.CoreLib.dll: 'One or more errors occurred.'
 Inner exceptions found, see $exception in variables window for more details.
 Innermost exception     System.Security.Authentication.AuthenticationException : The remote certificate is invalid according to the validation procedure.
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Net.Security.SslState.StartSendAuthResetSignal(ProtocolToken message, AsyncProtocolRequest asyncRequest, ExceptionDispatchInfo exception)
   at System.Net.Security.SslState.CheckCompletionBeforeNextReceive(ProtocolToken message, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.StartSendBlob(Byte[] incoming, Int32 count, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.ProcessReceivedBlob(Byte[] buffer, Int32 count, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.StartReadFrame(Byte[] buffer, Int32 readBytes, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.PartialFrameCallback(AsyncProtocolRequest asyncRequest)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Net.Security.SslState.ThrowIfExceptional()
   at System.Net.Security.SslState.InternalEndProcessAuthentication(LazyAsyncResult lazyResult)
   at System.Net.Security.SslState.EndProcessAuthentication(IAsyncResult result)
   at System.Net.Security.SslStream.EndAuthenticateAsClient(IAsyncResult asyncResult)
   at System.Net.Security.SslStream.<>c.<AuthenticateAsClientAsync>b__46_2(IAsyncResult iar)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)

I have been through the tutorial three times today and had this experience each time. On the other hand, I successfully debugged and deployed a module on this same box using the tutorial last week.

jeffbi commented 5 years ago

I just put together a brand new Linux box and I'm still seeing this.

varunpuranik commented 5 years ago

@jeffbi - What is happening here is that the module is trying to connect to the EdgeHub (which is part of the Edge rutime). It has its own certificate, which may not be rooted in a Baltimore certificate (certainly not if you used the quickstart mode, in which the Edge runtime generates all the required certificates for you as self signed certificates). Because of this, the module code is complaining that it is not able to validate the server certificate. For the dev scenario such as yours, I believe the fix is to install the edge device certificate as a trusted certificate on the dev box. @adashen might be able to help you with that.

jeffbi commented 5 years ago

@varunpuranik Thanks for responding to this.

Just as a bit more information, this is not limited to a single machine. So far it has happened on at least three VirtualBox VMs, a dedicated hardware box and an Azure VM, all with Ubuntu 18.04 (most desktop, one server) with the OS freshly installed.

Also, sometimes we can debug for a while but we get the issue upon restarting the machine. My colleague was able to get through one restart of the Azure VM, but on the second restart experienced the issue immediately.

adashen commented 5 years ago

@jeffbi can you give more information about whether you are debugging the source code without running in container or you are trying to attach the module running in the remote vm?

jeffbi commented 5 years ago

@adashen In all cases we are using the machine, whether physical or virtual, to debug source code locally (I assume without a container). No attempt to attach to a remote module.

adashen commented 5 years ago

@jeffbi, so "iotedgehubdev" and "vscode" are running on the same machine? Can you setup "edge simulator" again from the "AZURE IOT HUB DEVICES" palette from vscode?

jeffbi commented 5 years ago

@adashen Yes, iotedgehubdev and VS Code are running on the same machine. When I setup the simulator, I always do it from the right-click menu on a device in the "AZURE IOT HUB DEVICES" area as opposed to from an option in the Command Palette. My colleague is off-line right now, but I think she did it that way too.

I just tried it again, followed by "Start IoT Edge Hub Simulator for Single Module" from the Command Palette. I still get the exception.

adashen commented 5 years ago

@jeffbi , Can you try the following

  1. Setup simulator again with an edge device key
  2. Run the solution in simulator (Right click the deployment.template.json file to select run in simulator).
  3. Check whether the whole solution could run successfully.

These steps are trying to make sure the cert generated is updated and successfully.

If the solution could run successfully. Please

  1. Stop the simulator
  2. Start simulator for single module
  3. Local debug your module again.
jeffbi commented 5 years ago

@adashen OK, I set up the simulator with the edge device key (connection string) and selected "Build and Run Iot Edge Solution in Simulator" from the right-click menu on deployment.template.json. The solution appeared to run properly, with received message being displayed in the console.

I stopped the simulator and started it for single module. Debugging the module gets the same exception at the same place.

adashen commented 5 years ago

@jeffbi , Thank you very much for the information, we will continue the investigation and keep you updated when the root cause is found

jeffbi commented 5 years ago

@adashen Thank you for your time. We are blocked by this situation, so any information you can provide would be welcomed.

adashen commented 5 years ago

@jeffbi , can you use attach mode to debug your modules for now? Besides, if you would like to use the local mode. you may try it on windows host machine for now.

jeffbi commented 5 years ago

@adashen At the moment, no, I cannot use attach mode. First, when I attempt to start debugging with the Remote Debug configuration I just get an unhelpful error message box and VS Code opens launch.json, with no real hint as to what the problem might be.

Secondly, I'm trying to debug code that happens in the Init method. By the time I'd be able to attach to the running process, that code has already gone by.

I'll look into debugging on Windows as a stop-gap measure.

SLdragon commented 5 years ago

Hi, @jeffbi , which dotnet SDK version do you use?

jeffbi commented 5 years ago

@SLdragon dotnet --version shows 2.2.202

jeffbi commented 5 years ago

Hi, @adashen. Just to be clear, this does not appear to be strictly a debugging issue, but perhaps a device simulator issue. I get the same exception when I run without debugging in VS Code.

@varunpuranik said that the module is trying to connect to EdgeHub, which manages its own certificates. As I mentioned in the initial issue submission, these machines don't have the Edge runtime installed, because we're using iotedgehubdev. Is there a certificate that we need to install on these machines?

SLdragon commented 5 years ago

Hi, @jeffbi , you can try to install the certificates to your trust store use these commands below:

cd /var/lib/iotedgehubdev/certs/edge-device-ca/cert/
sudo mkdir /usr/local/share/ca-certificates/extra
sudo cp edge-device-ca.cert.pem /usr/local/share/ca-certificates/extra/root.cert.crt
sudo update-ca-certificates

Maybe it can help you solve the problem

jeffbi commented 5 years ago

Hi, @SLdragon. I tried this on two machines, one VirtualBox VM and one dedicated physical machine and got the same exception in both cases.

SLdragon commented 5 years ago

Hi, @jeffbi , could you please try to use node module to check whether the issue still exists? We think that maybe it is related to the C# SDK

jeffbi commented 5 years ago

@SLdragon Sorry, I'm not sure what you mean by use node module.

SLdragon commented 5 years ago

Hi, @jeffbi, here are the steps to create Azure IoT Edge Node module, you must install node.js on your machine first:

  1. Open command palette, select Azure IoT Edge: Add IoT Module
  2. Select your deployment.template.json
  3. Then select Node.js Module
  4. Type module name and repository
  5. Start IoT Edge Hub Simulator for single Module
  6. Start local debug for Node.js
jeffbi commented 5 years ago

@SLdragon (Sorry, not a Node or JS guy)

Before any of my breakpoints hit, I'm getting the following error when I attempt to debug:

internal/modules/cjs/loader.js:670
    throw err;
    ^

Error: Cannot find module 'azure-iot-device-mqtt'
    at Function.Module._resolveFilename (internal/modules/cjs/loader.js:668:15)
    at Function.Module._load (internal/modules/cjs/loader.js:591:27)
    at Module.require (internal/modules/cjs/loader.js:723:19)
    at require (internal/modules/cjs/helpers.js:14:16)
    at Object.<anonymous> (/home/jeff/Test/SampleNodeModule/app.js:3:17)
    at Module._compile (internal/modules/cjs/loader.js:813:14)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:827:10)
    at Module.load (internal/modules/cjs/loader.js:685:32)
    at Function.Module._load (internal/modules/cjs/loader.js:620:12)
    at Function.Module.runMain (internal/modules/cjs/loader.js:877:12)

Is there something I'm missing?

jeffbi commented 5 years ago

@SLdragon Are you folks able to repro this issue? If not, we have an Azure VM exhibiting this problem that we could make available to you. If you'd like to send me a private message I can get you credentials.

SLdragon commented 5 years ago

@jeffbi according to the error message, maybe you are not install node.js dependencies. You can go to Node Module folder and run the command below:

npm install
SLdragon commented 5 years ago

Hi, @jeffbi, we also try to create a brand new environment to repro this issue.

We created a Azure VM with Ubuntu 18.04, and then install desktop, docker, iotedgehubdev, dotnet core, vs code and so on, however, the issue not happen.

We then try to create a Unbuntu in Hyper-V, just as We did on Azure VM, however, the issue still not happen.

We only have one test machine can repro this issue sometimes, and sometimes it is good.

I will send you an email for my test environment, maybe you can have a try and compare the different between the environment, you can also give me your environment, thanks!

jeffbi commented 5 years ago

Hi, @SLdragon. Thanks for the additional info about Node. I ran a debug session with the Node module and it worked. Immediately afterward I ran a debug session with the C# module and got the exception.

jeffbi commented 5 years ago

@SLdragon On my current VirtualBox VM the problem is occurring 100% of the time. On the dedicated hardware Linux box it worked twice, but after rebooting the machine the exception occurred and has been ever since. On the Azure cloud box it worked for a while, it continued to work after a shutdown/restart, then after a second shutdown/restart we started seeing the exception.

All three of these boxes are running Ubuntu 18.04

SLdragon commented 5 years ago

Hi, @jeffbi , as discusses from email, only C# modules would throw this error, and Node JS module works fine for all the machines. So we consider this issue is related to the SDK certificates verification logic and I created an issue on SDK repo: https://github.com/Azure/azure-iot-sdk-csharp/issues/911