Azure / iotedge

The IoT Edge OSS project
MIT License
1.45k stars 458 forks source link

ModuleClient.InvokeMethodAsync() Inside a direct method callback gives errorCode 404001 back to azure portal despite invocation is successful #337

Closed NoExitTV closed 5 years ago

NoExitTV commented 6 years ago

Hi. Hopefully someone can help me with this issue. I've previously been active in issue #204 as well as in a question on stack overflow in this question

My main goal is that I want to invoke a direct method on a module, call it ModuleA. Inside this method invocation, ModuleA should in turn invoke a direct method in another module, call it ModuleB. ModuleA should then return the response received from ModuleB back to the back-end (azure portal).

As you can see in the question on stack overflow, as well as in the github issue, I managed to invoke a direct method on another module (running on the same IoT Edge runtime) directly from the Init() function (or any function called from init() really).

The problem arise when I try to do the same directly from the callback of another direct method. Lets call the direct method on ModuleA for SampleA and the direct method on ModuleB for SampleB.

Inside the SampleA method, I do a:

MethodRequest request= new MethodRequest("SampleB", Encoding.UTF8.GetBytes("{ \"Message\": \"Hello\" }"));
var response= await ioTHubModuleClient.InvokeMethodAsync(deviceId, "ModuleB", request).ConfigureAwait(false);

And as soon as I do the InvokeMethodAsync call to SampleB located in ModuleB from the callback of SampleA inside ModuleA I get an error response back to azure portal (or my back end application).

But when reading the logs I do see that the method invocation is successful and SampleA gets the correct response from SampleB. Despite this an error is reported back to the back end for some reason.

Also, ModuleA "hangs" and I can not communicate with it anymore.

The desired request/response-chain can be visualised to:

*[Request]* Azure Portal -> ModuleA/SampleA -> ModuleB/SampleB -> *[Response]* ModuleA/SampleA -> Azure Portal. 

Expected Behavior

When invoking SampleA on ModuleA the function should return the response from invoking SampleB on ModuleB back to azure portal (back end).

Current Behavior

As soon as SampleB is invoked from SampleA, the following error is reported to Azure Portal:

{"message":"NotFound:{\r\n  \"Message\": \"{\\\"errorCode\\\":404001,\\\"trackingId\\\":\\\"e13bce129ead4c9cb398ff8dfb45fb54-G:0-TimeStamp:09/21/2018 09:10:19\\\",\\\"message\\\":\\\"Device unavailable\\\",\\\"timestampUtc\\\":\\\"2018-09-21T09:10:19.3937667Z\\\"}\",\r\n  \"ExceptionMessage\": \"\"\r\n}"}

And printing the error message and stack trace in my own simulated back end app gives:

Device {"Message":"{\"errorCode\":404001,\"trackingId\":\"a50c10bde2184431be059dc2932c862c-G:8-TimeStamp:09/21/2018 09:19:07\",\"message\":\"Device unavailable\",\"timestampUtc\":\"2018-09-21T09:19:07.1230963Z\"}","ExceptionMessage":""} not registered
   at Microsoft.Azure.Devices.HttpClientHelper.ExecuteAsync(HttpClient httpClient, HttpMethod httpMethod, Uri requestUri, Func`3 modifyRequestMessageAsync, Func`2 isMappedToException, Func`3 processResponseMessageAsync, IDictionary`2 errorMappingOverrides, CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.HttpClientHelper.PostAsync[T,T2](Uri requestUri, T entity, TimeSpan operationTimeout, IDictionary`2 errorMappingOverrides, IDictionary`2 customHeaders, CancellationToken cancellationToken)
   at CallMethod.Program.Main(String[] args) in C:\Users\NoExit\Desktop\iotedge-direct-module-method-sample\CallMethod\Program.cs:line 41

And the communication with ModuleA is lost. All new method invocations will time out waiting for the module.

Taking a look in the logs you can see that ModuleA could invoke the method on ModuleB without any issues and both recieved and returned the correct response.

Steps to Reproduce

  1. Create a module called ModuleA and register the following function in Init()

    await ioTHubModuleClient.SetMethodHandlerAsync("MethodA", MethodA, ioTHubModuleClient);

    And add the following function to it:

    static async Task<MethodResponse> MethodA(MethodRequest methodRequest, object moduleClient)
    {
        try 
        {
            ModuleClient ioTHubModuleClient = (ModuleClient)moduleClient;             
    
            // Get deviced id of this device, exposed as a system variable by the iot edge runtime
            var deviceId = System.Environment.GetEnvironmentVariable("IOTEDGE_DEVICEID");
    
            // Create the request
            MethodRequest request = new MethodRequest("MethodB", Encoding.UTF8.GetBytes("{ \"Message\": \"Hello\" }"));
    
            // Execute request
            var resp = await ioTHubModuleClient.InvokeMethodAsync(deviceId, "ModuleB", request);
    
            // Write response to log for debugging
            Console.WriteLine($"Got result {resp.ResultAsJson} with status code {resp.Status}");
    
            return resp;               
        }
        catch(Exception ex)
        {
            Console.WriteLine(ex.Message);
            Console.WriteLine(ex.StackTrace);
            return await Task.FromResult(new MethodResponse(500));
        } 
    }
  2. Create a module called ModuleB and register the following function in init()

    await ioTHubModuleClient.SetMethodHandlerAsync("MethodB", MethodB, null);

    And add the following function to it:

    private static Task<MethodResponse> MethodB(MethodRequest methodRequest, object userContext)
    {
        Console.WriteLine("MethodB has been called");
    
        // Get data but we do not do anything with it
        var data = Encoding.UTF8.GetString(methodRequest.Data);
    
        Console.WriteLine("Received data: " + data.ToString());
    
        var methodResponse = new MethodResponse(Encoding.UTF8.GetBytes("{\"Message\": \"Hello from ModuleB\"}"), 200);
        return Task.FromResult(methodResponse);
    } 
  3. Navigate to the Azure Portal -> your IoT Hub -> Your iot edge device -> ModuleA -> Direct method and invoke the method: MethodA
  4. Notice the error you receive in Azure Portal but when looking at the logs for ModuleA you'll see:
    Got result {"Message":"Hello from ModuleB"} with status code 200

Context (Environment)

Device (Host) Operating System

Windows 10 (develop machine)

Architecture

amd64 Also tried amd64-win

Container Operating System

Linux containers Also tried Windows containers

Runtime Versions

The modules run .Net Core 2.1 specified in the respective docker file.

iotedged

iotedge 1.0.0 Mostly running in simulator while developing Simulator says: version 1.0.1.15962126 when staring

Edge Agent

"image": "mcr.microsoft.com/azureiotedge-agent:1.0",

Edge Hub

"image": "mcr.microsoft.com/azureiotedge-hub:1.0",

Docker

$ docker version
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:21:34 2018
 OS/Arch:           windows/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:29:02 2018
  OS/Arch:          linux/amd64
  Experimental:     false

Additional Information

I've tried to deploy this to a "real" IoT Edge runtime as well but I mostly just run it in simulation when developing.

dsajanice commented 6 years ago

Thanks for reporting. I am investigating the issue.

NoExitTV commented 6 years ago

Thanks for reporting. I am investigating the issue.

Great! Let me know if you need any more info

NoExitTV commented 6 years ago

@dsajanice have you've been able to reproduce the issue?

dsajanice commented 6 years ago

I am currently working on it but don't have a root cause yet.

dsajanice commented 6 years ago

Do you see the output of Console.WriteLine("MethodB has been called")?

NoExitTV commented 6 years ago

@dsajanice Yes in the output I do see:

ModuleB | MethodB has been called
ModuleB | Received data: {"Message":"Hello"}

I also see the response from MethodB in MethodA.

While debugging (stepping through) ModuleA in the direct method callback it seems that the method responds successfully but as soon as the: var resp = await ioTHubModuleClient.InvokeMethodAsync(deviceId, "ModuleB", request); is called, I get the above response in IoT Hub (and my own back-end application) for some reason?

dsajanice commented 6 years ago

I am running through this scenario and hitting some other issues. For example, I am unable to get a response from ModuleB in ModuleA's method callback. The request times out. Still trying to figure out whats going on.

NoExitTV commented 6 years ago

Hmm that's wierd. Are you sure that you're building the modules in .Net Core 2.1? (check the dockerfile for each module?) What if you step through the execution of ModuleA in debug mode, then you might be able to figure out where it times out. Although it will probably be at the
ioTHubModuleClient.InvokeMethodAsync(deviceId, "ModuleB", request); request I recon...

I also noticed that you can not use var deviceId = System.Environment.GetEnvironmentVariable("IOTEDGE_DEVICEID"); in the simulator since this environment variable won't be exposed so you might want to set that manually.

Also, what happens when you do as the answer suggests here: https://stackoverflow.com/questions/52073548/azure-iot-edge-moduleclient-invoke-direct-method-in-another-module i.e. calling ModuleB/MethodB from the Init() function of ModuleA in a loop? Does that succeed?

dsajanice commented 6 years ago

What output do you see from these lines in ModuleA: // Write response to log for debugging Console.WriteLine($"Got result {resp.ResultAsJson} with status code {resp.Status}");

NoExitTV commented 6 years ago

You can see above under "Steps to reproduce":

  1. Notice the error you receive in Azure Portal but when looking at the logs for ModuleA you'll see: Got result {"Message":"Hello from ModuleB"} with status code 200

Best regards, Fredrik.

dsajanice commented 6 years ago

I apologize this has taken so long. It took me a while to get a debuggable setup working so that I could reproduce and step through all the calls to see where the issue was. I was able to localize the issue. When module A connects to EdgeHub to invoke a method in module B, EdgeHub closes the original cloud connection it had for module A. This results in the error that we are seeing in the portal. I am marking this as a bug and we will be tracking it internally. I will post more updates if I have any. Thanks for reporting this once again!

varunpuranik commented 6 years ago

Thanks for the investigation @dsajanice

@NoExitTV - Can you upgrade to version 1.0.2 of the EdgeHub? mcr.microsoft.com/azureiotedge-hub:1.0.2, and create a new IoT Edge device and try? That should work without this problem (if your IoThub is not in EastUS).

NoExitTV commented 6 years ago

Thanks @dsajanice and @varunpuranik . Sorry for the late reply. Been really busy in school etc.

I've tried creating a new IoT Edge device and run it in the Visual Studio Code simulator with iotedge-hub version 1.0.2 without success so far. I'll try and deploy it on a real install of IoT edge runtime as soon as I can.

NoExitTV commented 6 years ago

@dsajanice @varunpuranik Just tried this with a fresh install of iotedge runtime on a Windows 10 VM. I installed iotedge runtime with the command:

 . {Invoke-WebRequest -useb aka.ms/iotedge-win} | Invoke-Expression; `
>> Install-SecurityDaemon -Manual -ContainerOs Linux

I then created a new client on iothub and provisioned the new device with that client. My IoTHub is located in West Europe.

I still got the same error in Azure Portal... Printing the version and logs in powershell gives me:

PS C:\WINDOWS\system32> iotedge --version
iotedge 1.0.2 (f6fcdb2e24ac2af977e748e9f2b8a37245b92738)

PS C:\WINDOWS\system32> iotedge logs ModuleA
[INFO] ModuleA: IoT Hub module client initialized.
[DEBUG] ModuleA(remove): Initialized MyMessages
[INFO] ModuleA: Trying to register direct methods for order gateway
[INFO] ModuleA: Direct methods registered with success
Proxy method called!
Device id is: MyDeviceId
Module id is: ModuleB
{"Message":"Hello from ModuleB"}
Responding!
PS C:\WINDOWS\system32>

Note that I've changed some prints here but the connection to IoTHub still seems to close as soon as the InvokeMethodAsync() is called for the ModuleClient.

And my deployment.json for edgehub is set as:

"edgeHub": {
            "type": "docker",
            "status": "running",
            "restartPolicy": "always",
            "settings": {
              "image": "mcr.microsoft.com/azureiotedge-hub:1.0.2",
              "createOptions": "{\"HostConfig\":{\"PortBindings\":{\"5671/tcp\":[{\"HostPort\":\"5671\"}], \"8883/tcp\":[{\"HostPort\":\"8883\"}],\"443/tcp\":[{\"HostPort\":\"443\"}]}}}"
        }
}
varunpuranik commented 6 years ago

@NoExitTV - Sorry I should have mentioned - this will not work in West Europe either (along with East US). I am trying to fix the issue in EdgeHub such that it will work everywhere, but it is a hard fix, so I don't have a timeline on the fix yet.

NoExitTV commented 6 years ago

Thanks @varunpuranik . So it's ok for me to switch to, say west us, temporarily while you're working on a fix for the rest of the regions? Higher latency but it should still be fine while developing I recon.

varunpuranik commented 6 years ago

Except East US and West Europe, any other region should be fine (in case you are in Europe)

NoExitTV commented 6 years ago

Tried West US and it works great while running on a "real" installation of IoT Edge runtime. It doesn't work at all when running in the VS Code simulator tho.

Why is there a difference between running it in on an install of Edge runtime vs running it in the VS Code simulator? I mean, both still run in docker containers and isn't edgeHub is loaded from the same image? Hmm.

The simulator shows the following warning:

edgeHubDev           | 2018-10-07 21:12:41.286 +00:00 [WRN] - Error while refreshing the service identity for my-device-id/ModuleA
edgeHubDev           | Microsoft.Azure.Devices.Edge.Hub.CloudProxy.DeviceScopeApiException: Message: Error getting device scope result from IoTHub, HttpStatusCode: Unauthorized, Content: {"Message":"ErrorCode:IotHubUnauthorizedAccess;Unauthorized","ExceptionMessage":"Tracking ID:a73cdc8806ed4f5dbef743b806a86371-G:9-TimeStamp:10/07/2018 21:12:41"}
edgeHubDev           |    at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.DeviceScopeApiClient.GetIdentitiesInScope(Uri uri) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/DeviceScopeApiClient.cs:line 115
edgeHubDev           |    at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.DeviceScopeApiClient.GetIdentitiesInScopeWithRetry(Uri uri) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/DeviceScopeApiClient.cs:line 85
edgeHubDev           |    at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.ServiceProxy.GetServiceIdentity(String deviceId, String moduleId) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/ServiceProxy.cs:line 77
edgeHubDev           |    at Microsoft.Azure.Devices.Edge.Hub.Core.DeviceScopeIdentitiesCache.RefreshServiceIdentity(String id) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Core/DeviceScopeIdentitiesCache.cs:line 140
varunpuranik commented 6 years ago

@NoExitTV - Thanks for trying it out in West US. Yes, I would indeed expect it to work the same in the VS Code simulator as well. I suspect that this might be a bug in VS Code where it is using a slightly different configuration from the one used in the actual deployment. I will follow up on it and see if we can fix it.

adashen commented 6 years ago

@NoExitTV There is a bug in simulator and we have a fix in hand now. Will keep you updated when it is released.

NoExitTV commented 5 years ago

Great @adashen . Another quick question regarding the VS Code simulator (since you're here?). I can't get Binds to work by specifying "createOptions" in the deplayment template file in the simulator. The Binds work just fine in a "real" instance of IoT Edge runtime but the folder isn't mounted in the sim?

My createOptions looks like

"createOptions": "{\"HostConfig\": {\"Binds\": [\"c:/Users/CustomData:/customdata\"]}}"
adashen commented 5 years ago

@NoExitTV , Thanks for the feedback. https://github.com/Azure/iotedgehubdev/issues/98 has been added. We will fix the issue.

myagley commented 5 years ago

Closing. Please track in the iotedgehubdev issue above.