Azure / azure-iot-sdk-c

A C99 SDK for connecting devices to Microsoft Azure IoT services
https://azure.github.io/azure-iot-sdk-c
Other
588 stars 739 forks source link

ThreadAPI_Create failed on SetModuleMethodCallback for Deepstream application in Docker #2402

Closed joranapixa closed 1 year ago

joranapixa commented 1 year ago

Setup

Dev machine: Jetson Nano, Ubuntu 18.04.5 LTS, SDK Compiled using GCC 7.5.0

SDK version "lts_07_2022", commit hash 3fd808bee3c88f8578dcc30c0a6d1d396c172070

I'm using the SDK in a C++ Nvidia Deepstream application by linking the libraries as follows (Makefile snippets, not full file):

CXX=g++ CXXFKAGS=-std=c++14 ... LIB_INSTALL_DIR?=/opt/nvidia/deepstream/deepstream-$(NVDS_VERSION)/lib/ PKGS:= gstreamer-1.0 opencv ... LIBS:= pkg-config --libs $(PKGS) LIBS+= -L$(LIB_INSTALL_DIR) -lstdc++ -lnvdsgst_meta -lnvds_meta \ -lnvbufsurface -lnvbufsurftransform -lnvdsgst_helper -lnvds_batch_jpegenc \ -laziotsharedutil -liothub_client -liothub_client_mqtt_transport -lumock_c -lpthread\ -lm \ -Wl,-rpath,$(LIB_INSTALL_DIR) ...

The compiled libaziotsharedutil.a, libiothub_client.a, libiothub_client_mqtt_transport.a and libumock_c.a libraries have all been copied to the LIB_INSTALL_DIR and I have all the necessary headers for the included libraries in a separate folder and add them to the CFLAGS as follows: CFLAGS+= -Iazure-iot-sdk/inc

This compiles successfully.

To run it in an Azure IOT context, the application is deployed in a docker container as an Azure IOT module.

The Docker image starts from "nvcr.io/nvidia/deepstream-l4t:5.0.1-20.09-base" and copies the compiled application executable to the container, as well as the necessary libraries.

Bug description

In the C++ code, I have a very simple application that sets up the IoTHubModuleClient and then sets a ModuleMethodCallback. I'm using the MQTT protocol. The IoTHubModuleClient_CreateFromEnvironment(MQTT_Protocol) call succeeds, but when setting up the ModuleMethodCallback through IoTHubModuleClient_SetModuleMethodCallback, I always get the following error (Console Log):

Error: Time:Thu Nov 10 10:47:25 2022 File:/home/nvidia/azure/azure-iot-sdk-c/iothub_client/src/iothub_client_core.c Func:StartWorkerThreadIfNeeded Line:736 ThreadAPI_Create failed
Error: Time:Thu Nov 10 10:47:25 2022 File:/home/nvidia/azure/azure-iot-sdk-c/iothub_client/src/iothub_client_core.c Func:IoTHubClientCore_SetDeviceMethodCallback Line:1785 Could not start worker thread
Exception caught in background thread.  Unable to handle.
['azure.iot.device.common.transport_exceptions.ConnectionDroppedError: Unexpected disconnection\n']
Exception caught in background thread.  Unable to handle.
['azure.iot.device.common.transport_exceptions.ConnectionDroppedError: Unexpected disconnection\n']
etc etc

For reference, pthread is installed on the local system as well as in the docker, and they are the same version. I've tried building the Azure IOT SDK in the Dockerfile and using those libraries for the application instead of copying them in, but this does not resolve the issue.

Minimal reproducible example:

#include <unistd.h>

#include "iothub_module_client.h"
#include "azure_c_shared_utility/crt_abstractions.h"
#include "azure_c_shared_utility/threadapi.h"

#include "iothubtransportmqtt.h"

static int deviceMethodCallback(const char* method_name, const unsigned char* payload, size_t size, unsigned char** response, size_t* response_size, void* userContextCallback)
{
    (void)userContextCallback;
    (void)payload;
    (void)size;

    int result = -1;
    const char deviceMethodResponse[] = "{ }";
    *response_size = sizeof(deviceMethodResponse)-1;
    *response = (unsigned char*) malloc(*response_size);
    (void)memcpy(*response, deviceMethodResponse, *response_size);
    return result;
}

int main(void)
{
    IOTHUB_MODULE_CLIENT_HANDLE  iotHubClientHandle = IoTHubModuleClient_CreateFromEnvironment(MQTT_Protocol);
    if (iotHubClientHandle == NULL)
    {
        return -1;
    }
    else
    {
        (void)IoTHubModuleClient_SetModuleMethodCallback(iotHubClientHandle, deviceMethodCallback, NULL);
    }
    while(true)
    {
        sleep(1000000);
    }
}
danewalton commented 1 year ago

Just to make sure we're getting the most verbose logs, can you make sure the trace is on?

bool traceOn = true;
(void)IoTHubModuleClient_SetOption(device_handle, OPTION_LOG_TRACE, &traceOn);
joranapixa commented 1 year ago

Hey, Thanks for your reply. I've turned the trace on and now I get the following:

Error: Time:Mon Nov 14 15:12:13 2022 File:/home/nvidia/azure/azure-iot-sdk-c/iothub_client/src/iothub_client_core.c Func:StartWorkerThreadIfNeeded Line:736 ThreadAPI_Create failed
Error: Time:Mon Nov 14 15:12:13 2022 File:/home/nvidia/azure/azure-iot-sdk-c/iothub_client/src/iothub_client_core.c Func:IoTHubClientCore_SetDeviceMethodCallback Line:1785 Could not start worker thread
-> 15:12:13 CONNECT | VER: 4 | KEEPALIVE: 240 | FLAGS: 192 | USERNAME: <omitted> | PWD: <omitted> | CLEAN: 0
<- 15:12:13 CONNACK | SESSION_PRESENT: true | RETURN_CODE: 0x0
Exception caught in background thread.  Unable to handle.
['azure.iot.device.common.transport_exceptions.ConnectionDroppedError: Unexpected disconnection\n']
Exception caught in background thread.  Unable to handle.
['azure.iot.device.common.transport_exceptions.ConnectionDroppedError: Unexpected disconnection\n']
...
danewalton commented 1 year ago

Okay so it looks like the logs in the pthread implementation aren't getting hit. Are you able to debug into the pthread adapter file and see what is going on when a thread is attempted to be created?

I'm guessing it's some error here: https://github.com/Azure/azure-c-shared-utility/blob/4f1b7cd6bf48833fe4b9e23e2ff07510753faee5/adapters/threadapi_pthreads.c#L61

joranapixa commented 1 year ago

I checked the control flow of the threadapi_pthreads.c adapter file as suggested, and the following happens:

  1. The ThreadAPI_Create gets called correctly
  2. The arguments are valid (it passes the first check)
  3. The malloc for the threadInstance succeeds
  4. pthread_create gets called correctly and returns 0 for create result
  5. The case switch for case 0 gets called, so result should get the value THREADAPI_OK However, the "if (ThreadAPI_Create(&iotHubClientInstance->ThreadHandle, ScheduleWork_Thread, iotHubClientInstance) != THREADAPI_OK)" still gets called in the StartWorkedThreadIfNeeded call. I have not been able to verify whether result from ThreadAPI_Create is actually set to THREADAPI_OK or not (I don't know how to easily print out these macro definitions), but the control flow indicates this should be the case.

It's also not the case that this ThreadAPI_Create gets called twice, this is the only call.

danewalton commented 1 year ago

Okay my hunch here is that there is some kind of incompatibility between files. That the macro in the StartWorkerThreadIfNeeded() is different than the one in the pthread file.

I'll try and explain in depth what I'm thinking.

So the return values for the thread api are defined here:

image

MU_DEFINE_ENUM() then takes those values and adds an _INVALID value as the first value, to make sure we don't reference the integer values and instead use the macro values themselves:

image

So really, all the result values are:

    THREADAPI_INVALID \
    THREADAPI_OK,               \
    THREADAPI_INVALID_ARG,      \
    THREADAPI_NO_MEMORY,        \
    THREADAPI_ERROR

The thread api might be returning THREADAPI_OK, which it believes is 1. But then in the StartWorkerThreadIfNeeded(), it thinks THREADAPI_OK is 0 based on a different age of header file.

If possible, are you able to print the value of that enum in both threadapi_pthread.c and StartWorkerThreadIfNeeded()?

ericwolz commented 1 year ago

pthread_create and if (ThreadAPI_Create(&iotHubClientInstance->ThreadHandle are in separate libraries so there could be a lib mismatch. Looking at the if (ThreadAPI_Create(&iotHubClientInstance->ThreadHandle disassemble and see that the ThreadAPI_Create return value is compared to.

joranapixa commented 1 year ago

Thanks, there seems to be a library mismatch indeed.

I noticed that Deepstream already has it's own libiothub_client.so library present in the libraries folder, which was taking precedence over my own compîled libiothub_client.a file that I added. This implies that Deepstream's libiothub_client.so library probably clashed with my own libaziotsharedutil.a, as you suspect. When trying to link my own libiothub_client.a into the Deepstream application, I get linker issues however. This seems to be a common occurrence for people who try to build their application that's using the Azure IOT C SDK with a Makefile instead of with CMake and I have not found a solution for this yet. I will try to build my application with CMake and I will report whether this resolves my issue.

joranapixa commented 1 year ago

After building my application with CMake, the issue has disappeared.

This verifies that there was an accidental mismatch with using Deepstream's libiothub_client.so instead of using my own compiled libiothub_client.a.

Thank you very much for the help!

For the people that want to build their own Deepstream application with the Azuire IOT SDK using CMake:

danewalton commented 1 year ago

Glad we were able to help. If you have any further questions, please feel free to open another issue!