jianyan-li-qnx commented 4 months ago

Hi team!

My name is Jianyan and I've been working on porting azure-iot-sdk-c LTS_08_2023 to QNX.

I noticed this problem where some test programs would seg fault. Upon further investigation, this is what I think happened:

Take Transport_Fully_Reconnects_After_5_AMQP_device_errors in iothub_client/tests/iothubtransport_amqp_common_ut/iothubtransport_amqp_common_ut.c, in LTS_08_2023, as an example:

The test sets up mocking for STRING_c_str without specifying return value of the mock function, at iothubtransport_amqp_common_ut.c:4465
The test calls IoTHubTransport_AMQP_Common_DoWork at iothubtransport_amqp_common_ut.c:4469
IoTHubTransport_AMQP_Common_Device_DoWork is called at iothubtransport_amqp_common.c:1470
LogError is called at iothubtransport_amqp_common.c:1071 with the argument STRING_c_str(registered_device->device_id) as a %s
- Problem is, mock STRING_c_str returns NULL, so LogError is called with NULL as the %s
LogError invokes consolelogger_log, which invokes vprintf with NULL as a %s .
This is technically an undefined behavior, but works on Linux. On QNX, however, this results in a seg fault.

Please see the full stacktrace below:

#0  __generic_strlen ()
    at /builds/workspace/QNXOS_800/build_aarch64/lib/c/string/aarch64/strlen.S:147
#1  0x00000000780c1edc in _Putfld (px=px@entry=0x100bc8d0, pap=pap@entry=0x100bc890, 
    code=<optimized out>, ac=ac@entry=0x100bca50 "\002")
    at /builds/workspace/QNXOS_800/build_aarch64/lib/c/stdio/xputfld.c:261
#2  0x00000000780c16c4 in _Printf (pfn=<optimized out>, arg=arg@entry=0x78102228 <_Stdout>, 
    fmt=<optimized out>, 
    fmt@entry=0x1056e968 "Failed performing DoWork for device '%s' (device reported state %d; number of previous failures: %lu)", 
    ap_arg=<error reading variable: Cannot access memory at address 0xfffffff8>, 
    secure=secure@entry=0)
    at /builds/workspace/QNXOS_800/build_aarch64/lib/c/stdio/xprintf.c:603
#3  0x00000000780bd1c8 in vprintf (
    fmt=0x1056e968 "Failed performing DoWork for device '%s' (device reported state %d; number of previous failures: %lu)", ap=...)
    at /builds/workspace/QNXOS_800/build_aarch64/lib/c/stdio/vprintf.c:35
#4  0x0000000010564284 in consolelogger_log (log_category=AZ_LOG_ERROR, 
    file=0x1056de20 "/home/jianyli/projects/osg_internal/azure-iot-sdk-c/iothub_client/src/iothubtransport_amqp_common.c", 
    func=0x10570278 <__func__.23> "IoTHubTransport_AMQP_Common_Device_DoWork", line=1071, 
    options=1, 
    format=0x1056e968 "Failed performing DoWork for device '%s' (device reported state %d; number of previous failures: %lu)")
    at /home/jianyli/projects/osg_internal/azure-iot-sdk-c/c-utility/src/consolelogger.c:242
#5  0x00000000105606f8 in IoTHubTransport_AMQP_Common_Device_DoWork (
    registered_device=0x78106910 <__main_arena+14976>)
    at /home/jianyli/projects/osg_internal/azure-iot-sdk-c/iothub_client/src/iothubtransport_amqp_common.c:1071
#6  0x0000000010561628 in IoTHubTransport_AMQP_Common_DoWork (
    handle=0x78107080 <__main_arena+16880>)
    at /home/jianyli/projects/osg_internal/azure-iot-sdk-c/iothub_client/src/iothubtransport_amqp_common.c:1470
#7  0x00000000105594c0 in Transport_Fully_Reconnects_After_5_AMQP_device_errors ()
    at /home/jianyli/projects/osg_internal/azure-iot-sdk-c/iothub_client/tests/iothubtransport_amqp_common_ut/iothubtransport_amqp_common_ut.c:4469
#8  0x00000000780152cc in ?? ()
#9  0x00000000105ffa58 in TestFunctionData262 ()

Please confirm, if possible, that my conclusion is correct.

I am not sure what is the most idiomatic way of solving this issue. I am thinking of sanitizing the string with MU_P_OR_NULL before each call to LogError.

ericwol-msft commented 4 months ago

Probably should have a new func in strings.c that returns a "NULL" string if the parameter is NULL.

jianyan-li-qnx commented 4 months ago

Hi Eric,

Thanks for the quick reply. I have a quick question.

Probably should have a new func in strings.c that returns a "NULL" string if the parameter is NULL.

How is this different/better than using existing macro MU_P_OR_NULL?

ericwol-msft commented 4 months ago

Because one would return a null value and the other is a string. That is why there are quotes around it.

jianyan-li-qnx commented 4 months ago

MU_P_OR_NULL returns a null string too, at deps/azure-macro-utils-c/inc/azure_macro_utils/macro_utils.h: 25

#define MU_P_OR_NULL(p) (((p)!=NULL)?(p):"NULL")

Am I missing something?

ericwol-msft commented 4 months ago

Sorry for the confusion.

STRING_c_str is a mockable function, and MU_P_OR_NULL is a macro. The unit test is tacking the mockable call here. The correct solution is to create a new mockable function that returns a "NULL" string on a NULL arg.

jianyan-li-qnx commented 4 months ago

Thanks for the explanation. My original comment might be confusing too, let me rephrase it: I was talking about using MU_P_OR_NULL to encapsulate calls to STRING_c_str, which would result in something like this: MU_P_OR_NULL(STRING_c_str(registered_device->device_id)). This way, the null value returned by STRING_c_str can be converted to "NULL" and everything would work fine.

Your suggestion, as I understand it, is to create a new mockable function such that:

If the arg (string handle) is not null, return the string wrapped inside the handle
Otherwise, return "NULL"

And then replace calls to STRING_c_str with calls to this new function

This obviously would work. But I can see that it might have a downside. The original STRING_c_str returns a null value which is easy to detect. Now with the new function you have to use strcmp to determine whether the return value of the function represents a null value or not. Moreover, this new function might cause confusion in the edge case where the string handle's string is "NULL" to begin with.

ericwol-msft commented 4 months ago

With a new STRING_c_str you can be more explicate what is NULL. Something like this

const char* STRING_c_str_null_string(STRING_HANDLE handle)
{
    const char* result;
    if (handle != NULL)
    {
        result = ((STRING*)handle)->s;
        if (result == NULL)
        {
            result = "<NULL VALUE>";
        }
    }
    else
    {
        result = "<NULL HANDLE>";
    }
    return result;
}

Moreover, this new function might cause confusion in the edge case where the string handle's string is "NULL" to begin with.

This would more be opt-in than opt-out, so you should know the use case, which should be limited to logging.

jianyan-li-qnx commented 4 months ago

I see, if this function is for logging only then it makes sense to do that and it would solve the problem we have with STRING_c_str.

However, there are some other cases where doing this does not solve the problem. One somewhat frequent problem is that existing code tries to print a null char* instead of a string handle. There are few examples of this, such as https://github.com/Azure/azure-iot-sdk-c/blob/main/serializer/src/schema.c#L2446, https://github.com/Azure/azure-iot-sdk-c/blob/main/serializer/src/iotdevice.c#L305, and https://github.com/Azure/azure-iot-sdk-c/blob/main/iothub_client/src/iothub_client_core.c#L2385

Since these cases print a char instead of a string handle, STRING_c_str_null_string does not work. Perhaps I should use MU_P_OR_NULL to encapsulate the char to solve this problem?

ericwol-msft commented 4 months ago

@jianyan-li-qnx please validate this branch.

2594

jianyan-li-qnx commented 4 months ago

@jianyan-li-qnx please validate this branch.

2594

@ericwol-msft Thanks for the response.

The changes you proposed works on my end.

However I have to also change deps/azure-ctest/src/ctest.c:288 to this to make it fully working, because the test runner is also trying to print NULL:

void char_ptr_ToString(char* string, size_t bufferSize, const char* val)
{
    (void)snprintf(string, bufferSize, "%s", MU_P_OR_NULL(val));
}

void wchar_ptr_ToString(char* string, size_t bufferSize, const wchar_t* val)
{
    (void)snprintf(string, bufferSize, "%ls", MU_WP_OR_NULL(val));
}

ericwol-msft commented 4 months ago

What is the call stack? We can't change anything in the deps/azure-ctest repo.

jianyan-li-qnx commented 4 months ago

What is the call stack? We can't change anything in the deps/azure-ctest repo.

CTEST_ASSERT_ARE_EQUAL is defined as CTEST_ASSERT_ARE_EQUAL, which then calls the _ToString functions
iothub_client/tests/iothub_client_properties_ut/iothub_client_properties_ut.c:1293  ASSERT_ARE_EQUAL(char_ptr, expectedProperty->componentName, actualProperty->componentName); Both componentNames here are null
iothub_client/tests/iothub_client_properties_ut/iothub_client_properties_ut.c:1326 CompareProperties
iothub_client/tests/iothub_client_properties_ut/iothub_client_properties_ut.c:1487 IoTHubClient_Properties_Deserializer_GetNext_three_reported_update_properties => TestDeserializedProperties

ericwol-msft commented 4 months ago

Fixed in PR #2594

Azure / azure-iot-sdk-c

LogError with null as a string (%s) #2587

2594

2594