REVrobotics / node-rhsplib

Node wrapper for RHSPLib
BSD 3-Clause "New" or "Revised" License
0 stars 1 forks source link

librhsp getInterfacePacketID errors with segmentation fault. #133

Open felipetrentin opened 2 months ago

felipetrentin commented 2 months ago

My configuration:

I was able to connect to the hub and change the LED color, by using rhsp_setModuleLedColor by using the library tests as an example.

The problem:

rhsp_setServoConfiguration, rhsp_setMotorChannelMode, rhsp_getEncoderPosition and other commands give the same segmentation fault.

it appears as any command containing rhsp_getInterfacePacketID creates the same behavior, apparently caused by the strcmp in getInterfaceByName

debugging in GDB i get the following output:

Thread 1 "main" received signal SIGSEGV, Segmentation fault.
__strcmp_avx2 () at ../sysdeps/x86_64/multiarch/strcmp-avx2.S:116
116 ../sysdeps/x86_64/multiarch/strcmp-avx2.S: No such file or directory.

Thanks in advance.

felipetrentin commented 1 month ago

update, I found the bug and was able to fix it

(but probably in not a very good way lol)

it turns out that ´rhsp_getInterfacePacketID´ does a transaction with the hub in the following order:

  1. test if requested interface is in the interfaces list already, 1.1. if the interface is already in the computer memory, return it. 1.2. if the interface is not in the computer, query it from the hub and store it on the memory
  2. test again if the requested interface is in the computer memory and return it.

the bug:

the code was failing in step 2 where it would check if there was an interface after querying. more especifically, the code was failing in getInterfaceByName where there is a string comparison. This could mean a couple of things:

First of all, in getInterfaceByName there is a comparison between int and size_t that could yield an infinite loop. Thus, i changed the function to be:

static const RhspModuleInterface* getInterfaceByName(const RhspModuleInterfaceListInternal* list,
                                                     const char* interfaceName)
{
    for (size_t i = 0; i < list->numberOfInterfaces; i++)
    {
        if (strcmp(interfaceName, list->interfaces[i].name) == 0)
        {
            return &list->interfaces[i];
        }
    }
    return NULL;
}

take note that if the value of list->numberOfInterfaces is wrong, the loop will try accessing an illegal part of the memory.

as the bug was still not fixed, I focused in understanding what was happening inside rhsp_queryInterface, since a wrong value inside this function could lead to an invalid data structure that would in turn fail when processed by getInterfacePacketIdInternal

my first suspicion was that some string could be missing a null terminator, causing string operators to fail. thus, i added this line after the allocation of memory to intf_.name and memcpy

intf_.name[interfaceNameLength-1] = 0;

this also did not fix the issue, despite not breaking anything else.

Therefore, the only other suspicion was that some garbage value was being stored in list->numberOfInterfaces resulting in a out of bounds memory acces in getInterfaceByName. The question was "where is this value set and changed, and what could make it out of bounds?"

then I turned my focus to rhsp_queryInterface where this value probably was last changed before failing. Upon further inspection, I found that it called addInterface

as I had already confirmed that the values set before the calling of this function were right, it was the main culprit for the segmentation fault. It turns out, that if hub->interfaceList is a null pointer, the function allocates its memory but never changes the values. The code probably worked using other compilers, but running Colcon and, in the background, GCC, makes such that malloc only allocates memory and does not clean it.

so it was possible that numberOfInterfaces was not zero after allocation.

so, to test if that was the case i added the following line after the allocation:

((RhspModuleInterfaceListInternal*) hub->interfaceList)->numberOfInterfaces = 0;

And it worked!! but the code is still unsafe. There are a lot of arguments that could result in big accidents like these. This is still not good enough for a competition-ready robot running ROS2. I hope this brought some light to the issue and maybe helped someone. I would be interested in contributing to this repository. I would also suggest to make librhsp separate from this repository, since it has very different goals in mind.

NoahAndrews commented 1 month ago

Wow! Thank you for reporting and investigating this bug. We're not actively working on this library at the moment, but we do plan to return to it, and I'm sure we'll want to dive into this then.

Moving librhsp to its own repo is something we want to do at some point (there's some historical reasons why that's not currently the case).