ENSTABretagneRobotics / Hardware-MATLAB

8 stars 2 forks source link

Multiple Slamtec LIDAR Connection Issues with MATLAB #2

Open ismet55555 opened 4 years ago

ismet55555 commented 4 years ago

I'm running into some initial LIDAR connection issue with simultaneously connecting 4 Slamtec RPLIDAR A3 using this MATALB interface llibrary.

The issue is that I am having to retry the connection on at least one of the LIDARs before it connects. And it can also vary with LIDAR that is. That is, all but one LIDAR connects the first time. One time, it could be LIDAR on one COM port, another time it could be a LIDAR on another COM port.

This is the way it is set up right now. Basically MATALB loads the provided interface library, hardwarex.dll. That exposes some library methods to be used by MATLAB.

The method to connect the LIDAR does the following:

  1. Opens the RS232 port
  2. Sets port options
  3. Gets some info and health statuses form lidar
  4. Sets the motor PWM to zero (stop lidar motor)
  5. Uses express scan mode option

Here somewhere the communication will error out. Using a serial sniffer I was able to see that the LIDAR errors out after the following message to the LIDAR:

a5 f0 02   ff 03 ab   a5 25   a5 82 05 00 00 00 00 00 22

Which I tracked to the following library methods, in that order

SetMotorPWMRequestRPLIDAR()
CheckMotorControlSupportRequestRPLIDAR()
StopRequestRPLIDAR()
StartExpressScanRequestRPLIDAR()   <-- Error here

To which the LIDAR responds with:

a5 5a 54 00 00 40 82 

Where as a successfully connection response from the LIDAR much longer in content.


Things I've tried


This is the system and settings I am working with

Lidar (x4):

Computer

Serial Port Settings

Software

lebarsfa commented 4 years ago

Thank you for your detailed report. Here are some info, ideas, suggestions, questions:

Where as a successfully connection response from the LIDAR much longer in content

?

ismet55555 commented 4 years ago

Hey, first of all thank you very much for your quick and thorough response! Also, this entire repo has been amazing help for us here.

I just saw your response, however I think we have gotten it to work and it may be something to consider and change.

To answer your questions first

Maybe it would be ok to keep this up, so others can reference it if they come to the same issue.

As I mentioned when debugging, the error seemed to happen ConnectRPLIDAR() --> StartExpressScanRequestRPLIDAR(), then specifically here:

    // Receive the first data response (2 data responses needed for angles computation...).
    memset(pRPLIDAR->esdata_prev, 0, sizeof(pRPLIDAR->esdata_prev));
    if (ReadAllRS232Port(&pRPLIDAR->RS232Port, pRPLIDAR->esdata_prev, sizeof(pRPLIDAR->esdata_prev)) != EXIT_SUCCESS)
    {
        // Failure
        printf("A RPLIDAR is not responding correctly. \n");
        return EXIT_FAILURE;
    }

What seemed to have happened before that is after the command being send out in WriteAllRS232Port(), sometimes it would not read a response in the ReadAllRS232Port(), esdata_prev would be nothing.

We tried implementing a mSleep(500) delay before that second ReadAllRS232Port(), and it seemed to help (my guess that the lidar was slow to respond), but the issue did not get resolved fully.

The following is what made it work every time with 4 lidars:

inline int StartExpressScanRequestRPLIDAR(RPLIDAR* pRPLIDAR)
{
    unsigned char reqbuf[] = { START_FLAG1_RPLIDAR,EXPRESS_SCAN_REQUEST_RPLIDAR,0x05,0,0,0,0,0,0x22 };
    unsigned char descbuf[7];
    unsigned char sync = 0;
    unsigned char ChkSum = 0;

    // Send request to output/tx OS FIFO buffer for port
    if (WriteAllRS232Port(&pRPLIDAR->RS232Port, reqbuf, sizeof(reqbuf)) != EXIT_SUCCESS)
    {
        printf("Error writing data to a RPLIDAR. \n");
        return EXIT_FAILURE;
    }

    // Receive the response descriptor.
    memset(descbuf, 0, sizeof(descbuf));  // Alocate memory
    if (ReadAllRS232Port(&pRPLIDAR->RS232Port, descbuf, sizeof(descbuf)) != EXIT_SUCCESS)
    {
        printf("A RPLIDAR is not responding correctly. \n");
        return EXIT_FAILURE;
    }

    // Quick check of the response descriptor.
    if ((descbuf[2] != 0x54) || (descbuf[5] != 0x40) || (descbuf[6] != MEASUREMENT_CAPSULED_RESPONSE_RPLIDAR))
    {
        printf("A RPLIDAR is not responding correctly. \n");
        return EXIT_FAILURE;
    }

    // Keep anticipating a port read buffer for 1 second
    int timeout = 1500;

    // Check it every 5 ms
    // Note on Checking Period Value:
    //    Waiting on 82 bytes in lidar payload
    //    10 bits per byte for the serial communication
    //    820 bits / 256000 baud = 0.0032s = 3.2ms
    int checkingperiod = 5;

    RS232PORT* pRS232Port = &pRPLIDAR->RS232Port;
    int i;
    int count = 0;

    // Wait for something to show up on the input buffer on port
    if (!WaitForRS232Port(&pRPLIDAR->RS232Port, timeout, checkingperiod))
    {
        //Success - Something is there

        // If anything is on the input buffer, wait until there is enough
        count = 0;
        for (i = 0; i < 50; i++)
        {
            // Check the input FIFO buffer on the port
            GetFIFOComputerRS232Port(pRS232Port->hDev, &count);

            // Check if there is enough to get a full payload read
            if (count >= sizeof(pRPLIDAR->esdata_prev))
            {
                // Thre is enough, stop waiting
                break;
            }
            else
            {
                // Not enough, wait a little
                mSleep(checkingperiod);
            }
        }

    }
    else
    {
        //Failure - After waiting for an input buffer, it wasn't there
        printf("[StartExpressScanRequestRPLIDAR] : Failed to detect response on the input FIFO buffer. \n");
        return EXIT_FAILURE;
    }

    // Receive the first data response (2 data responses needed for angles computation...).
    memset(pRPLIDAR->esdata_prev, 0, sizeof(pRPLIDAR->esdata_prev));
    if (ReadAllRS232Port(&pRPLIDAR->RS232Port, pRPLIDAR->esdata_prev, sizeof(pRPLIDAR->esdata_prev)) != EXIT_SUCCESS)
    {
        // Failure
        printf("A RPLIDAR is not responding correctly. \n");
        return EXIT_FAILURE;
    }

    // Analyze the first data response.
    sync = (pRPLIDAR->esdata_prev[0] & 0xF0)|(pRPLIDAR->esdata_prev[1]>>4);
    if (sync != START_FLAG1_RPLIDAR)
    { 
        printf("A RPLIDAR is not responding correctly : Bad sync1 or sync2. \n");
        return EXIT_FAILURE;    
    }

    ChkSum = (pRPLIDAR->esdata_prev[1]<<4)|(pRPLIDAR->esdata_prev[0] & 0x0F);
    // Force ComputeChecksumRPLIDAR() to compute until the last byte...
    if (ChkSum != ComputeChecksumRPLIDAR(pRPLIDAR->esdata_prev+2, sizeof(pRPLIDAR->esdata_prev)-1))
    { 
        printf("A RPLIDAR is not responding correctly : Bad ChkSum. \n");
        return EXIT_FAILURE;    
    }

    return EXIT_SUCCESS;
}

So in the above code, we are waiting for the OS read FIFO buffer to show something within 1.5s, checking every 5ms (WaitForRS232Port()). If anything shows up, makes sure to wait to have enough, the size of the payload (GetFIFOComputerRS232Port()).

I'm not sure if it made a difference but we also removed the OS write FIFO buffer by changing it from 0 to FILE_FLAG_NO_BUFFERING:

File: OSComputerRS232Port.h
...
    hDev = CreateFile( 
        tstr,
        GENERIC_READ|GENERIC_WRITE,
        0, // Must be opened with exclusive-access.
        NULL, // No security attributes.
        OPEN_EXISTING, // Must use OPEN_EXISTING.
        FILE_FLAG_NO_BUFFERING, // Not overlapped I/O. Should use FILE_FLAG_WRITE_THROUGH and maybe also FILE_FLAG_NO_BUFFERING?
        NULL // hTemplate must be NULL for comm devices.
        );
...

I'm definitely not a C++ ninja, but this seems to consistently work. If you have any suggestions, I would love to hear them!

lebarsfa commented 4 years ago

Thank you very much for sharing your fix!

So it seems to be a timeout problem...

In the configuration file RPLIDAR0.txt, there is a timeout parameter which is 1000 ms by default (you mentioned in your first message that your serial timeout was also 1000). From your fix, I would say you expect the problematic read operation might take up to 1500 ms to get the first byte, then we might have to wait up to 50x5 ms for the rest of the bytes, but in practice the mSleep() function relies on the Windows API Sleep() function which is typically with a resolution of around 16 ms (see e.g. https://stackoverflow.com/questions/9518106/winapi-sleep-function-call-sleeps-for-longer-than-expected) so it is probably up to 50x16=800 ms.

From that, I wonder if just changing the timeout parameter in RPLIDAR0.txt to something like 2500 could also solve the problem without changing anything in the C++ code. If this does not work, I would highly expect that adding WaitForRS232Port(&pRPLIDAR->RS232Port, 1500, 5) (or even just mSleep(1500), or with a bit more than 1500) alone is the real fix (the ReadAllRS232Port() is supposed to already do something quite equivalent to the loop with GetFIFOComputerRS232Port() assuming the RPLIDAR0.txt timeout is 1000)...

ismet55555 commented 4 years ago

I see what you are staying. Currently we are using one configuration file for each LIDAR, and each of which will set the port timeout parameter to 1000ms. I did try to mess with it and set it to 1250ms, and I didn't see much difference, but now to think of it, maybe it was not enouch.

Another thing to point out that makes these delays tricky with 4 lidars is the port os buffer overflow.

So let's say you have the first LIDAR trying to connect with your MATLAB script. The library does its thing and connects and everything in about 1.8s. Now the second LIDAR tries to do the same thing within the same amount of time. In the meantime, no data has been read for the first LIDAR (no data consumed in MATLAB), potentially overflowing the FIFO buffer for that first LIDAR. And if you do this for all 4 LIDARS, those delays matter. So we tried to keep delays to a minimum specifically during the connection process.

Our workaround was to purge all connected ports for all LIDARs right after connection and everywhere in MATLAB before all 4 LIDARS are connected and ready and ready to receive real data.

I hope that makes sense.

lebarsfa commented 4 years ago

OK, I understand the additional problem with the serial buffer overflow.

An alternate MATLAB-only workaround could be to call the ConnectXXX() functions in a parfor loop (assuming that each device takes a similar time to start, or it could be the whole test_rplidar.m code modified to save data in a file and have a fifth parallel job that reads the latest lines of the files to do whatever we want with the data of each device), but this would need Parallel Computing Toolbox (without it parfor will only run as a normal for loop in reverse order, the maximum number of jobs might be also limited to the number of cores of the CPU, see https://fr.mathworks.com/help/parallel-computing/parallel-preferences.html), here is a sample of a parfor loop:

function test
    function parallel_job(x)
        pause(x)
        disp(x)
    end

    Funcs = {@(x) parallel_job(x), @(x) parallel_job(x+1), @(x) parallel_job(x+2), @(x) parallel_job(x+3)}; 
    x = 1:4; %N number of different inputs
    parfor j = 1:4 %Have each worker process a function in parallel
        Funcs{j}(x(j));
    end
end

But the best would be that I update the C++ code so that independent threads are created internally for each device...

ismet55555 commented 4 years ago

Yeah, that's a good idea. Thank you for the suggestion and the code snippet.

We have actually thought about using the parfor loop a while back, which right now seems like a good idea to make sure the buffer doesn't overrun.

What I was thinking is to just use the parfor loop for the connection process, then merge back to the main process for the rest of the program. By the way, for the rest of the program for the process to be separate is problematic the way we have it set up.

ismet55555 commented 4 years ago

Hey just by the way, something that really helped us out was to seperate the connection of the lidar from the scan request.

That is, remove the following section and make it its own endpoint to be called by MATLAB as (ie. HARDWAREX_API int StartExpressScanRequestRPLIDARx(RPLIDAR* pRPLIDAR);)

    // Start scan request (Takes the longest typically)
    memset(pRPLIDAR->esdata_prev, 0, sizeof(pRPLIDAR->esdata_prev));
    switch (pRPLIDAR->ScanMode)
    {
    case SCAN_MODE_RPLIDAR:
        if (StartScanRequestRPLIDAR(pRPLIDAR) != EXIT_SUCCESS)
        {
            printf("Unable to connect to a RPLIDAR : SCAN failure.\n");
            SetMotorPWMRequestRPLIDAR(pRPLIDAR, 0);
            CloseRS232Port(&pRPLIDAR->RS232Port);
            return EXIT_FAILURE;
        }
        break;
    case EXPRESS_SCAN_MODE_RPLIDAR:
        if (StartExpressScanRequestRPLIDAR(pRPLIDAR) != EXIT_SUCCESS)
        {
            printf("Unable to connect to a RPLIDAR : EXPRESS_SCAN failure.\n");
            SetMotorPWMRequestRPLIDAR(pRPLIDAR, 0);  // Stop Lidar
            CloseRS232Port(&pRPLIDAR->RS232Port);    // Close the port
            return EXIT_FAILURE;
        }
        break;
    case FORCE_SCAN_MODE_RPLIDAR:
        if (StartForceScanRequestRPLIDAR(pRPLIDAR) != EXIT_SUCCESS)
        {
            printf("Unable to connect to a RPLIDAR : FORCE_SCAN failure.\n");
            SetMotorPWMRequestRPLIDAR(pRPLIDAR, 0);
            CloseRS232Port(&pRPLIDAR->RS232Port);
            return EXIT_FAILURE;
        }
        break;
    default:
        if (StartScanRequestRPLIDAR(pRPLIDAR) != EXIT_SUCCESS)
        {
            printf("Unable to connect to a RPLIDAR : SCAN failure.\n");
            SetMotorPWMRequestRPLIDAR(pRPLIDAR, 0);
            CloseRS232Port(&pRPLIDAR->RS232Port);
            return EXIT_FAILURE;
        }
        break;
    }

This way, when using multiple LIDARS, you can connect all LIDARS using ConnectRPLIDAR, set everything up, then request the scan with StartExpressScanRequestRPLIDAR, and quickly jump into DAQ.

As opposed to initially, the remaining LIDARS have to wait for the connection (~0.6s) and scan request (~1s). This way the remaining LIDARS only accumulate the read FIFO for ~1s.

ismet55555 commented 4 years ago

Question. Have you ever encountered the issue where a LIDAR needs to be unplugged and plugged back into the USB port? Where it will not even bind to the port when trying to connect.

Further, if you are using the simple frame_grabber.exe provided by SLAMTEC, the message of Cannot bind to port appears for the locked up LIDAR.

Resetting the port or even restarting the computer, does not get it out of its locked state until the LIDAR is unplugged and plugged back in. What do you think the issue there is and what would you suggest to do about that?

lebarsfa commented 4 years ago

OK, thank you for the suggestion, I will consider integrating it next time I update this.

I have not used personally a RPLIDAR since quite a long time, maybe I got also that error message but did not try to understand why since this can probably happen for a vast number of reasons. From what you describe, I would say it could be the device itself that has its firmware crashing, where only cutting the power of its USB port would reset the crashing part of the firmware. The power from the USB ports might not be cut when we restart, while it might be cut when we shutdown the computer, also some computers might have 1 (or sometimes all) of their USB port that is always powered even when shut down (e.g. for the purpose of charging smartphones), and all of this can be probably configured in the BIOS or other computer-specific tool.... Also the fast startup option of Windows might change the behavior of a shut down (when enabled, shutting down might be a kind of "half hibernation/sleep" so not everything might be reset...)...

lebarsfa commented 4 years ago

I added the option bStartScanModeAtStartup in RPLIDAR0.txt and related functions in MATLAB, now it should be possible to do manually what was suggested in https://github.com/ENSTABretagneRobotics/Hardware-MATLAB/issues/2#issuecomment-639105544. Additionally, now the functions use SLAMTEC SDK. Also, I noticed I got several BSOD when using my RPLIDAR A3, I will see if upgrading the driver from https://www.silabs.com/products/development-tools/software/usb-to-uart-bridge-vcp-drivers will help...

ismet55555 commented 4 years ago

Oh nice! Very helpful indeed. Thanks! Do you know if all this would work for the RPLIDAR S1? I've been trying to port our lidar code from the A3 to S1, however, I'm getting some odd readings. Seems to be working seamlessly with the SLAMTEC frame grabber application. Looking at the SDK frame grabber source code, it's hard for me to navigate from what you put together and their SDK source code.

lebarsfa commented 4 years ago

Since now the MATLAB code calls SLAMTEC SDK functions internally, there might be a chance that it works with RPLIDAR S1 if it supports standard scan or express scan modes...