ARMmbed / mbed-os

Arm Mbed OS is a platform operating system designed for the internet of things
https://mbed.com
Other
4.67k stars 2.98k forks source link

AT parser sometimes times out even though stop tag received #8829

Closed Roger-Strassburg closed 5 years ago

Roger-Strassburg commented 5 years ago

Description

The current AT parser from mbed sometimes gets AT timeouts in spite of receiving the stop tag within the timeout period.

Here's the code for processing the response data from the AT^SMONP=255 command on the Cinterion EHS5:

while(at_handler.info_resp())
{
    // do stuff with the data received
}
at_handler.resp_stop();

Here is a trace of the response received when there's a timeout (wrapped for readability):

\r\n3G:\r\n10588,428,-6.5,-111,---,---,AN,65425\r\n10564,504,-8.0,-113,---,---,AN,6542 3\r\n3087,63,-9.0,-99,---,---,AN,------\r\n10588,346,-10.0,-117,---,---,AN,------\r\n1 0588,504,-12.5,-121,---,---,AN,------\r\n10588,352,-12.5,-121,---,---,AN,------\r\n1 0588,350,-12.5,-121,---,---,AN,------\r\n10588,503,-12.5,-121,---,---,AN,------\r\n2 G:\r\n69,----,0,7,0,------\r\n71,----,0,0,0,------\r\n73,----,0,4,0,------\r\n78,----,0, 1,0,------\r\n79,----,0,5,0,------\r\n79,----,4,2,0,------\r\n\r\nOK\r\n

As you can see, the "OK" string is present, but it's not found.

This only results in a timeout some of the time, but when I've seen it it's always been when iterating through an unknown number of lines of response data until terminated by either an "OK" or error.

What I see in the parser code is that the resp() routine in ATHandler attempts to match the "OK" string without waiting for all the full string to be in the buffer. Waiting could result in timeouts that might occur when there's an intermediate response, as is typically the case when reading packets from the modem.

The read() routine in ATHandler will stop reading and indicate that the response is complete once it is found. Unfortunately, for info response mode the stop tag is "\r\n" so that it stops at the end of each line. In that case it doesn't check for the "OK" string, meaning that if the resp() routine didn't catch it, the read() routine won't, either, resulting in the parser reading messages from the modem until the timeout occurs, possibly resulting in the driver getting out of sync with the modem.

To fix this, the read(), consume_to_tag() and skip_param() routines need to also check for the "OK" string (_resp_stop) before checking for stop_tag.

Issue request type

[ ] Question
[ ] Enhancement
[X] Bug
ciarmcom commented 5 years ago

Internal Jira reference: https://jira.arm.com/browse/MBOCUSTRIA-188

ghost commented 5 years ago

Simulated the above while loop in an unit test bench with the above test string, and the ATHandler finds the stop tag (OK\r\n) from the string. resp_stop() finishes successfully.

The read() routine just fills the buffer, it does not check contents of data.

The problem here is more likely that the timeout counting starts when ATHandler lock() is called, and if in the above while loop lock() is not called, as is most likely the case, then the while loop has to be finished in less than the total timeout available. So may be the solution here is just to increase the timeout length. @Roger-Strassburg is it possible to retry with a longer timeout?

ghseb commented 5 years ago

Hello, @TeemuKultala we've got some information on the issue. To us it seems that it is not related to timeouts that are to short.

Some suggestions about causes:

The following snippet at least shows that there can be AT-Timeouts without actually having a timeout.

#include "ATHandler.h"
#include "Thread.h"
#include "EventQueue.h"
#include "Callback.h"
#include <string>
#include <algorithm>
#include <array>

class TestFileHandle : public mbed::FileHandle
{
public:
    TestFileHandle() : _count { 0 } { }

    virtual ssize_t read(void *buffer, size_t size) override
    {
        if(_count >= ARRAY_SIZE)
        {
            return 0;
        }

        const char *part = _parts[_count];
        const size_t len = strlen(part);
        memcpy(buffer, part, len);
        ++_count;
        return len;
    }

    virtual ssize_t write(const void *buffer, size_t size) override { return size; }
    virtual off_t seek(off_t offset, int whence = SEEK_SET) override { return offset; }
    virtual int close() override { return 0; }

private:
    uint16_t _count;
    static const size_t ARRAY_SIZE = 3;
    static std::array<const char*, ARRAY_SIZE> _parts;
};

std::array<const char*, TestFileHandle::ARRAY_SIZE> TestFileHandle::_parts = {
        "42\r\n" ,
        "\r\n",
        "OK\r\n"
};

int main()
{
    events::EventQueue q;
    TestFileHandle fh;
    ATHandler at(&fh, q, 1000, "\r");
    at.cmd_start("");
    at.cmd_stop();
    at.resp_start();
    while(at.info_resp())
    {
        const int number = at.read_int();
        printf("Number: %d\r\n", number);
    }
    at.resp_stop();
    while(1);
}

Output:


42
Number: 42

Number: -1
OK
Number: 0
2019-02-14 13:09:02 [WARN][CELL]: AT timeout
Number: -1
2019-02-14 13:09:02 [WARN][CELL]: AT timeout
2019-02-14 13:09:02 [WARN][CELL]: AT stop tag not found
2019-02-14 13:09:02 [WARN][CELL]: AT timeout
2019-02-14 13:09:02 [WARN][CELL]: AT stop tag not found
mirelachirica commented 5 years ago

Thank you for the snippet! It was very usefull to be able to reproduce the issue.

ATHandler might not be able to properly handle info responses with no prefix (Prefix is "")

Info responses with empty prefix should not be a match. There was a bug for info_resp scope that would match even empty prefix, due to checking on NULL pointer while _info_resp_prefix is defined as char array so never NULL.

I'm proposing following: https://github.com/mirelachirica/mbed-os/commit/e83a744e8fa0e76c9a78be63eeb4a8f73e23f210

As a result, in your use case, the response to the AT^SMONP=255 command should be parsed using read routines within resp scope.

What do you think?

ghseb commented 5 years ago

Yes that could help. Additionally it might be required to add the length-check on the following lines too: https://github.com/mirelachirica/mbed-os/commit/e83a744e8fa0e76c9a78be63eeb4a8f73e23f210#diff-bda508ccd4b3e3e5909446a4f3f88716R856 https://github.com/mirelachirica/mbed-os/commit/e83a744e8fa0e76c9a78be63eeb4a8f73e23f210#diff-bda508ccd4b3e3e5909446a4f3f88716L863

Something like this:

        if (match_error()) {
            _error_found = true;
            return;
        }

        if (prefix && strlen(prefix) && match(prefix, strlen(prefix))) {
            _prefix_matched = true;
            return;
        }

        if (check_urc && match_urc()) {
            _urc_matched = true;
            clear_error();
            continue;
        }

        // If no match found, look for CRLF and consume everything up to and including CRLF
        if (mem_str(_recv_buff, _recv_len, CRLF, CRLF_LENGTH)) {
            // If no prefix, return on CRLF - means data to read
            if (!prefix || strlen(prefix) == 0) {
                return;
            }
            consume_to_tag(CRLF, true);
        } else {
            // If no prefix, no CRLF and no more chance to match for OK, ERROR or URC(since max resp length is already in buffer)
            // return so data could be read
            if ((!prefix || strlen(prefix) == 0 ) && ((_recv_len - _recv_pos) >= _max_resp_length)) {
                return;
            }
            if (!fill_buffer()) {
                // if we don't get any match and no data within timeout, set an error to indicate need for recovery
                set_error(NSAPI_ERROR_DEVICE_ERROR);
            }
        }
mirelachirica commented 5 years ago

Yes, of course those checks are needed everywhere prefix is checked for NULL.

0xc0170 commented 5 years ago

https://github.com/ARMmbed/mbed-os/pull/10084 should fix this