Xinyuan-LilyGO / T-SIM7600X

Additional characters how to remove Control Characters #101

Closed droidblastnz closed 1 month ago

droidblastnz commented 2 months ago

Posting this to help others in case you have this issue.

I have a remote SIM7600 whose output I check. This is one example:

09:46:56.718 -> SMS: Skinny Balance Check
09:46:56.718 -> AT+CMGR=0
09:46:57.765 -> message: 
09:46:57.765 -> +CMGR: "REC READ","2424","","24/09/06,09:46:30+48"
09:46:57.765 -> Hi! Your Skinny balance is
09:46:57.765 -> 4.35 credit
09:46:57.765 -> For your account info, text INFO to 2424.
09:46:57.765 -> 
09:46:57.765 -> OK
09:46:57.765 -> 
09:46:57.765 -> Balance not found in the message.
09:46:57.765 -> AT+CMGD=0
09:46:57.811 -> 
09:46:57.811 -> OK
09:46:57.811 -> SMS: Checking for unauthorized messages...
09:46:57.811 -> AT+CMGL="ALL"
09:46:57.859 -> 
09:46:57.859 -> OK
09:46:57.859 -> SMS: Deleting all messages...
09:46:57.859 -> AT+CMGD=0,4

Note the line 09:46:57.765 -> 4.35 credit, where no "$" is visible before the amount.

An issue I have had for some time is extra characters appearing and stopping the code from finding the balance.

The control character \u0002, also known as ASCII 2, is called Start of Text (STX). It belongs to the group of non-printable control characters in the ASCII table. Here's a quick breakdown:

ASCII value: 2
Hex value: 0x02
Unicode value: \u0002
Control name: STX (Start of Text)

Purpose:

In older communication protocols, STX was used to indicate the beginning of a text in data transmission. It would signal the receiver that what follows is the actual message content, as opposed to control information or metadata.

However, in modern contexts like SMS parsing, this character is usually unwanted and shows up as noise, as in this case, where it disrupts proper processing. This is why it is commonly removed when handling text data.

So we need to remove these control characters.

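// Remove unwanted control characters (e.g., \u0002, written "\x02" here)
// before searching, so they cannot get in the way of the '$' lookup
String cleaned = data;
cleaned.replace("\x02", "");
cleaned.trim();

// indexOf() returns -1 when '$' is absent, so check before calling substring()
int dollarPos = cleaned.indexOf('$');
String balanceMessage = "";
if (dollarPos >= 0) {
    balanceMessage = cleaned.substring(dollarPos);
}

// Now, continue with processing
if (balanceMessage.length() > 1) {
    float balance = balanceMessage.substring(1).toFloat(); // Skipping the '$'
    pDBG("Get balance: ");
    pDBGln(balance);
    credit = balance;
    mqtt.publish(topicCredit, String(balance).c_str());
} else {
    pDBGln("Balance not found in the message.");
}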
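
The snippet above removes one specific character. A more general variant, sketched here in standard C++ with std::string so it can be tried off-device, drops every ASCII control character while optionally keeping line breaks. The function name stripControlChars and the keepLineBreaks parameter are made up for this illustration; on the board the same loop should carry over to the Arduino String class.

```cpp
#include <string>

// Hypothetical helper (not from the thread): returns a copy of `in` with
// ASCII control characters (0x00-0x1F and 0x7F) removed, so STX (0x02) goes too.
// With keepLineBreaks = true, '\r' and '\n' are kept, because AT responses
// use them as line separators.
std::string stripControlChars(const std::string& in, bool keepLineBreaks = true) {
    std::string out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        bool isControl = (c < 0x20) || (c == 0x7F);
        bool isLineBreak = (c == '\r') || (c == '\n');
        if (!isControl || (keepLineBreaks && isLineBreak)) {
            out.push_back(static_cast<char>(c));
        }
    }
    return out;
}
```

Calling stripControlChars(text) before any searching or number parsing keeps the later steps free of stray bytes, at the cost of also discarding any control character that might carry meaning.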

Hope this helps others, as it has been a pain for some time. I think I have now sorted my last issue: my code would find the balance for 2-3 months and then stop, and the logs show that extra \u0002.

Note this may affect other parts of your code.

@lewisxhe any comments or advice? I do see a lot of extra characters in the serial output at times, e.g.:

08:37:01.726 -> +CIPRXGET: 4,0,0
08:37:01.726 -> 
08:37:01.726 -> OK
08:37:01.772 -> AT+CIPCLOSE?
08:37:01.772 -> 
08:37:01.772 -> +CIPCLOSE: 1,0,0,0,0,0,0,0,0,0
08:37:01.821 -> 
08:37:01.821 -> OK
08:37:01.821 -> AT+CIPSEND=0,2
08:37:01.821 -> 
08:37:01.821 -> >⸮

e.g., 08:37:01.821 -> >⸮

Especially when sending data such as MQTT

08:40:53.189 -> +CIPRXGET: 2,0,28,0
08:40:53.225 -> 0test/PIRonoff111
08:40:53.366 -> Message arrived [test/PIRonoff]: 111
08:40:53.366 -> PIRStatus:1
08:40:53.366 -> Received payload LED: OFF
08:40:53.366 -> AT+CIPSEND=0,33

The character ⸮ that you're seeing in your logs is U+2E2E, the Unicode "Reversed Question Mark" (not the interrobang, which is U+203D). In the context of your log, it most likely stands in for a garbled or misinterpreted byte that could not be displayed as text, caused by an issue in how serial data is being transmitted or interpreted.

Common Causes of Garbled Characters:

Baud rate mismatch: If the sending and receiving devices are not set to the same baud rate, the data can become corrupted, and you'll often see strange characters like ⸮ or others in the output.

Transmission errors: Noise in the communication line or buffer overflow can also cause corruption, leading to unrecognized characters.

Incorrect encoding: The character might be misinterpreted due to a mismatch between the expected character encoding and the actual encoding.

Solution:

Check the baud rate settings on both the sender and receiver sides and ensure they match.
If using a serial communication protocol (e.g., UART), verify that the settings (like parity, stop bits) are correct.
Make sure your buffer size is sufficient to handle the incoming data without overflowing.

These steps should help reduce or eliminate the appearance of garbled characters like ⸮ in your logs.
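
Before changing any settings, it can help to find out which byte is actually behind the ⸮ placeholder. A minimal sketch in standard C++, assuming the raw bytes are available in a std::string; toHex is a made-up helper name. Printing the hex form instead of the raw text shows whether the byte is random noise or part of a known binary frame. For instance, and this is only a guess about the log above, a 2-byte send would match the size of an MQTT PINGREQ packet (0xC0 0x00), whose first byte is not printable text.

```cpp
#include <cstdio>
#include <string>

// Hypothetical debugging helper: renders each byte as two uppercase hex digits,
// separated by spaces, so non-printable bytes can be identified.
std::string toHex(const std::string& bytes) {
    std::string out;
    char buf[4];  // two hex digits + space + terminating NUL
    for (unsigned char c : bytes) {
        std::snprintf(buf, sizeof(buf), "%02X ", c);
        out += buf;
    }
    if (!out.empty()) {
        out.pop_back();  // drop the trailing space
    }
    return out;
}
```

On the board, the same idea can be applied byte by byte to whatever arrives from the modem before it is echoed to the debug port.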

So my buffer is #define TINY_GSM_RX_BUFFER 1024

Baud is 115200 8N1, which is correct.

@lewisxhe is this sufficient or what can it be increased to?

#define TINY_GSM_RX_BUFFER 2048

lewisxhe commented 2 months ago

Garbage characters like these are more likely caused by a hardware power glitch. Try reducing the communication baud rate to 9600. If you have another PCIE module, you can try replacing the module first.

droidblastnz commented 2 months ago

If I don't declare the RX buffer size, what is the default? What is the preferred and/or maximum size?

lewisxhe commented 2 months ago

https://github.com/espressif/arduino-esp32/blob/cbe0f2ff0dd772edaf6aabfa4cc018021c0e3364/cores/esp32/HardwareSerial.cpp#L99

droidblastnz commented 2 months ago

thanks, so 256.

HardwareSerial::HardwareSerial(uint8_t uart_nr)
  : _uart_nr(uart_nr), _uart(NULL), _rxBufferSize(256), _txBufferSize(0), _onReceiveCB(NULL), _onReceiveErrorCB(NULL), _onReceiveTimeout(false), _rxTimeout(1),
    _rxFIFOFull(0), _eventTask(NULL)
#if !CONFIG_DISABLE_HAL_LOCKS

I have rewritten the code to remove any additional control or noise characters and am now testing it. The issue is very minor but only shows up after 2-3 days or longer. Not sure how else to resolve it.

// Function to extract the numeric value (credit) from the message.
// Note: this returns the first number found, so pass only the SMS body;
// a full +CMGR response has digits in its header (sender, timestamp).
float extractCreditValue(const String& data) {
    String balanceMessage = "";

    // Look for the first digit in the message to start extracting the balance
    bool numberStarted = false;
    bool seenDecimalPoint = false;
    for (unsigned int i = 0; i < data.length(); i++) {
        char c = data.charAt(i);

        // Once a digit is found, start building the balance string
        if (isDigit(c)) {
            numberStarted = true;
            balanceMessage += c;
        }
        // Accept a single decimal point once the number has started
        else if (numberStarted && c == '.' && !seenDecimalPoint) {
            seenDecimalPoint = true;
            balanceMessage += c;
        }
        // Stop collecting once any other character appears after the number has started
        else if (numberStarted) {
            break;
        }
    }

    // Convert the balance string to a float and return it
    if (balanceMessage.length() > 0) {
        return balanceMessage.toFloat();
    } else {
        return -1.0;  // Return an error value if no numeric value was found
    }
}

// Example usage
String message = "Hi! Your Skinny balance is 4.35 credit";
float credit = extractCreditValue(message);

// A valid balance is never negative, so any negative value means "not found"
if (credit >= 0.0) {
    pDBG("Get balance: ");
    pDBGln(credit);
    mqtt.publish(topicCredit, String(credit).c_str());  // Publish the balance
} else {
    pDBGln("Balance not found in the message.");
}

This revised code extracts and returns only the numeric credit value, whether or not there is a dollar sign ($). It searches for the first numeric sequence and treats that as the balance.