fiorix / go-smpp

SMPP 3.4 Protocol for the Go programming language
MIT License

Encoding and then decoding with GSM7 (packed) yields an extra '@' character #108

Open JemmyH opened 1 year ago

JemmyH commented 1 year ago

Question

With GSM7 (packed), encoding a string and then decoding the encoded result does not reproduce the original input: the decoded string has an extra '@' character at the end.

For example

My source input was

"1234567890abcdefghijklm"

First I encoded it to a byte slice m:

m := []byte{49,217,140,86,179,221,112,57,88,88,60,38,151,205,103,116,90,189,102,183,1}

Then I decoded m, but got

"1234567890abcdefghijklm@"

There was one more character '@' than the original input.
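
One way to see where the '@' can come from: 23 septets occupy 161 bits, which pack into 21 octets (168 bits) and leave 7 spare zero bits. A decoder that reads every complete 7-bit group then sees a 24th septet with value 0, which is '@' in the GSM default alphabet. The sketch below is a standalone illustration and not go-smpp's code; `packSeptets`, `unpackNaive`, and `gsmToText` are hypothetical helpers, and it assumes the input only contains digits and Latin letters, whose GSM codes match ASCII.

```go
package main

import "fmt"

// packSeptets packs 7-bit codes into octets, low bits first, leaving any
// spare bits in the last octet as zero. (Hypothetical helper.)
func packSeptets(septets []byte) []byte {
	out := make([]byte, (len(septets)*7+7)/8)
	for i, s := range septets {
		bit := i * 7
		idx, shift := bit/8, uint(bit%8)
		v := uint16(s&0x7f) << shift
		out[idx] |= byte(v)
		if shift > 1 { // the septet spills into the next octet
			out[idx+1] |= byte(v >> 8)
		}
	}
	return out
}

// unpackNaive reads every complete group of 7 bits, including a group made
// only of padding bits. (Hypothetical helper.)
func unpackNaive(octets []byte) []byte {
	out := make([]byte, len(octets)*8/7)
	for i := range out {
		bit := i * 7
		idx, shift := bit/8, uint(bit%8)
		v := uint16(octets[idx]) >> shift
		if shift > 1 && idx+1 < len(octets) {
			v |= uint16(octets[idx+1]) << (8 - shift)
		}
		out[i] = byte(v) & 0x7f
	}
	return out
}

// gsmToText maps GSM code 0x00 to '@' and passes other codes through.
// Assumption: only digits and Latin letters are present otherwise.
func gsmToText(septets []byte) string {
	out := make([]byte, len(septets))
	for i, s := range septets {
		if s == 0x00 {
			out[i] = '@'
		} else {
			out[i] = s
		}
	}
	return string(out)
}

func main() {
	packed := packSeptets([]byte("1234567890abcdefghijklm"))
	fmt.Println(len(packed), packed[len(packed)-1]) // prints 21 1
	fmt.Printf("%q\n", gsmToText(unpackNaive(packed))) // prints "1234567890abcdefghijklm@"
}
```

The packed length alone does not tell the decoder whether 23 or 24 septets were intended, which is why the padding value matters.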


Here is my test code:

func TestEncode(t *testing.T) {
    content := "1234567890abcdefghijklm"
    t.Logf("original content: %s, length: %d", strconv.Quote(content), len(content))

    // encode the content with packed option
    encoder := GSM7(true).NewEncoder()
    es, _, err := transform.Bytes(encoder, []byte(content))
    assert.Nil(t, err)
    t.Logf("after encoded. bytes: %v, length: %d", es, len(es))

    // decode `es`
    decoder := GSM7(true).NewDecoder()
    res, _, err := transform.Bytes(decoder, es)
    assert.Nil(t, err)
    t.Logf("after decode. content: %s, length: %d", strconv.Quote(string(res)), len(res))
}
JemmyH commented 1 year ago

There is another question. As specified in GSM 03.38, if packing leaves 7 spare bits in the last byte, those bits should be set to CR (0x0d) instead of zero, to avoid confusion with '@'.

When there are 7 spare bits in the last octet of a message, these bits are set to the 7-bit code of the CR control (also used as a padding filler) instead of being set to zero (where they would be confused with the 7-bit code of an '@' character).

For example, the source input is "1234567890abcdefghijklm". After encoding and packing, we will get

m := []byte{49,217,140,86,179,221,112,57,88,88,60,38,151,205,103,116,90,189,102,183,1}

The last byte is 1 (0000 0001): only its lowest bit carries data and the upper 7 bits are spare, which matches the scenario quoted above. So CR (0x0d) should be placed in those bits: 1 | (0x0d << 1) = 27 (0001 1011). The new encoded result should then be:

m1 := []byte{49,217,140,86,179,221,112,57,88,88,60,38,151,205,103,116,90,189,102,183,27}

But when I tried to decode m1, I got

"1234567890abcdefghijklm\r"

So the extra character is still there; it just becomes '\r' instead of '@'.
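
A minimal sketch of how the CR rule could be applied on both sides, assuming the encoder appends CR (0x0d) when exactly 7 bits would be spare and the decoder drops a trailing CR when the septet count is a multiple of 8. This is an illustration and not the change made in the library; `pack`, `unpack`, `packPadded`, and `unpackPadded` are hypothetical helpers that work on raw septet values.

```go
package main

import "fmt"

const cr = 0x0D // GSM 7-bit code for carriage return

// pack packs 7-bit codes into octets, low bits first. (Hypothetical helper.)
func pack(septets []byte) []byte {
	var out []byte
	var acc, nbits uint
	for _, s := range septets {
		acc |= uint(s&0x7f) << nbits
		nbits += 7
		for nbits >= 8 {
			out = append(out, byte(acc))
			acc >>= 8
			nbits -= 8
		}
	}
	if nbits > 0 {
		out = append(out, byte(acc))
	}
	return out
}

// unpack returns every complete 7-bit group found in octets.
func unpack(octets []byte) []byte {
	var out []byte
	var acc, nbits uint
	for _, o := range octets {
		acc |= uint(o) << nbits
		nbits += 8
		for nbits >= 7 {
			out = append(out, byte(acc&0x7f))
			acc >>= 7
			nbits -= 7
		}
	}
	return out
}

// packPadded appends CR when the last octet would have 7 spare bits,
// which happens when the septet count is 7 modulo 8.
func packPadded(septets []byte) []byte {
	if len(septets)%8 == 7 {
		septets = append(append([]byte{}, septets...), cr)
	}
	return pack(septets)
}

// unpackPadded drops a trailing CR that sits in the last 7 bits of the
// final octet, treating it as padding.
func unpackPadded(octets []byte) []byte {
	septets := unpack(octets)
	if n := len(septets); n > 0 && n%8 == 0 && septets[n-1] == cr {
		septets = septets[:n-1]
	}
	return septets
}

func main() {
	in := []byte("1234567890abcdefghijklm") // these GSM codes match ASCII
	packed := packPadded(in)
	fmt.Println(packed[len(packed)-1])           // prints 27
	fmt.Printf("%q\n", string(unpackPadded(packed))) // prints "1234567890abcdefghijklm"
}
```

One limitation of this sketch: a message that genuinely ends in CR and fills its last octet exactly would lose that CR on decoding, so a real implementation needs a rule for that case.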

JemmyH commented 1 year ago

By my deduction, this happens whenever the length of the input after GSM7 encoding, counted in septets before packing, is a term of the arithmetic sequence $a_n = 8n - 1$, that is 7, 15, 23, and so on. The input above has 23 septets.

For example:
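
As a quick check of that arithmetic, the sketch below computes the number of spare bits left after packing a given number of septets and lists the lengths up to 40 that leave exactly 7. The helper `spareBits` is hypothetical and only does the bit counting; it does not call the library.

```go
package main

import "fmt"

// spareBits returns how many unused bits remain in the last octet after
// packing n septets (7 bits each) into whole octets.
func spareBits(n int) int {
	return (8 - (7*n)%8) % 8
}

func main() {
	for n := 1; n <= 40; n++ {
		if spareBits(n) == 7 {
			fmt.Print(n, " ") // prints 7 15 23 31 39
		}
	}
	fmt.Println()
}
```

Since 7n leaves remainder 1 modulo 8 exactly when n leaves remainder 7 modulo 8, the lengths with 7 spare bits are the values 8n - 1.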

JemmyH commented 1 year ago

Here is the fix: https://github.com/fiorix/go-smpp/pull/109

JemmyH commented 1 year ago

@fiorix