felixwrt / sml-rs

Smart Message Language parser written in Rust

Decoder improvements #6

Closed fkohlgrueber closed 1 year ago

fkohlgrueber commented 1 year ago

@kegesch, @torfmaster want to have a look?

torfmaster commented 1 year ago

Great! I'll definitely have a look but it already looks useful for me. Did you also implement higher layers or did you focus on the transport layer?

fkohlgrueber commented 1 year ago

I previously did a proof-of-concept implementation of the application layer, which you can find on the "old" branch in this repo. I then started over to do it properly and spent way too much time on the transport layer. Anyway, now that the decoding part is in pretty good shape, I can continue working on the application layer. The long-term goal of this library is definitely to handle both layers.

fkohlgrueber commented 1 year ago

I've been thinking about the LendingIterator trait I've been using. As it stands, it doesn't make much sense: there's only one type implementing the trait, and the trait doesn't provide any additional functionality (the way std::iter::Iterator does). I think it'd make more sense to replace the trait and its impl with a regular method next on the DecodeIterator type. This way, one doesn't have to import the trait, and it would also lower the MSRV, as this doesn't require GATs. Any thoughts on this?
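To illustrate the idea, here is a minimal sketch of a "lending" next written as an inherent method. ChunkDecoder and collect_chunks are hypothetical names made up for this example; they are not the actual DecodeIterator from this repo, and the 4-byte chunking only stands in for real decoding.

```rust
/// Hypothetical stand-in for a decoder type (NOT the real `DecodeIterator`).
struct ChunkDecoder<'a> {
    input: &'a [u8],
    // Reused between calls, which is why items can only be lent out.
    buf: Vec<u8>,
}

impl<'a> ChunkDecoder<'a> {
    fn new(input: &'a [u8]) -> Self {
        ChunkDecoder { input, buf: Vec::new() }
    }

    /// Inherent method instead of a `LendingIterator` trait impl.
    /// The returned slice borrows `self`, so it must be dropped before
    /// the next call. No GATs are needed for this signature.
    fn next(&mut self) -> Option<&[u8]> {
        let input: &'a [u8] = self.input;
        if input.is_empty() {
            return None;
        }
        // Placeholder "decoding": hand out the input in chunks of up to 4 bytes.
        let n = input.len().min(4);
        let (head, tail) = input.split_at(n);
        self.input = tail;
        self.buf.clear();
        self.buf.extend_from_slice(head);
        Some(self.buf.as_slice())
    }
}

/// Drives the decoder to completion and copies each lent chunk.
fn collect_chunks(input: &[u8]) -> Vec<Vec<u8>> {
    let mut dec = ChunkDecoder::new(input);
    let mut out = Vec::new();
    // `for` loops require `Iterator`, so a lending `next` is driven by `while let`.
    while let Some(chunk) = dec.next() {
        out.push(chunk.to_vec());
    }
    out
}
```

Callers only need the type itself in scope and loop with while let; the trade-off is that iterator adapters and for loops aren't available, which was already the case with the custom trait.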

torfmaster commented 1 year ago

Though a bit late to the party: I have been hoping for a solution to a problem I had during implementation, namely that escape sequences can be part of a string, so incomplete messages can be picked up. But I guess you will have to take the complete message and check the CRC (which is unlikely to clash). Did you think about this problem?

fkohlgrueber commented 1 year ago

This shouldn't be a problem. The SML transport requires that escape sequences (1b1b1b1b) in the user data are escaped by the encoder. The decoder can then "unescape" them correctly. For example, if the user data contained bytes that look like the end of a transmission (1b1b1b1b1aXXYYZZ), the encoder would emit another escape sequence (1b1b1b1b) after the escape sequence found in the user data. The user data above would therefore be encoded as 1b1b1b1b1b1b1b1b1aXXYYZZ. The decoder knows that 1b1b1b1b1b1b1b1b means that an escape sequence was present in the user data and can correctly recover the original user data (1b1b1b1b1aXXYYZZ).

It's the same principle used for string literals in many programming languages. There, the backslash (\) is used as the escape character (so that you can write \n for a newline character). If you want a backslash character to be part of the string, you need to escape it with another backslash (\\).
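As a rough illustration of the escaping rule described above, here is a small sketch. The function names escape and unescape are made up for this example and are not this library's API; the sketch only covers escaping of user data and ignores start/end sequences, padding and the CRC.

```rust
/// The transport-layer escape sequence.
const ESC: [u8; 4] = [0x1b; 4];

/// Doubles every escape sequence found in the user data.
/// (Hypothetical helper, not part of the library's API.)
fn escape(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(data.len());
    let mut i = 0;
    while i < data.len() {
        if data[i..].starts_with(&ESC) {
            // Emit the escape sequence twice so the decoder can tell it
            // apart from a real start/end marker.
            out.extend_from_slice(&ESC);
            out.extend_from_slice(&ESC);
            i += 4;
        } else {
            out.push(data[i]);
            i += 1;
        }
    }
    out
}

/// Reverses `escape`. Returns `None` when an escape sequence is not
/// followed by a second one; a full decoder would treat that case as a
/// start or end marker instead of user data.
fn unescape(encoded: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(encoded.len());
    let mut i = 0;
    while i < encoded.len() {
        if encoded[i..].starts_with(&ESC) {
            if !encoded[i + 4..].starts_with(&ESC) {
                return None;
            }
            out.extend_from_slice(&ESC);
            i += 8;
        } else {
            out.push(encoded[i]);
            i += 1;
        }
    }
    Some(out)
}
```

With the user data from the example (using arbitrary placeholder values for XX, YY and ZZ), escape yields the doubled sequence followed by the unchanged 1a XX YY ZZ bytes, and unescape returns the original data.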

Does this answer your question?

torfmaster commented 1 year ago

Ah, I see, I didn't know that. But the fact that the escaping is prepended (as usual) implies that the issue I described can still occur. I guess the main difference between the TLV structure in SML and programming languages is that in programming languages the structures are balanced (like parentheses, quotes, ...). Another question: does the escaping happen in the transport layer or above? So is 1b 1b 1b 1b 1a 00 xx xx encoded as 0x0c 1b 1b 1b 1b 1b 1b 1b 1b 1a 00 xx xx or as 0x09 1b 1b 1b 1b 1b 1b 1b 1b 1a 00 xx xx?

fkohlgrueber commented 1 year ago

> But the fact that the escaping is prepended (as usual) implies that the issue I described can still occur.

I'm afraid I don't understand the issue you're describing. Can you try again, perhaps with a concrete example?

> Another question: does the escaping happen in the transport layer or above? So is 1b 1b 1b 1b 1a 00 xx xx encoded as 0x0c 1b 1b 1b 1b 1b 1b 1b 1b 1a 00 xx xx or as 0x09 1b 1b 1b 1b 1b 1b 1b 1b 1a 00 xx xx?

It's part of the transport layer. The SML data 09 1b 1b 1b 1b 1a 00 xx xx (which is a string consisting of 8 bytes) would be encoded as 09 1b 1b 1b 1b 1b 1b 1b 1b 1a 00 xx xx.
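To show why the length byte stays 09, here is a sketch of how a parser might read such a string from already-decoded data. parse_octet_string is a hypothetical helper, not this library's parser, and it assumes a single-byte TL field whose low nibble is the length including the TL byte itself.

```rust
/// Parses an SML octet string with a single-byte TL field from *decoded*
/// (already unescaped) data. Returns the payload and the remaining input.
/// (Hypothetical helper for illustration only.)
fn parse_octet_string(input: &[u8]) -> Option<(&[u8], &[u8])> {
    let (&tl, rest) = input.split_first()?;
    // The high nibble must be zero: bit 7 would signal a multi-byte TL
    // field and bits 6..4 a type other than octet string. Neither is
    // handled in this sketch.
    if tl & 0xf0 != 0 {
        return None;
    }
    // The low nibble is the length *including* the TL byte itself.
    let payload_len = usize::from(tl & 0x0f).checked_sub(1)?;
    if rest.len() < payload_len {
        return None;
    }
    Some(rest.split_at(payload_len))
}
```

Because the parser only ever sees unescaped data, 09 always describes the 8 original payload bytes; the extra 1b 1b 1b 1b exists only on the wire.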

I previously thought about doing parsing and decoding of the transport layer in one step, which I think is what you've done in your parser. It exploded in complexity due to escape sequences in the user data that would need to be handled on the fly. That's why I've separated the transport layer decoder and the parser.

fkohlgrueber commented 1 year ago

You can take a look at the unit tests I've written for the transport layer to get a better picture of the relationship between encoded and decoded data. See for example here: https://github.com/fkohlgrueber/sml-rs/blob/main/src/transport.rs#L759