chronoxor / FastBinaryEncoding

Fast Binary Encoding is an ultra-fast and universal serialization solution for C++, C#, Go, Java, JavaScript, Kotlin, Python, Ruby, Swift
https://chronoxor.github.io/FastBinaryEncoding
MIT License

Question about domain declaration and offset calculation #23

Closed sugurd closed 4 years ago

sugurd commented 4 years ago

First of all, thank you for the wonderful library; it works fine and does a great job! But I'd like to clarify a few things to get a better understanding.

First, what is the purpose of 'domain' declaration? Is it for all languages or just for some? It does not appear in the generated .h files.

Second, how is the offset calculated? Is it in bytes or other units? Offset from what? It's not as obvious as it may seem at first glance, and I did not find any description of it.

Here is the example:

struct MyBytes
{
  bytes data;
}

I put 9 bytes into data: the values 1 through 9.

Here is the corresponding hex dump of the buffer with my guess of what it means:

21 00 00 00 08 00 00 00  // total size 33 bytes + ? offset 8 bytes from the beginning
0C 00 00 00 02 00 00 00  // ? offset 12 bytes from the beginning + ? Type 2
0C 00 00 00 09 00 00 00  // ???? (4 bytes) + data size 9 bytes 
01 02 03 04 05 06 07 08  // the data
09

chronoxor commented 4 years ago

Think of domain as similar to a Java package name. For some languages (Java, Kotlin, Swift) the domain name becomes a prefix in the namespace or folder hierarchy of the generated files.

Package offset is needed only when protocols depend on each other through import. By default, struct Ids in each package start from 0, so importing one protocol into another produces struct Id conflicts. The solution is to shift all Ids of the dependent protocol by some large offset.

sugurd commented 4 years ago

Thanks for the answer, it's clear now about the domain. But I still have a question about the offset. As I understand from your answer, there are 2 types of offset: the package offset and the binary data struct offset. Referring to the example above, could you explain what those bytes in the buffer mean? I believe a concrete example like that will make things clearer. It could even be put into the documentation to reduce the number of questions.

chronoxor commented 4 years ago

Package offset is added to every struct type Id. By default, struct type Ids start from 0 and increment by one.

Package offset is 0 by default and can be changed in the package declaration:

// Package declaration
package protoex offset 10

Struct type Id starts from 0 with auto increment by 1 for the next struct.

  1. It could be auto-incremented:

    struct Order

  2. It could be fixed:

    struct Order(1)

  3. Incremented by the given number if you want to reserve some type Ids for future usage:

    struct StructMap(+40)

  4. It could be the same as base struct type Id in case of protocol extending:

    struct Balance(base) : proto.Balance

sugurd commented 4 years ago

Here is an image from the documentation:

According to this picture, there is a field called offset 4 bytes long, little-endian encoded. The purpose of this field is clear, but it's not so clear 1) how this offset is calculated and 2) from what and to what this offset is. If there is a simple answer to these 2 questions, it'd be much appreciated.

chronoxor commented 4 years ago

My answer above was about struct type Id calculation (fixed):

image

The offset in the picture from your message is a relative offset, in bytes, to the array body in the message stream.

sugurd commented 4 years ago

OK, let's assume I'm a parser. I will try to parse the buffer according to the picture above.

21 00 00 00 08 00 00 00  // total size 33 bytes + ? offset 8 bytes from the beginning
0C 00 00 00 02 00 00 00  // offset from here to the beginning of bytes struct 12 bytes ,  ? Type
0C 00 00 00 09 00 00 00  // ???? (4 bytes) + data size 9 bytes 
01 02 03 04 05 06 07 08  // the data
09

Reading the buffer step by step:

  1. uint32_t at buffer[0] is the root struct FullSize; the value is 33. Correct.
  2. uint32_t at buffer[4] is the offset to the inner struct; the value is 8. OK.
  3. Following that offset, uint32_t at buffer[8] has the value 12. This is most likely the relative offset from here to the beginning of the inner bytes struct.
  4. uint32_t at buffer[12] has the value 2 and is most likely the type of the inner struct or some other service info.
  5. uint32_t at buffer[16] has the value 12, and I have no clue about its purpose.
  6. The bytes struct follows at buffer[20]: its uint32_t defines the data size, 9 bytes in this case.
  7. The actual data starts at buffer[24].

Is this correct? How would you parse this buffer?

sugurd commented 4 years ago

Example 2: File research_three.fbe :

domain com.fbe.test
package research_three

struct ResearchThree
{
    bytes first;
    uint32 firstint;
    bytes second;
    uint32 secondint;
}

In .cpp file:

    research_three::ResearchThree myModel;
    myModel.first = std::vector<uint8_t>{1, 2, 3, 4, 5, 6, 7, 8};
    myModel.second = std::vector<uint8_t>{0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88};
    myModel.firstint = 0x1A1B1C1D;
    myModel.secondint = 0x2A2B2C2D;

Hex dump of buffers for normal and final versions:

    // Non-final version size: 56
    38 00 00 00 08 00 00 00
    18 00 00 00 01 00 00 00
    18 00 00 00 1D 1C 1B 1A
    24 00 00 00 2D 2C 2B 2A
    08 00 00 00 01 02 03 04
    05 06 07 08 08 00 00 00
    11 22 33 44 55 66 77 88

    // Final version size: 40
    28 00 00 00 01 00 00 00
    08 00 00 00 01 02 03 04
    05 06 07 08 1D 1C 1B 1A
    08 00 00 00 11 22 33 44
    55 66 77 88 2D 2C 2B 2A

SUMMARY

The non-final model format is not as trivial as it may seem from the documentation. As the hex dump shows, it does not even preserve the inner structure order:

firstint
secondint
first
second

vs

first
firstint
second
secondint
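
One reading of the non-final dump that accounts for this ordering: the struct body holds one fixed 4-byte slot per field in declaration order, where a bytes slot contains an offset rather than the data, and the variable-size bodies are appended after all slots. The sketch below walks the 56-byte dump under that reading. It is inferred from the bytes in this thread only, not taken from the FBE sources, and it assumes the bytes offsets are relative to the struct start (8 + 0x18 = 32 and 8 + 0x24 = 44 match the dump). All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Little-endian uint32 at pos; returns false if it does not fit in the buffer.
bool load_u32(const std::vector<uint8_t>& buf, size_t pos, uint32_t& out) {
    if (pos > buf.size() || buf.size() - pos < 4) return false;
    out = static_cast<uint32_t>(buf[pos]) |
          (static_cast<uint32_t>(buf[pos + 1]) << 8) |
          (static_cast<uint32_t>(buf[pos + 2]) << 16) |
          (static_cast<uint32_t>(buf[pos + 3]) << 24);
    return true;
}

// Length-prefixed blob at pos: uint32 size followed by that many bytes.
bool load_blob(const std::vector<uint8_t>& buf, size_t pos, std::vector<uint8_t>& out) {
    uint32_t n = 0;
    if (!load_u32(buf, pos, n)) return false;
    const size_t body = pos + 4;
    if (buf.size() - body < n) return false;
    out.assign(buf.begin() + static_cast<std::ptrdiff_t>(body),
               buf.begin() + static_cast<std::ptrdiff_t>(body + n));
    return true;
}

// Hypothetical view of the non-final ResearchThree layout (not an FBE type).
struct NonFinalSlots {
    uint32_t first_ptr = 0, firstint = 0, second_ptr = 0, secondint = 0;
    std::vector<uint8_t> first, second;
};

bool read_non_final(const std::vector<uint8_t>& buf, NonFinalSlots& s) {
    uint32_t base = 0;
    if (!load_u32(buf, 4, base)) return false;          // struct begin offset (8)
    const size_t slots = static_cast<size_t>(base) + 8; // skip inner size + type
    // Fixed slots, in declaration order: first, firstint, second, secondint.
    if (!load_u32(buf, slots, s.first_ptr)) return false;
    if (!load_u32(buf, slots + 4, s.firstint)) return false;
    if (!load_u32(buf, slots + 8, s.second_ptr)) return false;
    if (!load_u32(buf, slots + 12, s.secondint)) return false;
    // ASSUMPTION: bytes offsets are relative to the struct start.
    return load_blob(buf, static_cast<size_t>(base) + s.first_ptr, s.first) &&
           load_blob(buf, static_cast<size_t>(base) + s.second_ptr, s.second);
}

// The non-final dump from this thread (56 bytes).
std::vector<uint8_t> non_final_dump() {
    return {0x38, 0, 0, 0, 0x08, 0, 0, 0, 0x18, 0, 0, 0, 0x01, 0, 0, 0,
            0x18, 0, 0, 0, 0x1D, 0x1C, 0x1B, 0x1A, 0x24, 0, 0, 0, 0x2D, 0x2C, 0x2B, 0x2A,
            0x08, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0x08, 0, 0, 0,
            0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88};
}
```

Under this reading the two integers only appear to come first because the bytes contents are stored out of line; the slots themselves follow the declaration order.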

This may not be important for users who only use the suggested FBE API, but it adds an extra level of complexity if you have to deal with the binary data directly. Because I have to access binary data from C when developing software for an ARM MCU, it would be more appropriate to use the final model and implement versioning by other means.

It'd still be nice to find the description of the offset calculation algorithm in the documentation, with concrete simple examples. I decided to use only final models of FBE in my current project, so this question is not relevant any more and may be closed.
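
For reference, the final dump above can be read sequentially. The sketch below assumes the layout suggested by those 40 bytes: a uint32 total size, a uint32 struct type, then the fields in declaration order, with each bytes field stored inline as a uint32 length followed by its content. This is inferred from the dump in this thread, not from the FBE sources, and `FinalReader` and `ResearchThreeFinal` are hypothetical names rather than FBE API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sequential little-endian reader over a byte buffer (hypothetical helper).
class FinalReader {
public:
    explicit FinalReader(const std::vector<uint8_t>& buf) : buf_(buf), pos_(0), ok_(true) {}

    // Read a little-endian uint32 and advance; marks the reader failed on overrun.
    uint32_t u32() {
        if (!ok_ || buf_.size() - pos_ < 4) { ok_ = false; return 0; }
        const uint32_t v = static_cast<uint32_t>(buf_[pos_]) |
                           (static_cast<uint32_t>(buf_[pos_ + 1]) << 8) |
                           (static_cast<uint32_t>(buf_[pos_ + 2]) << 16) |
                           (static_cast<uint32_t>(buf_[pos_ + 3]) << 24);
        pos_ += 4;
        return v;
    }

    // Read an inline bytes field: uint32 length, then the content.
    std::vector<uint8_t> bytes() {
        const uint32_t n = u32();
        if (!ok_ || buf_.size() - pos_ < n) { ok_ = false; return std::vector<uint8_t>(); }
        std::vector<uint8_t> out(buf_.begin() + static_cast<std::ptrdiff_t>(pos_),
                                 buf_.begin() + static_cast<std::ptrdiff_t>(pos_ + n));
        pos_ += n;
        return out;
    }

    bool ok() const { return ok_; }

private:
    const std::vector<uint8_t>& buf_;
    size_t pos_;
    bool ok_;
};

// Hypothetical plain view of the decoded struct (not an FBE type).
struct ResearchThreeFinal {
    uint32_t full_size = 0, struct_type = 0, firstint = 0, secondint = 0;
    std::vector<uint8_t> first, second;
};

bool read_final(const std::vector<uint8_t>& buf, ResearchThreeFinal& out) {
    FinalReader r(buf);
    out.full_size = r.u32();    // 0x28 = 40 in the dump
    out.struct_type = r.u32();  // 1 in the dump
    out.first = r.bytes();
    out.firstint = r.u32();
    out.second = r.bytes();
    out.secondint = r.u32();
    return r.ok() && out.full_size <= buf.size();
}

// The final dump from this thread (40 bytes).
std::vector<uint8_t> final_dump() {
    return {0x28, 0, 0, 0, 0x01, 0, 0, 0,
            0x08, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0x1D, 0x1C, 0x1B, 0x1A,
            0x08, 0, 0, 0, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88,
            0x2D, 0x2C, 0x2B, 0x2A};
}
```

The same sequential approach ports directly to C with a pointer and a remaining-length counter, which may suit an MCU target better than the vector-based version shown here.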

chronoxor commented 4 years ago

struct MyBytes
{
  bytes data;
}

4 bytes - root struct full size
4 bytes - struct begin offset (this is important for storing structs as fields or array items)
4 bytes - struct inner size (this is important for storing structs as fields or array items)
4 bytes - struct type
4 bytes - byte buffer begin offset
4 bytes - byte buffer size
9 bytes - byte buffer content (values 1 through 9)
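
The layout listed above can be turned into a small reader for the 33-byte MyBytes dump from the first post. This is a minimal sketch, not FBE code, and `MyBytesView` and `parse_my_bytes` are hypothetical names. One point is an assumption: the list does not say what the byte buffer begin offset is relative to. The dump is consistent with it being relative to the struct start (8 + 12 = 20, where the size field 09 00 00 00 sits), and the ResearchThree dump fits the same rule, so the sketch uses that.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Little-endian uint32 at pos; returns false if it does not fit in the buffer.
bool read_u32_le(const std::vector<uint8_t>& buf, size_t pos, uint32_t& out) {
    if (pos > buf.size() || buf.size() - pos < 4) return false;
    out = static_cast<uint32_t>(buf[pos]) |
          (static_cast<uint32_t>(buf[pos + 1]) << 8) |
          (static_cast<uint32_t>(buf[pos + 2]) << 16) |
          (static_cast<uint32_t>(buf[pos + 3]) << 24);
    return true;
}

// Hypothetical view type for this sketch; not part of the FBE API.
struct MyBytesView {
    uint32_t full_size = 0;
    uint32_t struct_type = 0;
    std::vector<uint8_t> data;
};

bool parse_my_bytes(const std::vector<uint8_t>& buf, MyBytesView& out) {
    uint32_t struct_begin = 0, inner_size = 0, data_offset = 0, data_size = 0;
    if (!read_u32_le(buf, 0, out.full_size)) return false;          // root struct full size
    if (out.full_size > buf.size()) return false;
    if (!read_u32_le(buf, 4, struct_begin)) return false;           // struct begin offset
    const size_t base = struct_begin;
    if (!read_u32_le(buf, base, inner_size)) return false;          // struct inner size
    if (!read_u32_le(buf, base + 4, out.struct_type)) return false; // struct type
    if (!read_u32_le(buf, base + 8, data_offset)) return false;     // byte buffer begin offset
    // ASSUMPTION: the byte buffer offset is relative to the struct start.
    const size_t blob = base + data_offset;
    if (!read_u32_le(buf, blob, data_size)) return false;           // byte buffer size
    const size_t body = blob + 4;
    if (buf.size() - body < data_size) return false;
    out.data.assign(buf.begin() + static_cast<std::ptrdiff_t>(body),
                    buf.begin() + static_cast<std::ptrdiff_t>(body + data_size));
    return true;
}

// The MyBytes dump from the first post (33 bytes).
std::vector<uint8_t> my_bytes_dump() {
    return {0x21, 0, 0, 0, 0x08, 0, 0, 0, 0x0C, 0, 0, 0, 0x02, 0, 0, 0,
            0x0C, 0, 0, 0, 0x09, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
}
```

Read this way, the value 12 at buffer[8] is the struct inner size (4 + 4 + 4 bytes for size, type, and the one offset slot), and the value 12 at buffer[16] is the byte buffer begin offset.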

chronoxor commented 4 years ago

I suggest looking into proto/fbe.h:

FieldModel<TBuffer, buffer_t>
FieldModel<TBuffer, std::string>
FieldModel<TBuffer, std::optional<T>>
FieldModelArray
FieldModelVector
FieldModelMap

to understand how basic data structures are serialized and stored.

chronoxor commented 4 years ago

The struct serialization code can be studied in proto/proto_models.h by looking into the

OrderModel::serialize() method
FieldModel<TBuffer, ::proto::Order>::set() method