OleksandrKvl / sbepp

C++ implementation of the FIX Simple Binary Encoding
https://oleksandrkvl.github.io/sbepp/
MIT License
39 stars 4 forks source link

Function to calculate the buffer size needed to encode the message. #33

Closed ujos closed 3 months ago

ujos commented 4 months ago

Could you implement the constexpr function to calculate the buffer size needed to encode the message.

At this moment I calculate it manually like following:

namespace types = mdb_protocol::schema::types;
namespace messages = mdb_protocol::schema::messages;

constexpr auto headerSize = sbepp::composite_traits<types::messageHeader>::size_bytes();
constexpr auto blockSize = sbepp::message_traits<messages::Subscription>::block_length();
constexpr auto groupEncodingSize = sbepp::composite_traits<types::groupSizeEncoding>::size_bytes();
constexpr auto exchangesBlockSize = sbepp::group_traits<messages::Subscription::exchanges>::block_length();

const auto totalMessageSize = 
  headerSize + 
  blockSize + 
  groupEncodingSize +
  exchangesBlockSize * req.exchanges_.size();

It works, however I believe the sbepp library would benefit from having this function generated inside the parser header.

While it is simple case with just one group, there are a couple of ideas from my side:

OleksandrKvl commented 4 months ago

For the future, it's better to provide XML for mentioned message because it's not always obvious. In this case, I guess your group is what I call "flat", meaning that it doesn't have any nested groups or data fields.

Here's the message:

<sbe:message name="msg1" id="1">
    <field name="field1" id="1" type="uint32"/>

    <group name="group1" id="30">
        <field name="field1" id="1" type="uint32"/>

        <group name="group2" id="20">
            <field name="field1" id="1" type="uint32"/>

            <group name="group3" id="20">
                <field name="field1" id="1" type="uint32"/>

                <data name="data" id="6" type="varDataEncoding"/>
            </group>

            <data name="data" id="6" type="varDataEncoding"/>
        </group>

        <data name="data" id="6" type="varDataEncoding"/>
    </group>

    <data name="data" id="6" type="varDataEncoding"/>
</sbe:message>

What should be the signature of the function? Also keep in mind that group/data size is not always known at compile-time so this function should also work at run-time.

In my practice it's just easier to use memory pool with slots large enough to hold any of your messages. For example, Euronext documentation specifies maximum size of their messages.

ujos commented 4 months ago

In my case the messages are really "flat" (as of today).

    <type name="Exchange" primitiveType="char" length="10"/>
    <sbe:message id="10" name="AddSubscription">
        <field id="0" name="requestId" type="uint32"/>
        <field id="1" name="instrument" type="Instrument"/>
        <group id="2" name="exchanges">
            <field id="4" name="name" type="Exchange"/>
        </group>
    </sbe:message>

In this case the expected function interface is:

struct AddSubscription {
  static constexpr size_t getMinSize(size_t exchangesGroupSize);
};
OleksandrKvl commented 4 months ago

Ok but I asked about signature not for your message, which is trivial, but for a more complicated one.

ujos commented 4 months ago

Ah... my vision is that function signature for the XML above should be following:

size_t getMinSize(
  size_t group1TotalEntriesCount,
  size_t group2TotalEntriesCount,
  size_t group3TotalEntriesCount,
  size_t group3DataTotalLength,
  size_t group2DataTotalLength,
  size_t group1DataTotalLength,
  size_t dataLength);

Another option is to generate that function for the trivial cases only.

OleksandrKvl commented 4 months ago

I think it can be simplified, instead of separate groupNDataTotalLength we can have a single parameter totalDataLength. This works because from groupNTotalEntriesCount we know how many data fields will be in memory, their total size is data1LengthTypeSize + ... + dataNLengthTypeSize + totalDataLength, what do you think? Although I can't say I love this interface, it's quite easy to say in documentation that group sizes should appear in their XML order from top to bottom of the message.

ujos commented 4 months ago

instead of separate groupNDataTotalLength we can have a single parameter totalDataLength

Sounds reasonable.

Although I can't say I love this interface, it's quite easy to say in documentation that group sizes should appear in their XML order from top to bottom of the message.

I'm not sure I've got it. What do you mean?

BTW, what if instead of parameters we have a struct?

struct SomeGoodNameForTheStruct
{
  size_t group1_total_entries_count;
  size_t group2_total_entries_count;
  size_t group3_total_entries_count;
  size_t var_length_data_total_length;
};

In case if some group is removed or reordered, user gets a compile time error?

OleksandrKvl commented 4 months ago

BTW, what if instead of parameters we have a struct?

There's no good place to define this struct. Its usage will be a pain, e.g. estimate_size_bytes(schema::messages::msgName<byte_type>::some_good_name{1,2,3}). The only protection it can give is when something is removed from schema. When something is added, the best you can get is warning from compiler about uninitialized member (and not in all cases).

In case if some group is removed or reordered

Let's say that SBE doesn't allow to remove anything from schema and the only extension is supports nicely is adding fields at the end of the block. Adding groups and datas works only for the top-level block and only at its end, meaning that if you have any data, you can only add new data, not group. Reordering is not allowed at all.

The only "nice" way to have that protection is to tag each size value like estimate_size_bytes<msg_type>(sbepp::group_size<schema::messages::msg::group1>(1), sbepp::group_size<schema::messages::msg::group1::group2>(2), ...) and it's very verbose. Too much for a functionality that's questionable on its own (because it works only if you pass correct sizes, i.e., relies on human factor).

OleksandrKvl commented 4 months ago

Btw, what's your real-world case for it(the one from this issue seems a bit artificial)? The only one I can imagine is to pass max number of each group entries and data at compile-time and then use buffer of such size to create a message which is either: always the same, not always the same (in this case you need to have additional checks to verify that max values used for size estimation are correct).

When you want to do estimation at run-time, it's pretty questionable that implied overhead really worth it. In all these cases, why can't you just hardcode any large-enough buffer size and then run unit tests to always be sure that it's enough? I mean, this strategy of calculating buffer size each time doesn't look good, especially when you schema contains many messages. It's much more efficient to have a memory pool with slots large enough for all the messages instead of calculating and allocating a buffer of different size each time.

ujos commented 4 months ago

I think for the task I'm working on now I'll allocate 65KB on the stack. This storage must fit all the messages involved in communication.

In case if message is really big (there is one with the var-length array) and for some reason it does not fit the 65K, then as a fallback I will allocate the memory on the heap.

In order to implement this logic I need estimate_size_bytes() function.

When you want to do estimation at run-time, it's pretty questionable that implied overhead really worth it.

Part of the math will be done by compiler at compile time.

OleksandrKvl commented 4 months ago

OK, I'll try to implement it when I have time and let you know

OleksandrKvl commented 3 months ago

@ujos you can check #43 for the current implementation. I'll keep it open for a couple of days waiting for your reply.

ujos commented 3 months ago

I need to finish one thing I'm working on now. It will take a week probably. Then I can upgrade to the latest version of the sbepp and test it. If you don't mind please keep it open for a week. I will do my best to check it ASAP.

OleksandrKvl commented 3 months ago

OK, no problem

ujos commented 3 months ago

I've checked the code in the PR. It works great. I believe the ticket can be closed now. Thank you very much.