OleksandrKvl / sbepp

C++ implementation of the FIX Simple Binary Encoding
https://oleksandrkvl.github.io/sbepp/
MIT License
42 stars 4 forks source link

Encoding/Decoding a message using cursor-based accessors #25

Closed ujos closed 3 months ago

ujos commented 6 months ago

I'm new to the sbepp library. It is really good. Much better than real-logic.

I'm not sure I follow the way, how to use the cursor. I understand it helps to optimize some sort of things like groups and var-sized arrays. I believe there has to be more convenient way to do it in C++. For example you can think of the builder design pattern. Or kind of builder.

Consider the following code:

struct X{
    struct AfterB{
        X& obj_;
        void setC(int v) { return obj_.setC(v); }
    };

    struct AfterA{
        X& obj_;
        AfterB setB(int v) { return obj_.setB(v); }
    };

    AfterA setA(int) { return {*this}; }
private:
    AfterB setB(int) { return {*this}; }
    void setC(int) {}

};

This structure forces you to read the fields sequentially. There is no way to do it randomly.

int main() {
  auto x = X{};
  x.setA(1).setB(2).setC(3);
}

Maybe I just do not understand the point of how to use the cursor properly.

OleksandrKvl commented 6 months ago

I'm not sure I follow the way, how to use the cursor.

Ask a more specific question so I can help you with a concrete example.

I understand it helps to optimize some sort of things like groups and var-sized arrays.

That's correct but it's really needed only if your message has either deeply nested groups or multiple data/group fields.

For example you can think of the builder design pattern.

That approach has a lot of problems. It basically requires a class per field which by itself is a mess. x.setA(1).setB(2).setC(3); this works for setters but not for getters. My goal was to provide interface similar to struct when you just get/set fields as you want and I wanted to keep normal and cursor-based interface similar to each other.

ujos commented 6 months ago

this works for setters but not for getters. My goal was to provide interface similar to struct when you just get/set fields as you want and I wanted to keep normal and cursor-based interface similar to each other.

The struct approach may work really well in case if there are no variable-length fields. Once either groups or variable-length arrays come into play, the things become really complicated. You must construct message in the proper order. In case if you change the schema and you move fields around, compiler does not help you and you get the runtime errors instead of compile time.

ujos commented 6 months ago

I believe the interface must guide you in such way so there is no way to use it in improper way.

ujos commented 6 months ago

Looks like there is another approach how to handle variable-length fields.

struct X
{
    struct ACookie {};

    struct BCookie {};

    // setters
    ACookie setA(string_view) { return {}; }
    BCookie setB(string_view, ACookie) { return {}; }
    void setC(string_view, BCookie) {}

    // getters
    tuple<string_view, ACookie> getA() { return {"1212", ACookie{}}; }
    tuple<string_view, BCookie> getB(ACookie) { return { "2222", BCookie{}}; }
    string_view getC(BCookie) { return "3333"; }

};

int main()
{
  auto x = X{};
  auto ac = x.setA("1212");
  auto bc = x.setB("2222", ac);
  x.setC("3333", bc);

  auto [a, ac] = x.getA();
  auto [b, bc] = x.getB(ac);
  auto c = x.getC(bc);
}

The fixed length fields can be accessed without the cookies.

ujos commented 6 months ago

Ask a more specific question so I can help you with a concrete example.

It is hard to ask the question. The cursor-based API does not seem really safe. I accidentally called begin() instead of cursor_begin() and got runtime time error.

It is not clear for me when to use cursor functions and when it is not needed. fill_group_header does not require the cursor, however sbepp::detail::cursor_result_type_t has both begin() and cursor_begin().

Also following statement sounds suspicious

// examples.md
// note: order must be preserved for cursor-based accessors
OleksandrKvl commented 6 months ago

Did you read the docs? It contains explanation why sometimes cursor-based accessors are faster.

fill_group_header does not require the cursor, however sbepp::detail::cursor_result_type_t has both begin() and cursor_begin().

Provide the piece of code, otherwise it's hard to understand what you mean. fill_group_header is not an accessor, that's why it doesn't need a cursor. One first accesses/gets the group (with or without a cursor) and then passes it to fill_group_header. Groups have both begin() and cursor_begin() because once you have a group, it's up to you how to use it.

From https://github.com/OleksandrKvl/sbepp/issues/28#issuecomment-2014956168:

(high) API does not protect me at the compile time from setting variable-length fields in the wrong order. This is my the biggest concern. The library must be safe to use. Even though one does not read the documentation. That is why I do not use Real-Logic. Most probably I have to resign from using the variable length arrays.

That's simply not true, nothing is safe if you don't read the docs, this includes higher-level languages or C++'s most used containers like vector. You said The library must be safe to use. but it's up to author of a project to decide what what it must do, what its priorities and goals. For sbepp safety is not the main point, you need to know what you are doing (that's true for C++ in general). I'd even go further and say that SBE itself is not good enough (and I'm sure was not intended) to be a general-purpose protocol. If you need one, try to look at flatbuffers (though I never tested its performance).

ujos commented 5 months ago

That approach has a lot of problems. It basically requires a class per field which by itself is a mess.

Here is another idea how to make the interface more safe when working with varLength data fields:

#include <type_traits>
#include <span>

template<int Counter = 0>
struct Message
{
   auto setField1(int) -> Message { return {}; }
   auto setField2(char) -> Message { return {}; }

   auto setField3(char const*) -> Message<Counter + 1> { static_assert(Counter == 0); return {}; }
   auto setField4(char const*) -> Message<Counter + 1> { static_assert(Counter == 1); return {}; }
   auto setField5(char const*) -> Message<Counter + 1> { static_assert(Counter == 2); return {}; }
   auto finish() -> std::span<std::byte> { static_assert(Counter == 3);  return {}; }
};

int main()
{
   auto msg = Message{};
   auto bufToSend = msg.setField2(1)
      .setField1('1')
      .setField3("")
      .setField4("")
      .setField5("")
      .finish();
}

Again builder-like approach is used. Instead of manually generated classes, templates are used. So compiler at compile time validates that user sets fields in the right order.

There are two concerns:

OleksandrKvl commented 5 months ago

This approach is not much better for many reasons (that I'm lazy to enumerate, sorry). There's a much easier way to add more safety: calculate entity offset like it's done in normal accessors and compare it to the offset obtained using cursor, they should always match. However, this is not on my priority list at the moment so I can't promise you that it will be implemented in the near future.

ujos commented 5 months ago

calculate entity offset like it's done in normal accessors and compare it to the offset obtained using cursor, they should always match

Are you talking about the run time validation?

OleksandrKvl commented 5 months ago

Yes