OleksandrKvl / sbepp

C++ implementation of the FIX Simple Binary Encoding
https://oleksandrkvl.github.io/sbepp/
MIT License
39 stars 4 forks source link

Strings support #24

Closed ujos closed 2 months ago

ujos commented 4 months ago

Are there any helper functions to read and write string/string_views? I cannot find any unfortunately.

According to SBE specification, the String is:

Array of char of specified length, padded by NUL character if a string is shorter than the length specified for a field.

I can implement it on my own, it is not complicated. I just wonder if you could add it so it is available out of the box. Thanks.

OleksandrKvl commented 4 months ago

No, there are no helpers. This functionality has been on my mind since the beginning but I had no clear vision about it and decided to provide only the basic functionality that allow people to build their own helpers. I will think about possible interface/implementation and post it here.

ujos commented 4 months ago

Just a 5 cents from my side. The strings support can be introduced to arrays based on the char / wchar_t type, as char type is intended for strings, while std::byte is for regular arrays, IMHO.

Some developers though use char assuming byte.

OleksandrKvl commented 3 months ago

Finally, got my hands on this. First of all, I don't think SBE's definition of string is good enough. String can be declared as "an array of char", semanticType=char/String or just "an array of uint8 that documented to hold strings". I believe that string-related functionality should be available for all array types, leaving up to user how to treat them.

Conversion to

As I said in other discussions, I don't like to bind functionality to specific types like std::string or std::string_view, it should be generic. While for <data> T array::to<T>() can be as simple as return To{data(), size()}, for fixed-length arrays it's not so simple because if it's a string, its length might be not equal to size(), something like strnlen_s is needed to find its real length (and it's not guaranteed to be available). At best, static_array_ref has to provide something like to<T>() and to_string<T>(). This doesn't look good to me, especially considering how easy it is for user to implement this functionality on their own.

Assigning from strings

Again, I'd like to have something generic but seems it's not possible because there are different cases:

After a bit of thinking I've come with this set of member functions to cover the above cases:

To conclude, I propose to add the above assign (and maybe insert) versions but no conversion-to functionality because I don't see how to make it generic. Let me know what you think about it.

ujos commented 3 months ago

If you aren't sure how to implement it, you can add sbepp::to_string_view and sbepp::assign(arr, str) functions which will do the job. Even though it is easy to implement those functions on my side, the library should have the those functions that 100% work in accordance to SBE protocol. They are needed just because the string in the SBE specification is not compatible to C-null-terminated string:

Because your library may be used in C++14, you may want to introduce sbepp::string_view, which can be constructed from std::string_view (if available).

Once those functions are ready, you can think of how to make the generated messages more user friendly.

In addition to the array.assign() and array.to_string() functions I'd love to have setters and getters in the message that accept/return string_view:

msg.symbol("IBM");
std::cout << msg.symbol() << "\n";

I don't think you need function that work on char const*. Just string_view is enough.

OleksandrKvl commented 3 months ago

In case if input string is shorter than static array capacity, ALL unused characters must be set with zeros.

This doesn't make any sense to me, another SBE mistake. If buffer contains a short string, there's no reason to set all trailing bytes to zero because when the decoder will treat it as a string, it will only read till the first zero byte, it won't check all the remaining ones.

Having sbepp::string_view won't solve many problems, not mentioning that there's no standard way (or I don't know it) to have implementation that will handle compile-time constants efficiently on 3 major compilers (this requires std::is_constant_evaluated()). That's what const (&)[N] overload can handle. And this custom string_view will never be as convenient as types already used by people.

Here's what I want to implement:

These should cover assignments, as for conversion to string, sorry but it's just too much hassle to replace a one-liner that can be written by user. This library was never intended to support all the possible syntax sugar but only to provide core functionality.

ujos commented 3 months ago

This doesn't make any sense to me, another SBE mistake. If buffer contains a short string, there's no reason to set all trailing bytes to zero because when the decoder will treat it as a string, it will only read till the first zero byte, it won't check all the remaining ones.

That makes sense if you search for the \0 in the reversed order.

Having sbepp::string_view won't solve many problems, not mentioning that there's no standard way (or I don't know it) to have implementation that will handle compile-time constants efficiently on 3 major compilers (this requires std::is_constant_evaluated())

I'm not sure I understand your concern.

That's what const (&)[N] overload can handle

sbepp::string_view can have a constructor for

When the length is short, it will add a single trailing null byte.

Why single zero while SBE requires all trailing bytes must be set to zero? CME, for instance, when I implemented the iLink3 client, they required me to set all trailing bytes with zero.

These should cover assignments, as for conversion to string, sorry but it's just too much hassle to replace a one-liner that can be written by user.

It is easier to write msg.symbol().to_str() than to_str(msg.symbol()), because that is the way how brain works:

  1. You type "msg" and "dot"
  2. Select "symbol()" and type dot
  3. select "to_str()"
OleksandrKvl commented 3 months ago

That makes sense if you search for the \0 in the reversed order.

And who uses this weird approach?

Why single zero while SBE requires all trailing bytes must be set to zero? CME, for instance, when I implemented the iLink3 client, they required me to set all trailing bytes with zero.

Imagine I have memory pool with memory there already zeroed, why should I pay for writing padding zeros when I know they are already zeroed? As a potential approach, assign_string() can return the number of written bytes and user can zero trailing size()-written_bytes by hand. Or, assign_string() can have an extra parameter like with_padding to control this behavior.

I don't want to mess with string_view because as I said, I don't plan to provide to_string() and without it, sbepp::string_view is needed only for assign_string(). There's no reason in re-implementing full string_view functionality just to take use of its constructors. It's not a string_view library after all.

ujos commented 3 months ago

And who uses this weird approach?

In case if message contains long char arrays and you send rather long strings, it is faster to search for \0 in the reversed order.

ujos commented 3 months ago

Or, assign_string() can have an extra parameter like with_padding to control this behavior.

Or have two functions:

ujos commented 3 months ago

I don't want to mess with string_view because as I said, I don't plan to provide to_string()

Just an idea. What if you split the sbepp library into three parts rather than two:

  1. sbepp library that provides basic functionality and traits. That is what the sbepp/sbepp.hpp header file is for.
  2. Generated parser
  3. The glue library that marries the generated parser and sbepp. It implements all the primitives required by the generated parser using the sbepp library. Basically it just does namespace my_protocol::impl { using namespace sbepp; }

In case if user wants to use another implementation of the static vector, they can create new glue library.

I'm not sure it makes much sense though.

OleksandrKvl commented 2 months ago

Merged, you can ask any related questions here.

ujos commented 2 months ago

It seems in the v1.3.0 dynamic_array_ref is missing an overload for the assign_string(R&&)

OleksandrKvl commented 2 months ago

No, it's not, use assign_range, it will do the right thing. assign_string(R&&) was required for static_array_ref because it needs to optionally add terminating zero byte(s) at the end of the string payload. This is not the case for dynamic_array_ref.

ujos commented 2 months ago

I'm not sure user wants to know the details. They just want to assign the string regardless is it dynamic or static array. Even though there is already an overload for char const*.

ujos commented 2 months ago

The only problem with the dynamic size arrays is that I have to initialize them in the order as they are declared in the message I guess....

OleksandrKvl commented 2 months ago

The only problem with the dynamic size arrays is that I have to initialize them in the order as they are declared in the message I guess

Yes.

ujos commented 2 months ago

Anyhow, could you add an assign_string() overload for string/string_view to the dynamic_array?

OleksandrKvl commented 2 months ago

Nope, I won't add a function that adds nothing but a new name for existing functionality. Strings in static and dynamic arrays have different representation. There's no reason both arrays should have common interface just because both are called "arrays". std::vector and std::array have different interface for the same reason.

ujos commented 2 months ago

From the functional point of view assign_string() in dynamic_array really does nothing. But from the usability point of view it adds a static polymorphism, so you can write abstract code that works on both static and dynamic arrays. Also as you know, Strings in SBE are special and assign_string() method in the dynamic_array will make user more confident in what they are doing when it comes to strings.

ujos commented 2 months ago

std::vector and std::array have different interface for the same reason.

Actually those two interfaces are as much close as it possible (begin, rbegin, end, rend, data, size, operator[], size, data, at, empty)

ujos commented 2 months ago

I believe if dynamic_array can hold string, then there must be a method to get and set the string. If static_array can hold the string, then it also must have a method to get and set string. And in order to make the interface unified, the part of the interface responsible for managing the string must be identical in both dynamic and static arrays.