Endianness and network order integers

benedekkupper commented 2 years ago

I have two ideas/requests from my recent work:

1. The C++ standard is finally receiving endianness-related support, e.g. byteswap() and endian, which are already more or less implemented in ETL, so it would be nice to get cross-compatibility (ETL relying on the STD version if supported, or implementing it with the current code for earlier C++ versions).
2. For my current work this support is not sufficient; what I actually need is support for network order integers of different sizes. The difference from native order integers isn't only the endianness, but also that the former are byte-aligned. So we should implement network order integers, which use a byte array as the underlying type (to ensure byte-packed alignment), and have an implicit conversion function to the native integer equivalent. I would be happy to draft something up, but I wanted to get some input first.
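
As a rough illustration of the first request, the sketch below prefers std::endian when the standard library advertises it through the __cpp_lib_endian feature-test macro, and otherwise falls back to a runtime probe. The demo namespace, the endian enum and native_order() are hypothetical names chosen for this example; this is not how ETL implements it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Pull in the feature-test macros when the <version> header exists (C++20).
#if defined(__has_include)
#  if __has_include(<version>)
#    include <version>
#  endif
#endif

#if defined(__cpp_lib_endian)
#  include <bit>
#endif

namespace demo
{
  // Hypothetical enum for this sketch; not the ETL's etl::endian.
  enum class endian { little, big, unknown };

  inline endian native_order()
  {
#if defined(__cpp_lib_endian)
    // Standard library path: std::endian is available.
    return (std::endian::native == std::endian::little) ? endian::little
         : (std::endian::native == std::endian::big)    ? endian::big
                                                        : endian::unknown;
#else
    // Fallback path: inspect the first byte of a known 16-bit pattern.
    const std::uint16_t probe = 0x0102;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return (first == 0x02) ? endian::little : endian::big;
#endif
  }
}
```

Either branch yields the same answer on common little-endian and big-endian targets; only the standard library path can also report a mixed-endian platform.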

jwellbelove commented 2 years ago

Have you looked at etl::byte_stream?

jwellbelove commented 2 years ago

https://www.etlcpp.com/byte_stream.html

jwellbelove commented 2 years ago

I am currently working on ETL versions of C++20's header, for ease of porting of code from the STL. I may add facilities to force the ETL to use STL definitions of endian when available.

benedekkupper commented 2 years ago

byte_stream looks neat, but it's more suited for single layer protocol processing, and doesn't accommodate protocols with multiple layers that might be using the same pieces of the message.

Let me illustrate the idea with some unit test mockup:

using nouint8 = network_order_prototype<uint8_t>; // maybe the endianness could be another template parameter
using nouint16 = network_order_prototype<uint16_t>;
using nouint32 = network_order_prototype<uint32_t>;
using nouint64 = network_order_prototype<uint64_t>;

struct header
{
  nouint8 a1bytefield;
  nouint16 a2bytefield;
  nouint32 a4bytefield;
  nouint64 an8bytefield;
};
static_assert(sizeof(header) == 15, "header isn't packed correctly");  // Requirement 1: byte-alignment

TEST(demo)
{
  std::vector<uint8_t> v(sizeof(header));
  std::iota(std::begin(v), std::end(v), 1); // header is filled with increasing numbers

  auto h = reinterpret_cast<header*>(v.data());
  // Requirement 2: implicit conversion to machine type, with correct byte order
  uint8_t a1bytefield = h->a1bytefield;
  CHECK_EQUAL(0x01, a1bytefield);
  uint16_t a2bytefield = h->a2bytefield;
  CHECK_EQUAL(0x0203, a2bytefield);
  uint32_t a4bytefield = h->a4bytefield;
  CHECK_EQUAL(0x04050607, a4bytefield);
  uint64_t an8bytefield = h->an8bytefield;
  CHECK_EQUAL(0x08090a0b0c0d0e0f, an8bytefield);
}

Basically this is equivalent to how __attribute__((__packed__)) or similar constructs work. The problem with these is that they are compiler dependent, their syntax differs from compiler to compiler, and they don't play well with modern C++ classes.

jwellbelove commented 2 years ago

So it's a type that implicitly converts back and forth from a char aligned underlying buffer, with a specified endianness.

benedekkupper commented 2 years ago

> So it's a type that implicitly converts back and forth from a char aligned underlying buffer, with a specified endianness.

Precisely. Constructing, assigning and accessing it in any direct way should work exactly the same as the native type. Unlike native types though, the underlying memory representation is controlled by the code, not by the platform. So basically a platform-independent storage for fundamental types.
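
A minimal sketch of such a type, assuming big-endian storage only, might look like the following. The names demo::big_endian_value and demo::header are hypothetical and are not ETL types. The example copies the wire bytes into the struct with memcpy rather than using reinterpret_cast, to sidestep object-lifetime questions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

namespace demo
{
  // Hypothetical type for illustration only; not part of the ETL.
  template <typename T>
  class big_endian_value
  {
  public:
    big_endian_value() = default;

    big_endian_value(T value) { *this = value; }

    // Store the most significant byte first, whatever the host order is.
    big_endian_value& operator=(T value)
    {
      for (std::size_t i = 0; i < sizeof(T); ++i)
      {
        bytes_[sizeof(T) - 1 - i] = static_cast<unsigned char>(value >> (8 * i));
      }
      return *this;
    }

    // Implicit conversion back to the native integer.
    operator T() const
    {
      T value = 0;
      for (std::size_t i = 0; i < sizeof(T); ++i)
      {
        value = static_cast<T>((value << 8) | bytes_[i]);
      }
      return value;
    }

  private:
    unsigned char bytes_[sizeof(T)] = {};  // byte array gives alignment 1
  };

  struct header
  {
    big_endian_value<std::uint8_t>  kind;
    big_endian_value<std::uint16_t> length;
    big_endian_value<std::uint32_t> id;
  };

  static_assert(sizeof(header) == 7, "fields are expected to be byte packed");
  static_assert(alignof(header) == 1, "header should have no alignment requirement");

  inline void header_demo()
  {
    const unsigned char wire[7] = { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07 };
    header h;
    std::memcpy(&h, wire, sizeof(h));

    const std::uint16_t length = h.length;  // implicit conversion
    const std::uint32_t id     = h.id;
    assert(length == 0x0203);
    assert(id == 0x04050607u);
  }
}
```

The static_assert lines document the layout assumption instead of relying on a compiler-specific packing attribute.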

jwellbelove commented 2 years ago

I see the platform-independent types just supporting a simple load/store API. Adding functionality, such as the arithmetic operators, would create a very inefficient type, that would not be obvious to the user.

benedekkupper commented 2 years ago

True, I'm in favor of that. The best case scenario would be if we could make atomic versions as well, but I don't think that's quite feasible with the current toolset of the language.

benedekkupper commented 2 years ago

Regarding the naming, I think a be or le prefix would be suitable, e.g. beuint32, be_uint32_t or similar, as 'host order' isn't guaranteed to be little endian, and it also makes the practical effect of the type more explicit.
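
Combining the load/store-only interface with the be/le naming discussed above, one possible shape is sketched here. The byte order is a template parameter and there are no arithmetic operators, so every conversion is visible at the call site. demo::byte_order, demo::stored_int and the aliases are assumptions made for this example, not the names that ended up in the ETL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace demo
{
  enum class byte_order { big, little };  // hypothetical selector

  template <typename T, byte_order Order>
  class stored_int
  {
  public:
    void store(T value)
    {
      for (std::size_t i = 0; i < sizeof(T); ++i)
      {
        bytes_[position(i)] = static_cast<unsigned char>(value >> (8 * i));
      }
    }

    T load() const
    {
      T value = 0;
      for (std::size_t i = 0; i < sizeof(T); ++i)
      {
        value = static_cast<T>(value | (static_cast<T>(bytes_[position(i)]) << (8 * i)));
      }
      return value;
    }

    const unsigned char* data() const { return bytes_; }

  private:
    // Index of the byte that holds bits [8*i, 8*i + 7] of the value.
    static std::size_t position(std::size_t i)
    {
      return (Order == byte_order::little) ? i : sizeof(T) - 1 - i;
    }

    unsigned char bytes_[sizeof(T)] = {};
  };

  using be_uint16_t = stored_int<std::uint16_t, byte_order::big>;
  using be_uint32_t = stored_int<std::uint32_t, byte_order::big>;
  using le_uint32_t = stored_int<std::uint32_t, byte_order::little>;

  inline void naming_demo()
  {
    be_uint16_t port;
    port.store(0x1F90);
    assert(port.data()[0] == 0x1F);  // most significant byte first
    assert(port.load() == 0x1F90);
  }
}
```

Arithmetic then has to be spelled out, for example counter.store(counter.load() + 1), which keeps the cost of each byte-wise conversion visible to the user.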

timrid commented 2 years ago

What you describe looks quite similar to the Boost buffer types: https://www.boost.org/doc/libs/1_67_0/libs/beast/doc/html/beast/using_io/buffer_types.html

jwellbelove commented 2 years ago

Actually, I forgot to mention a feature of byte_streams that allows accommodation of custom types. For example, the unit tests demonstrate the packed streaming of a structure's elements, by specialising the reader and writer functions.

//***********************************
struct Object
{
  int16_t i;
  double  d;
  uint8_t c;
};

namespace etl
{
  //***********************************
  template <>
  void write_unchecked<Object>(etl::byte_stream_writer& stream, const Object& object)
  {
    stream.write_unchecked(object.i);
    stream.write_unchecked(object.d);
    stream.write_unchecked(object.c);
  }

  //***********************************
  template <>
  bool write<Object>(etl::byte_stream_writer& stream, const Object& object)
  {
    bool success_i = stream.write(object.i);
    bool success_d = stream.write(object.d);
    bool success_c = stream.write(object.c);

    return success_i && success_d && success_c;
  }

  //***********************************
  template <>
  Object read_unchecked<Object>(etl::byte_stream_reader& stream)
  {
    int16_t i = stream.read_unchecked<int16_t>();
    double  d = stream.read_unchecked<double>();
    uint8_t c = stream.read_unchecked<uint8_t>();

    Object object{ i, d, c };

    return object;
  }

  //***********************************
  template <>
  etl::optional<Object> read<Object>(etl::byte_stream_reader& stream)
  {
    etl::optional<Object> result;

    etl::optional<int16_t> i = stream.read<int16_t>();
    etl::optional<double>  d = stream.read<double>();
    etl::optional<uint8_t> c = stream.read<uint8_t>();

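    // Only build the object if every field was read successfully.
    if (i.has_value() && d.has_value() && c.has_value())
    {
      result = Object{ i.value(), d.value(), c.value() };
    }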

    return result;
  }
}

benedekkupper commented 2 years ago

The stream type approach still doesn't accommodate protocol formats where different layers need to access different parts of the message. In that case it's quite inconvenient having to bother with jumping the stream's head all around to get different parameters out of the message.
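
To make the contrast with a stream concrete, the sketch below reads fields of a made-up two-layer message in place and in any order, without moving a read position. The message layout, link_view and transport_view are all hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

namespace demo
{
  inline std::uint16_t read_be16(const unsigned char* p)
  {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
  }

  inline std::uint32_t read_be32(const unsigned char* p)
  {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8)  |
            static_cast<std::uint32_t>(p[3]);
  }

  // Hypothetical outer layer: [type:1][length:2][payload...]
  class link_view
  {
  public:
    explicit link_view(const unsigned char* p) : p_(p) {}
    std::uint8_t  type()   const { return p_[0]; }
    std::uint16_t length() const { return read_be16(p_ + 1); }
    const unsigned char* payload() const { return p_ + 3; }
  private:
    const unsigned char* p_;
  };

  // Hypothetical inner layer carried in the payload: [port:2][sequence:4]
  class transport_view
  {
  public:
    explicit transport_view(const unsigned char* p) : p_(p) {}
    std::uint16_t port()     const { return read_be16(p_); }
    std::uint32_t sequence() const { return read_be32(p_ + 2); }
  private:
    const unsigned char* p_;
  };

  inline void layered_demo()
  {
    const unsigned char message[9] = { 0x01, 0x00, 0x06, 0x1F, 0x90,
                                       0x00, 0x00, 0x00, 0x2A };
    const link_view      link(message);
    const transport_view transport(link.payload());

    // Fields can be read in any order; nothing is consumed.
    assert(transport.sequence() == 42u);
    assert(link.length() == 6u);
    assert(transport.port() == 0x1F90u);
    assert(link.type() == 1u);
  }
}
```

Both views refer to the same buffer, so each protocol layer can look at its own fields independently of what the other layers have already read.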

jwellbelove commented 2 years ago

20.23.0

benedekkupper commented 2 years ago

I came across this post: http://stackoverflow.com/a/36937049 and I was wondering if we could make the construction of these types constexpr using the bswap logic from the linked SO answer.

jwellbelove commented 2 years ago

Yes, they could be made constexpr. I would need to add code to select the use of the compiler intrinsics when available, as I did with some of the algorithms.
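
A minimal sketch of a constexpr 32-bit byte swap with intrinsic selection follows. It is not taken from the linked Stack Overflow answer or from the ETL; demo::bswap32 is a hypothetical name, and the preprocessor test assumes that __builtin_bswap32 is available and usable in constant expressions on GCC and Clang.

```cpp
#include <cassert>
#include <cstdint>

namespace demo
{
  // Hypothetical helper; written as a single return so it is valid C++11 constexpr.
  constexpr std::uint32_t bswap32(std::uint32_t v)
  {
#if defined(__GNUC__) || defined(__clang__)
    // Compiler builtin, typically lowered to a single instruction.
    return __builtin_bswap32(v);
#else
    // Portable shift-and-mask fallback.
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
#endif
  }

  // Evaluated at compile time, so the swap costs nothing at run time here.
  static_assert(bswap32(0x01020304u) == 0x04030201u, "byte swap must reverse the bytes");
}
```

Because the function is constexpr, a storage type built on it could be constructed from a literal at compile time, which is the use case raised above.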

benedekkupper commented 2 years ago

Shall we reopen this ticket to track that activity, or create a new one?

jwellbelove commented 2 years ago

I think a new one.

ETLCPP / etl

Endianness and network order integers #489