gnuradio / pmt

pmt
GNU Lesser General Public License v3.0
11 stars 11 forks source link

Add support for long and long long types #87

Open jsallay opened 1 year ago

jsallay commented 1 year ago

This issue was discovered when compiling a web assembly version of the library. It turns out that long and long long are sometimes separate types from uint32_t and uint64_t (but not always). It depends on the system architecture width and the compiler.

We are going to need to add support for these types conditionally if they are different than the other types. For example, in gcc11 on a 64 bit system, a uint64_t is the same as a long, but long long is a separate type. From looking at headers, it appears that that on a 32 bit system, a long long would be a uint64_t and a long would be a separate type.

jsallay commented 1 year ago

See https://en.cppreference.com/w/cpp/language/types for more info.

jsallay commented 1 year ago

Doing more research and playing around it looks like we can't make all of the types work portably. Specifically we are dealing with long, long long, size_t and uint64_t. When we serialize a pmt, it will be interpreted on the receiving machine. We need to make sure that there is a consistent definition on both the producing and receiving machines.

As shown in the link above - a long has 32 bits on some systems and 64 bits on others. A long long always has 64 bits. A size_t could be a unique type or could map an existing type according to the spec. It appears that in modern compilers it either maps to a long or a long long depending on system bit width and long definition. Similarly, a uint64_t either maps to a long or long long.

It is important to remember that a type alias, such as size_t, does not create a new type and cannot be detected at compile/run-time. If a size_t is implemented as a unsgned long, then we can't distinguish if a variable was declared as a unsigned long or size_t.

Herein lies the problem. We can either have a consistent, portable definition of long and long long or a consistent, portable definition of uint64_t. We can't have both.

If we store long and long long directly in the pmt. Then we can portably share the values between systems. There is a potential loss if we transfer a 64-bit long onto a system that stores it as a 32-bit long. However, there is an issue if a user transmits a uint64_t. In gcc/clang on linux. The uint64_t is an alias for long. If I transmit that to a system like MSVC Windows that uses a 32-bit long (and uint64_t is an alias for long long), then we will get an error that the data is of the wrong type when we try to convert it back to a uint64_t.

// linux
pmt x = uint64_t(10);
val = pmt.serialize(x);

// On Windows system
pmt y = pmt.deserialize(val);
// Fails with type mismatch because linux uint64_t =! windows uint64_t
uint64_t x = std::get<uint64_t>(y);

Alternatively, we can (and presently) do define a uint64_t pmt type. This will portably transmit a 64-bit number between systems. However, like above, if the user transmits an 'unsigned long' (or a size_t), instead of a uint64_t, then we could have a type mismatch (or no match) on the receiving system.

@RalphSteinhagen @mormj do you have any thoughts on the issue? I am leaning towards keeping the support for uint64_t because it has a clear and consistent meaning and I think it is far more confusing for uint64_t to not work, then for variable width types (long/size_t) to not work.

In we keep the uint64_t, then we would want to document not to directly use long/size_t or to use the cast function rather than std::get.

// linux
size_t x = 4;
pmt y(std::in_place_type<uint64_t>, x);  // Alternatively pmt y = uint64_t(x);
val = pmt.serialize(x);

// On Windows system
pmt y = pmt.deserialize(val);
size_t x = pmt.cast<size_t>(y); // Alternatively size_t x = size_t(std::get<uint64_t>(y));

We would want to consider adding some special handling to the struct reflection so that structs could contain long/size_t.

mormj commented 1 year ago

Wow. Very interesting - I had not thought about uint64_t being ambiguous. Your last recommendation makes sense - document additional precautions for using cstdint types across systems. As long as there is a way to do it, it seems that is the most consistent. size_t is inconsistent by design, so that should not be expected to work portably.

Or maybe we should just use flatbuffers to serialize ;)

RalphSteinhagen commented 1 year ago

Or maybe we should just use flatbuffers to serialize ;)

:grimacing:

I think the proposal of primarily supporting the uintXX_t types is IMO sound. The conditions on size_t are too weak to be portable... forgot about that. Thanks @jsallay for digging into that.

Have to think about whether we could intercept size_t via a warning or similar for pmt... :thinking:

rear1019 commented 1 year ago

It turns out that long and long long are sometimes separate types from uint32_t and uint64_t (but not always).

They must be separate types – the former is signed, the latter unsigned.

It appears that in modern compilers it [size_t] either maps to a long or a long long

This is not possible – size_t must be unsigned. Try compiling the following code:

#include <type_traits>
#include <cstdint>
#include <cstddef>

static_assert(std::is_signed_v<uint64_t>);
static_assert(std::is_signed_v<size_t>);
static_assert(std::is_same_v<long, uint64_t>); // long long on Windows
static_assert(std::is_same_v<long, size_t>);  // long long on Windows

int main() {}

// Fails with type mismatch because linux uint64_t =! windows uint64_t

This is not true (and would make fixed width integer types completely pointless). From cppreference:

uint8_t uint16_t uint32_t uint64_t unsigned integer type with width of exactly 8, 16, 32 and 64 bits respectively (provided if and only if the implementation directly supports the type)

EDIT: Added comment in example code.

jsallay commented 1 year ago

I apologize there were a few typos in my previous message. I was not meaning to conflate signed and unsigned types. I think my errors caused you to miss the point. The issue is the same with both signed and unsigned long types.

First we need to understand type aliases. https://en.cppreference.com/w/cpp/language/type_alias

A type alias declaration introduces a name which can be used as a synonym for the type denoted by type-id. It does not introduce a new type

Next, lets look at the data models. https://en.cppreference.com/w/cpp/language/types From the table in the middle of the page, we can see that with a compiler that uses an LLP64 Model that a long has 32 bits, but with a LP64 Model a long has 64 bits.

Types such as uint64_t/int64_t are not explicit types, but are alias types. (It could be implemented as its own type per the c++ spec, but no modern compiler does this.) On an LLP64 system it has to be implemented as a (unsigned) long long. On an LP64 system it could be a (unsigned) long or (unsigned) long long. gcc and clang both implement them as aliases of (unsigned) long.

From https://en.cppreference.com/w/cpp/types/integer - it specifies that a uint64_t (or any other fixed width type) just has to be implemented with that number of bits. It says nothing of the underlying type. For pmts, the underlying type matters. A long is not the same as a long long. If we have int64_t on gcc/linux - it is implemented as a long. That same value on windows would be implemented as a long long. Thus if we serialize a long on linux and try to interpret it as a long long on windows - we will get a type mismatch error.

I don't have time now, but you can try this out in compiler explorer and verify for yourself.