Open jsallay opened 1 year ago
See https://en.cppreference.com/w/cpp/language/types for more info.
Doing more research and playing around it looks like we can't make all of the types work portably. Specifically we are dealing with long
, long long
, size_t
and uint64_t
. When we serialize a pmt, it will be interpreted on the receiving machine. We need to make sure that there is a consistent definition on both the producing and receiving machines.
As shown in the link above - a long
has 32 bits on some systems and 64 bits on others. A long long
always has 64 bits. A size_t
could be a unique type or could map an existing type according to the spec. It appears that in modern compilers it either maps to a long
or a long long
depending on system bit width and long
definition. Similarly, a uint64_t
either maps to a long
or long long
.
It is important to remember that a type alias, such as size_t
, does not create a new type and cannot be detected at compile/run-time. If a size_t
is implemented as a unsgned long
, then we can't distinguish if a variable was declared as a unsigned long
or size_t
.
Herein lies the problem. We can either have a consistent, portable definition of long
and long long
or a consistent, portable definition of uint64_t
. We can't have both.
If we store long
and long long
directly in the pmt. Then we can portably share the values between systems. There is a potential loss if we transfer a 64-bit long
onto a system that stores it as a 32-bit long
. However, there is an issue if a user transmits a uint64_t
. In gcc/clang on linux. The uint64_t
is an alias for long
. If I transmit that to a system like MSVC Windows that uses a 32-bit long
(and uint64_t
is an alias for long long
), then we will get an error that the data is of the wrong type when we try to convert it back to a uint64_t
.
// linux
pmt x = uint64_t(10);
val = pmt.serialize(x);
// On Windows system
pmt y = pmt.deserialize(val);
// Fails with type mismatch because linux uint64_t =! windows uint64_t
uint64_t x = std::get<uint64_t>(y);
Alternatively, we can (and presently) do define a uint64_t
pmt type. This will portably transmit a 64-bit number between systems. However, like above, if the user transmits an 'unsigned long' (or a size_t
), instead of a uint64_t
, then we could have a type mismatch (or no match) on the receiving system.
@RalphSteinhagen @mormj do you have any thoughts on the issue? I am leaning towards keeping the support for uint64_t
because it has a clear and consistent meaning and I think it is far more confusing for uint64_t
to not work, then for variable width types (long
/size_t
) to not work.
In we keep the uint64_t
, then we would want to document not to directly use long
/size_t
or to use the cast function rather than std::get
.
// linux
size_t x = 4;
pmt y(std::in_place_type<uint64_t>, x); // Alternatively pmt y = uint64_t(x);
val = pmt.serialize(x);
// On Windows system
pmt y = pmt.deserialize(val);
size_t x = pmt.cast<size_t>(y); // Alternatively size_t x = size_t(std::get<uint64_t>(y));
We would want to consider adding some special handling to the struct reflection so that structs could contain long
/size_t
.
Wow. Very interesting - I had not thought about uint64_t being ambiguous. Your last recommendation makes sense - document additional precautions for using cstdint types across systems. As long as there is a way to do it, it seems that is the most consistent. size_t is inconsistent by design, so that should not be expected to work portably.
Or maybe we should just use flatbuffers to serialize ;)
Or maybe we should just use flatbuffers to serialize ;)
:grimacing:
I think the proposal of primarily supporting the uintXX_t types is IMO sound. The conditions on size_t
are too weak to be portable... forgot about that. Thanks @jsallay for digging into that.
Have to think about whether we could intercept size_t
via a warning or similar for pmt... :thinking:
It turns out that
long
andlong long
are sometimes separate types fromuint32_t
anduint64_t
(but not always).
They must be separate types – the former is signed, the latter unsigned.
It appears that in modern compilers it [
size_t
] either maps to along
or along long
This is not possible – size_t
must be unsigned. Try compiling the following code:
#include <type_traits>
#include <cstdint>
#include <cstddef>
static_assert(std::is_signed_v<uint64_t>);
static_assert(std::is_signed_v<size_t>);
static_assert(std::is_same_v<long, uint64_t>); // long long on Windows
static_assert(std::is_same_v<long, size_t>); // long long on Windows
int main() {}
// Fails with type mismatch because linux uint64_t =! windows uint64_t
This is not true (and would make fixed width integer types completely pointless). From cppreference:
uint8_t uint16_t uint32_t uint64_t unsigned integer type with width of exactly 8, 16, 32 and 64 bits respectively (provided if and only if the implementation directly supports the type)
EDIT: Added comment in example code.
I apologize there were a few typos in my previous message. I was not meaning to conflate signed and unsigned types. I think my errors caused you to miss the point. The issue is the same with both signed and unsigned long types.
First we need to understand type aliases. https://en.cppreference.com/w/cpp/language/type_alias
A type alias declaration introduces a name which can be used as a synonym for the type denoted by type-id. It does not introduce a new type
Next, lets look at the data models. https://en.cppreference.com/w/cpp/language/types From the table in the middle of the page, we can see that with a compiler that uses an LLP64 Model that a long
has 32 bits, but with a LP64
Model a long
has 64 bits.
Types such as uint64_t/int64_t
are not explicit types, but are alias types. (It could be implemented as its own type per the c++ spec, but no modern compiler does this.) On an LLP64
system it has to be implemented as a (unsigned) long long
. On an LP64
system it could be a (unsigned) long
or (unsigned) long long
. gcc and clang both implement them as aliases of (unsigned) long
.
From https://en.cppreference.com/w/cpp/types/integer - it specifies that a uint64_t
(or any other fixed width type) just has to be implemented with that number of bits. It says nothing of the underlying type. For pmts, the underlying type matters. A long
is not the same as a long long
. If we have int64_t
on gcc/linux - it is implemented as a long. That same value on windows would be implemented as a long long
. Thus if we serialize a long
on linux and try to interpret it as a long long
on windows - we will get a type mismatch error.
I don't have time now, but you can try this out in compiler explorer and verify for yourself.
This issue was discovered when compiling a web assembly version of the library. It turns out that
long
andlong long
are sometimes separate types fromuint32_t
anduint64_t
(but not always). It depends on the system architecture width and the compiler.We are going to need to add support for these types conditionally if they are different than the other types. For example, in gcc11 on a 64 bit system, a
uint64_t
is the same as along
, butlong long
is a separate type. From looking at headers, it appears that that on a 32 bit system, along long
would be auint64_t
and along
would be a separate type.