crossbario / autobahn-cpp

WAMP for C++ in Boost/Asio
https://crossbar.io/autobahn
Boost Software License 1.0
251 stars 104 forks source link

De-serialization of numbers #23

Closed taion closed 7 years ago

taion commented 9 years ago

It looks like the current implementation (https://github.com/tavendo/AutobahnCpp/blob/b44ee2e0582923fe8e175eff1ec726582575f4e2/autobahn/autobahn_impl.hpp#L530) does the following:

This makes it really hard to interoperate between JSON and MsgPack clients in certain cases, such as when working with JavaScript and C++. This creates the following problem:

For example, let's say I invoke a procedure registered by a C++ client that I intend to take a float from a JavaScript client. If I call it with 5.0, then I need to any_cast to a uint64_t. If I call it with -5.0, I need to any_cast to an int64_t. If I call it with 5.1, then I need to any_cast to a double. This is really painful.

I understand that WAMP only formally supports non-negative integers, but I feel like this makes coding between AutobahnJS and AutobahnCpp clients unnecessarily difficult. I feel like there should at least be a convenience method that will allow me to generically read a number without having to care about whether it's a positive integer, a negative integer, or a floating-point value.

oberstet commented 9 years ago

Not sure I get the problem: you should be able to any_cast to say double in your app any of uint64_t, int64_t or double. This might involve loss of information. But that is inevitable in certain situations: sending a large uint64_t to JS will loose, since JS only has double for numbers.

taion commented 9 years ago

The type erasure in boost::any doesn't work that way. For example, this code will throw boost::bad_any_cast.

    boost::any a(5);
    double d = boost::any_cast<double>(a);

You can only boost::any_cast to exactly the type that was captured by the boost::any.

oberstet commented 9 years ago

So what are you suggesting? Keep in mind, we need something that works across languages ...

taion commented 9 years ago

I would propose to unpack both positive and negative integers to int64_t to make naïvely using boost::any_cast as easy as possible.

I would also add a convenience function (say autobahn::cast_number<T>) that will do the kind of casting behavior that one would want.

Can submit PR if you think this is reasonable.

ecorm commented 9 years ago

What should happen for large positive integers that don't fit in int64_t, and can only fit in uint64_t?

Another scenario to consider is when using JSON, a peer wants to send 5.0, but it gets serialized to 5.

taion commented 9 years ago

This isn't ideal, but I can't think of a better solution. JavaScript doesn't distinguish between 5 and 5.0, and the msgpack-python packer is pretty opaque (and does distinguish between ints and floats), so the only remaining solution is to handle the polymorphism in the MsgPack message when unpacking.

i.e. if you want a function in C++ that consumes a double, you need to handle the value being either some sort of integer or a proper double.

taion commented 9 years ago

@ecorm:

I feel like there's not really a good way to smooth over differences in numbers across languages anyway. Numbers in JSON have arbitrary precision, but if you try to send something that would be a BigInt from Python (which transparently represents large integers that way) to JavaScript (which represents all numbers as fp64), you're not going to have a good time.

That said, maybe the current unpacking is fine, then. It looks like non-negative integers are privileged in the WAMP spec. I feel like it'd just be a bit annoying when e.g. you don't know whether you're getting a negative integer or not, and end up having to take a branch to figure out whether to any_cast to int64_t or uint64_t. Or also to get a bad_any_cast if someone accidentally passes a negative integer into a method expecting a non-negative one.

ecorm commented 9 years ago

@taion, your proposal for a convenience function seems like a good idea to me. It would let you interpret an arbitrary number whichever way you want. But it raises further questions:

taion commented 9 years ago

I don't think you need to worry about overflow/loss of precision/&c. when casting to double. I'm not sure what you would even do with that information. In this case I'm really just picturing JavaScript-like semantics where you don't really want to care that much about the details of the numeric type.

Not sure about the int64_t v uint64_t case. What does the Java implementation do? I'm not a huge fan of unsigned integer types in general.

ecorm commented 9 years ago

I don't think you need to worry about overflow/loss of precision/&c. when casting to double.

A double can't store exact integers larger than 2^53

I'm not sure what you would even do with that information.

You could throw an exception that prevents you from working with invalid numbers. If someone accidentally sends you a negative integer, and you silently convert it into an unsigned integer, you'll end up with a huge, invalid number.

taion commented 9 years ago

I know that - what I mean is that if you expect to be working with doubles, you probably don't care about that loss of precision for large integers. In the context where you're doing actual floating point math, you only expect a certain amount of precision anyway and explicitly have chosen not to be worried about loss of precision beyond that (otherwise you wouldn't be doing fp math).

One thing that might be worth keeping in mind is that, since Java doesn't have unsigned native integers, it might be best just to keep the C++ and Java implementations consistent. It would be really annoying to have integers larger than the maximum of int64 work in C++ but not in Java.

ecorm commented 9 years ago

I know that - what I mean is that if you expect to be working with doubles, you probably don't care about that loss of precision for large integers.

Ah yes, of course. That makes sense.

The integers IDs used in the WAMP spec (for request IDs, etc), are between 0 and 2^53, so an int64_t will be able to support them.

I would personally not be opposed to eliminating uin64_t in favor of only using int64_t for integers. If someone really needs huge positive integers larger than 2^63, they could cast to/from an int64_t. Users will be dealing with negative integers far more often than they deal with positive integers larger than 2^63.

oberstet commented 9 years ago

Only because JS sucks (everything double) and Java sucks (only signed ints), does not mean C++ should suck.

In fact, WAMP has a "superset of all", not a "least common denomitator" approach. E.g. we support positional and keyword based return values, which are only in languages like PL/SQL and PL/pgSQL.

oberstet commented 9 years ago

What I wanted to say: we definitely will not impose artificial limits for args/kwargs at the WAMP protocol level.

ecorm commented 9 years ago

I don't know if this will help, but I'd like to point out that RapidJSON, a popular C++ JSON parser/generator, has the same behavior as Msgpack when it comes to parsing negative/positive integers.

taion commented 9 years ago

That's fine. I bring up the "not using unsigned integers" part because there are reasonable style guides out there like Google's that discourage the use of unsigned integers: http://google-styleguide.googlecode.com/svn/trunk/cppguide.html#Integer_Types

I do think the current behavior makes it a little tricky to inter-operate across implementations, though. It's a little weird that I can e.g. transparently use a BigInt from AutobahnPython, but it won't work once it hits an AutobahnCpp or AutobahnJS client. Probably not that big a deal in practice.

taion commented 7 years ago

I've long since switched to cppwamp, and given the amount of movement here since, I suspect this is no longer relevant. Closing for staleness.