long datatype is unusable with JavaScript and potentially other languages

auspicacious commented 8 years ago

Hi,

I started reading the 2.0 specification and immediately encountered a serious cross-language compatibility problem.

https://github.com/OAI/OpenAPI-Specification/blob/master/versions/2.0.md#data-types

The datatype "long" is defined as a JSON Schema integer with a format keyword of int64.

JSON Schema integers are, of course, carried in JSON using the JSON number type. As RFC 7159 points out, although the JSON specification places no bounds on numbers, in practice languages can and do refuse to deserialize numbers outside certain ranges, or silently lose precision on them. The most notable case is JavaScript, where every JSON number is deserialized into the language's sole number type, a 64-bit IEEE 754 floating-point value. That type represents every integer exactly only up to 2^53 in magnitude; beyond that, distinct integers round to the same value.

https://tools.ietf.org/html/rfc7159#section-6
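
A minimal TypeScript sketch of the precision loss described above, using only the built-in JSON.parse and Number APIs. The payloads and the field name `id` are hypothetical; the rounding itself is standard IEEE 754 double behavior.

```typescript
// Number.MAX_SAFE_INTEGER is 2^53 - 1: beyond it, not every integer has
// an exact IEEE 754 double representation.
const maxSafe: number = Number.MAX_SAFE_INTEGER;
console.log(maxSafe); // 9007199254740991

// 2^53 + 1 is a perfectly valid int64 value sent as a JSON number
// (the `id` field is hypothetical).
const parsed = JSON.parse('{"id": 9007199254740993}') as { id: number };

// JSON.parse does not reject it; it silently rounds to the nearest double, 2^53.
console.log(parsed.id); // 9007199254740992
console.log(Number.isSafeInteger(parsed.id)); // false

// Consequence: two distinct int64 IDs collapse into the same JavaScript number.
const first = JSON.parse("9007199254740992") as number;
const second = JSON.parse("9007199254740993") as number;
console.log(first === second); // true
```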

The Google Discovery Document format, which Swagger drew a lot of inspiration from, recognized this problem and defined an int64 format that is backed by a JSON Schema string, not an integer. It seems that Swagger/OpenAPI changed this. If you were to actually use the datatype as defined in OpenAPI, you would be locking out all JavaScript clients from using your API.
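
For comparison, a minimal sketch of the Discovery-style approach: the value is declared as type string with format int64, and the client converts it with the built-in BigInt. The helper name parseInt64String and the payload are hypothetical, and the regular expression deliberately accepts only plain decimal integers.

```typescript
// Discovery-style declaration: the 64-bit value travels as a JSON string.
const idProperty = { type: "string", format: "int64" };

// Exact signed 64-bit bounds, built from strings so no double rounding occurs.
const INT64_MIN = BigInt("-9223372036854775808");
const INT64_MAX = BigInt("9223372036854775807");

// Hypothetical helper: converts a string-encoded int64 to a bigint, rejecting
// anything that is not a plain decimal integer or that falls outside int64.
function parseInt64String(raw: unknown): bigint {
  if (typeof raw !== "string" || !/^-?(0|[1-9][0-9]*)$/.test(raw)) {
    throw new TypeError(`expected a decimal ${idProperty.format} string, got ${String(raw)}`);
  }
  const value = BigInt(raw);
  if (value < INT64_MIN || value > INT64_MAX) {
    throw new RangeError(`value outside ${idProperty.format} range: ${raw}`);
  }
  return value;
}

// A value that a JSON number would have rounded survives intact as a string.
const body = JSON.parse('{"id": "9007199254740993"}') as { id: unknown };
const id = parseInt64String(body.id);
console.log(id.toString()); // 9007199254740993
```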

A real-world example of this is Twitter's APIs, which initially provided tweet IDs as a JSON number, and then discovered that JavaScript clients weren't going to be able to handle that, so they had to add a second, string-based field containing the same value.
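
The Twitter situation can be sketched the same way. Assuming a payload that carries one ID both as a JSON number (id) and as a string (id_str), a JavaScript client has to ignore the numeric field. The interface, the helper exactTweetId and the ID value are illustrative, not taken from Twitter's documentation.

```typescript
// Illustrative payload shape: one ID carried twice, as a number and as a string.
interface TweetPayload {
  id: number; // already rounded by JSON.parse if it exceeds 2^53 - 1
  id_str: string; // exact decimal digits
}

// Hypothetical helper: trust only the string field.
function exactTweetId(tweet: TweetPayload): bigint {
  return BigInt(tweet.id_str);
}

const tweet = JSON.parse(
  '{"id": 1234567890123456789, "id_str": "1234567890123456789"}'
) as TweetPayload;

console.log(Number.isSafeInteger(tweet.id)); // false
console.log(String(tweet.id) === tweet.id_str); // false: the numeric copy was rounded
console.log(exactTweetId(tweet).toString()); // 1234567890123456789
```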

I have two proposals to address this issue.

The first is simpler, but does not address the larger problem of unbounded numbers in JSON: redefine the "long" datatype as a string.

The second is to drop the int32 and int64 formats entirely. Instead, mandate that all instances of integer and number types must provide explicit values for maximum and minimum to be considered valid. Additionally, do not allow maximum and minimum to exceed (2^31)-1 or be smaller than -(2^31). This would ensure that JSON numbers are used in the most cross-language compatible way possible.
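
The second proposal is mechanical enough to check automatically. A minimal sketch, assuming a schema node reduced to type, minimum and maximum; the NumericSchema interface and checkBoundedNumber are hypothetical and not part of any existing OpenAPI tool.

```typescript
// Hypothetical subset of a schema node: only the keywords this check needs.
interface NumericSchema {
  type: "integer" | "number";
  minimum?: number;
  maximum?: number;
}

const INT32_MIN = -2147483648; // -(2^31)
const INT32_MAX = 2147483647; // (2^31) - 1

// Returns a list of problems; an empty list means the schema satisfies the proposed rule.
function checkBoundedNumber(schema: NumericSchema): string[] {
  const problems: string[] = [];
  for (const key of ["minimum", "maximum"] as const) {
    const bound = schema[key];
    if (bound === undefined) {
      problems.push(`missing ${key}`);
    } else if (bound < INT32_MIN || bound > INT32_MAX) {
      problems.push(`${key} ${bound} is outside the signed 32-bit range`);
    }
  }
  if (schema.minimum !== undefined && schema.maximum !== undefined && schema.minimum > schema.maximum) {
    problems.push("minimum is greater than maximum");
  }
  return problems;
}

console.log(checkBoundedNumber({ type: "integer", minimum: 0, maximum: 1000 }).length); // 0
console.log(checkBoundedNumber({ type: "integer" }).join("; ")); // missing minimum; missing maximum
```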

Additional formats or perhaps common schemas could be added to support larger values via the string type.

ePaul commented 8 years ago

I don't think we should restrict APIs' data types to what is easily possible in some programming languages. You can always map your JSON number into a string in your programming language, if necessary (or a big number type, if you have something like that).

If something is semantically a number (even if a large one), it stays a number, thus it should be a number in JSON (and declared as a number in OpenAPI). Numbers in JSON have no upper limit on size or precision.

On the other hand, I think formats like int32 and int64 don't add very much – they are not actually formats like date or date-time, but are an assertion that the numbers (which are still transmitted as numbers in decimal format) will be in a certain range (and thus can be implemented in binary with 32 or 64 bits).
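
Read this way, int32 and int64 are only range checks. A minimal sketch, assuming the validator still has the integer's original decimal text (a plain JSON.parse would already have turned it into a double); satisfiesFormat is a hypothetical helper, and for simplicity it rejects fractions and exponents.

```typescript
type IntFormat = "int32" | "int64";

// Inclusive ranges implied by the two formats, as exact bigint bounds.
const FORMAT_RANGES: Record<IntFormat, [bigint, bigint]> = {
  int32: [BigInt("-2147483648"), BigInt("2147483647")],
  int64: [BigInt("-9223372036854775808"), BigInt("9223372036854775807")],
};

// Hypothetical helper: treats the format purely as a range assertion on the
// decimal text of a JSON integer literal.
function satisfiesFormat(decimal: string, format: IntFormat): boolean {
  if (!/^-?(0|[1-9][0-9]*)$/.test(decimal)) {
    return false; // this sketch accepts only plain integer literals
  }
  const value = BigInt(decimal);
  const [min, max] = FORMAT_RANGES[format];
  return value >= min && value <= max;
}

console.log(satisfiesFormat("2147483648", "int32")); // false: one past (2^31)-1
console.log(satisfiesFormat("2147483648", "int64")); // true
console.log(satisfiesFormat("9223372036854775808", "int64")); // false: one past (2^63)-1
```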

wparad commented 8 years ago

This seems like a problem with JavaScript, and with attempting to represent the format as a usable number. The spec shouldn't bend to languages with fundamental differences when the spec already makes those differences visible. This is a language/API problem, not a documentation problem. If you release an API expecting that all clients can use it, don't use a long; for that matter, also don't use foobar, because that's an integer type in my special language which only allows values 1 through 5.

webron commented 8 years ago

@auspicacious - you're taking it in a slightly wrong direction, IMHO, and I say this as someone who has encountered this issue before with other users.

The key problem here is the API design, not the documentation. Large numeric values would cause issues in several languages, not just JavaScript. In Java, for example, I'd say that type integer with no format defined would translate to BigInteger, not long. The format can serve two purposes: validation, and data type hinting (especially since the number of available types in JSON Schema is limited).
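
The data-type-hinting role can be sketched as a code-generator hook. This assumes a Java target and follows the mapping suggested above (no format maps to BigInteger, not long); the function name javaTypeForInteger and the choice of primitive types are hypothetical.

```typescript
// Hypothetical generator hook: choose a Java type for an OpenAPI
// `type: integer` schema, using `format` only as a size hint.
function javaTypeForInteger(format?: string): string {
  switch (format) {
    case "int32":
      return "int"; // the value promises to fit in 32 bits
    case "int64":
      return "long"; // the value promises to fit in 64 bits
    default:
      // No (or unknown) format: JSON integers are unbounded, so fall back
      // to arbitrary precision instead of guessing a width.
      return "java.math.BigInteger";
  }
}

console.log(javaTypeForInteger("int64")); // long
console.log(javaTypeForInteger()); // java.math.BigInteger
```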

Now, it is true that JavaScript cannot (at least by default) process unbounded numbers. However, by using type: string for unbounded numbers, you are saying that the value itself should be transferred as a string and not a number. This is a reasonable solution to the problem (one we've seen used because of the same limitation), but it is an API design choice and has nothing to do with OpenAPI as a spec. My recommendation in this case would be to use format: number alongside the string type to indicate that it should be a number (or integer, and so on). Most basic validators will not know what to do with it, but they can be extended to 'understand' what it means.
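
A minimal sketch of that recommendation: a numeric value declared as type string with a numeric format, plus a small extension that lets a validator 'understand' the format. The format table and validateString are hypothetical; the number pattern follows the JSON number grammar from RFC 7159, section 6.

```typescript
// Hypothetical extension table for a validator that does not know these formats natively.
const stringNumericFormats: Record<string, RegExp> = {
  // Same grammar as a JSON number literal (RFC 7159, section 6).
  number: /^-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?$/,
  // Integer subset: optional sign, no fraction, no exponent.
  integer: /^-?(0|[1-9][0-9]*)$/,
};

interface StringSchema {
  type: "string";
  format?: string;
}

function validateString(schema: StringSchema, value: string): boolean {
  const pattern = schema.format === undefined ? undefined : stringNumericFormats[schema.format];
  // Absent or unknown formats are not asserted, as basic validators typically do.
  return pattern === undefined ? true : pattern.test(value);
}

const amount: StringSchema = { type: "string", format: "integer" };
console.log(validateString(amount, "123456789012345678901234567890")); // true
console.log(validateString(amount, "12.5")); // false
```

Because the digits never pass through a JavaScript number, the value can be arbitrarily large without losing precision.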

auspicacious commented 8 years ago

I'm afraid that I disagree with that.

I'm in a similar position to you in that I have spent much of the past three years trying to educate people about the difficulties in authoring HTTP/JSON/REST APIs that will consistently work across platforms. Another important focus has been on making it easier to determine backwards-compatibility in APIs; you'll see how those two intertwine in a moment.

I take it as an assumption that since you are developing a specification for HTTP/JSON APIs you are interested in building APIs that can be consumed by the widest variety of programming languages possible; I know that was the driver behind my company's switch from SOAP and binary formats. And it goes without saying that you want to help API designers do the right thing.

Further, if you've encountered this problem before, you know that most people don't understand that it exists. Even the fact that RFC 7159 explicitly encourages developers never to use values greater than 2^53 isn't widely known. I'd say it's even worse: most developers I know don't think about cross-platform compatibility at all; they have never had to before. I've had people tell me that, because jsonschema2pojo generates a 32-bit integer in Java when you pass "type": "integer", the JSON Schema standard must mean that "integer" is a 32-bit integer. I've had people tell me that JSON numbers have an implicit bound because "JS" stands for "JavaScript," so all JavaScript rules apply. These are people responsible for designing APIs, which I then have to tell them to rewrite, when I'm able to catch them.

So, if you have a new developer, and they read an OpenAPI specification that encourages them to design an API that will break in one of the most commonly used Web languages, who is responsible? I believe that you are.

You also commented that Java should generate BigInteger when it sees "type": "integer". You're right, it should. That would be defensive. But the only open-source code generator for Java, jsonschema2pojo, doesn't -- it generates int, or maybe long if you read the options! And this, of course, is a casualty of the fact that JSON Schema was so poorly specified, because no-one took responsibility for these issues. A naive person came along, tried to be helpful, and has helped encourage many people to shoot themselves in the foot.

Unless, of course, you are not actually interested in people consuming APIs specified by OpenAPI. I'm sure that people don't enjoy being told that they shouldn't have actually used the tools that are provided to them by the specification; it does not engender confidence, and you need to realize that 99% of your users do not come to OpenAPI with the knowledge they need to make this "API design choice" on their own.

Moreover, it does no harm to use string as a transport rather than integer; both will get the job done. Who would choose less cross-language compatibility for an HTTP API? Again, that's one of the primary reasons people choose HTTP and JSON to begin with.

I said that backwards compatibility plays into this as well, and here's how. I think that your proposal to use format to handle these situations doesn't go far enough.

OpenAPI APIs contain a version number, so I assume that there is some level of concern about backwards-compatibility. For example, let's say that someone defines an API with an ID field. This ID field is initially represented by a 32-bit integer. They never really mention this to their clients, and many of their clients, without better guidance available, create database columns to store that ID that can contain a 32-bit integer.

The API designers realize they're running out of IDs and move to a 64-bit integer, causing massive errors and downtime in their clients, who have to scramble to redefine databases that are now quite large. I could present similar scenarios for string, but I'll stick to integer here.

But this is easy to avoid, and avoiding it also solves the primary problem we're discussing. I mentioned the solution in my initial post.

If OpenAPI required items of type integer to have a maximum and a minimum, it would become far easier to detect changes that are not backwards-compatible.
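
With explicit bounds, the compatibility check becomes simple arithmetic. A minimal sketch for the ID scenario above, assuming the field appears only in responses; isCompatibleResponseChange and the bounds are hypothetical. For a field that clients send, the check would run the other way: the new range would have to contain the old one.

```typescript
interface IntegerBounds {
  minimum: number;
  maximum: number;
}

// Hypothetical check for a response field: clients sized their storage for the
// old range, so every value the new version may emit must already have been allowed.
function isCompatibleResponseChange(before: IntegerBounds, after: IntegerBounds): boolean {
  return after.minimum >= before.minimum && after.maximum <= before.maximum;
}

// v1 promised IDs that fit a signed 32-bit column.
const v1: IntegerBounds = { minimum: 1, maximum: 2147483647 };
// v2 widens the range past 32 bits (kept at 2^53 - 1 so the bound is exact in JavaScript).
const v2: IntegerBounds = { minimum: 1, maximum: Number.MAX_SAFE_INTEGER };

console.log(isCompatibleResponseChange(v1, v2)); // false: breaking for v1 clients
console.log(isCompatibleResponseChange(v2, v1)); // true: narrowing a response range is safe
```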

Moreover, it becomes possible for OpenAPI to specify absolute maximums and minimums, for example, those that correspond to a signed 32-bit integer. It allows OpenAPI to explain to its users why they should do this, in order to protect themselves. It allows tool implementers to make the right choices for their languages. In the absence of positive information about these restrictions, people will make the wrong choices. Boundaries must be explicit, or people will fail to consider them.

It is not possible to make design choices without being informed. As it stands, the OpenAPI specification provides the tools for people to make the wrong design choices, but doesn't even provide a hint that they might be wrong. This is setting people up for failure. That's not responsible.

webron commented 8 years ago

Tackling PR: #741

evanmcclure commented 6 years ago

I noticed you're arguing about sticking to spec. Have you ever heard of "Postel's Law" or the "Robustness Principle" (https://en.wikipedia.org/wiki/Robustness_principle)?

The big idea is this: the code that accepts JSON should be tolerant in what it accepts, to ensure interoperability across all systems. The server can be as pedantic as it wants to be when it produces JSON.

auspicacious commented 6 years ago

I recommend that you consider the draft RFC entitled "The Harmful Consequences of the Robustness Principle."

What would happen if we applied your interpretation of Postel's Law here? If we are to be "pedantic" in what we produce, then our server would have to emit numeric values that conform to the OpenAPI specification -- in other words, numeric values that could not be parsed by a naive JavaScript client. I don't really see how that solves the problem.

handrews commented 7 months ago

In the years since this was last active, we've added a formats registry that defines many formats for more specific, controllable numeric representations. I'm marking this as resolved, although if there's a use case not covered by existing formats, feel free to propose a new one.

OAI / OpenAPI-Specification

long datatype is unusable with JavaScript and potentially other languages #704