arenanet / api-cdi

Collaborative Development Initiative for Public APIs
252 stars 41 forks source link

Make /v2/achievements/groups case-insensitive #657

Open sliekens opened 5 years ago

sliekens commented 5 years ago

If it's not too much of a hassle, please make these behave the same way:

Thanks! :)

Flomix commented 5 years ago

mh I don't kow if I would make such IDs case insensitive by default. Right now the IDs are UUIDs, but I guess one shouldn't imply any semantics in such an ID. When you treat them as they are, as case sensitive strings, nothing would break when the ID semantics change, like to base64 encoded data (which is pretty case sensitive)

sliekens commented 5 years ago

IMO IDs shouldn't have any semantics, they should be opaque values.

Flomix commented 5 years ago

@StevenLiekens Well you just said yourself, that IDs should not be case insensitive ;-)

A4ED8379-5B6B-4ECC-B6E1-70C350C902D2 and a4ed8379-5b6b-4ecc-b6e1-70c350c902d2 are two completely different values, unless you apply specific semantics there (in your case you obviously expect it to be hexadecimal values)

sliekens commented 5 years ago

They're the same textual values. Treating the lowercase version as a completely different value is very impractical for at least two reasons:

  1. some database systems are case insensitive
  2. some website platforms use URL rewriting to convert hyperlinks to all lowercase
Flomix commented 5 years ago

So to you [&AgH2LQEA] and [&AgH2LqeA] are the same items too ๐Ÿ™‚

sliekens commented 5 years ago

No, those are chat links that have documented semantics.

Flomix commented 5 years ago

I don't understand why you don't see your own contradiction. First you request that the API devs apply some quite restricting semantics to the ID in question. And case insensitivity is very restrictive and limiting, much more than things like "please ignore trailing spaces" or so. And in the next post you argue against yourself in that one should NOT expect specific semantics. So you basically represent two completely opposite opinions on this at the same time.

sliekens commented 5 years ago

Case sensitivity is when you assign different meanings (semantics) to the upper and lower-case versions of each letter.

dlamkins commented 4 years ago

I ran into this issue today. Note that .NET and many other platforms generate GUID string output in all lowercase.

Per ITU-T Rec. X.667:

Software generating the hexadecimal representation of a UUID shall not use upper case letters. NOTE โ€“ It is recommended that the hexadecimal representation used in all human-readable formats be restricted to lower-case letters. Software processing this representation is, however, required to accept both upper and lower case letters as specified in 6.5.2.

Also IETF RFC 4122:

The internal representation of a UUID is a specific sequence of bits in memory, as described in Section 4. To accurately represent a UUID as a URN, it is necessary to convert the bit sequence to a string representation.

Each field is treated as an integer and has its value printed as a zero-filled hexadecimal digit string with the most significant digit first. The hexadecimal values "a" through "f" are output as lower case characters and are case insensitive on input.

Additionally, the guilds endpoint accepts UUIDs without case restriction. I don't see any reasonable explanation as to why we wouldn't want to match the spec and improve consistency between endpoints.

Flomix commented 4 years ago

Why do you want to use additional semantics for that ID in the first place? I totally see your point, it is handy to use UUID instead of string internally, for being more space efficient etc. But as you just ran into with the bug in Gw2Sharp it is also risky in a less typized environment like JSON+HTTP where you don't control all factors like for example the used database, so I also see a point in not adding semantics to ID fields, because semantics add (maybe unwanted) restrictions and make the api more fragile.

the /v2/characters/:id is also case sensitive, that would also not be necessary since only one case exists. But nobody complains about that.

With exactly the same reasons you stated above one could also ask for the auth token to be case insensitive. Imagine some tool storing the token as 2 GUID for convenience, just like the guild and achievement group IDs; because hey, it is smart! more than 50% space saved, and it works. ... Needless to say that that app would be completely broken with the addition of subtokens, whereas every app treating the token as semantic-less string would continue to work perfectly.

Another example: the API versioning is an ISO-formatted timestamp (normal ISO 8601 if I'm not mistaken?), but is treated as a simple string. I bet a lot of tools abuse that and just use "?v=latest" in the requests; imagine the API would follow the suggestions of this right issue here and actually treat the version as an ISO timestamp, all those tools would be broken, because "latest" is hardly parseable anymore.

Also, it would not just be to make the param case sensitive, but to change it to the data type UUID. And that would also change the behavior regarding possible values. Right now "https://api.guildwars2.com/v2/achievements/groups?id=FooBar" returns a 404 with the message "ID not found", JSON frameworks that support data types usually throw a 400, because the request is malformed and the argument not parseable.

sliekens commented 4 years ago

@Flomix I still have no clue why you're opposing this but please stop.

These are two representations of the same thing. There are no different semantics. Changing the casing does not change the meaning.

Flomix commented 4 years ago

The simple reason is that my opinion regarding IDs is: IDs shouldn't have any semantics, they should be opaque values. The ramification is, when the server tells me, an Object has ID xYz I use the value as it is, and don't manipulate it in any way, be it changing its case or adding a smiley to it.

@StevenLiekens could you please elaborate my auth token example then. If Authorization: Bearer 25ed36c7-2032-4569-9d05-b59d615e355a25ed36c7-2032-4569-9d05-b59d615e355a and Authorization: Bearer 25ED36C7-2032-4569-9D05-B59D615E355A25ED36C7-2032-4569-9D05-B59D615E355A also have the same meaning then, and why.

And second, if you prefer https://api.guildwars2.com/v2/account?v=current to return a 4xx error instead of returning the newest schema version.

Thank you.

dlamkins commented 4 years ago

I feel like this truly boils down to is the ID intended as a GUID/UUID (per IETF IFC 4122) or not. The IDs are provided using the common and recognizable 8-4-4-4-12 GUID pattern and are valid hexadecimal, so I have no reason to believe that they aren't intended to be GUIDs. As such, I'll be assuming this when responding below:

I totally see your point, it is handy to use UUID instead of string internally, for being more space efficient etc. But as you just ran into with the bug in Gw2Sharp [...]

To clarify I don't believe this qualifies as a bug with Gw2Sharp, but rather with the API (see #657 ๐Ÿ˜›). Gw2Sharp was generating valid GUIDs for the request, but the API was not accepting them (Note: Gw2Sharp has since implemented a workaround that "uppercases" all GUIDs before the request is made).

[...] it is also risky in a less typized environment like JSON+HTTP where you don't control all factors like for example the used database, so I also see a point in not adding semantics to ID fields, because semantics add (maybe unwanted) restrictions and make the api more fragile.

It is necessary to pass a GUID as a string because passing the underlying 128-bit data structure would be unwieldy. Because a GUID is representative of an underlying value, it can't be opaque. To help make this less fragile, we have RFC 4122 which defines everything needed.

Your point about not controlling all factors on the platform is the very reason why we have these kinds of standards. Many languages/platforms/frameworks have built-in datatypes that represent GUIDs. For example, many databases have UniqueIdentifier, Guid is available in .NET (and C, C++, and many more), and UUID in Java (and Python, etc.).

the /v2/characters/:id is also case sensitive, that would also not be necessary since only one case exists. But nobody complains about that.

Please consider we're only asking for this to be on endpoints that utilize GUIDs as the ID. We're not asking that RFC4122 to be applied to non-GUID IDs - based on what else you've said, I've started to get the impression that this is what you think we're asking for and it isn't. Character IDs are the names of the character. There is no RFC indicating how character names should be stored, passed, etc., so of course we'd pass them as we receive them (๐Ÿ˜›). Additionally, the character names act as an opaque ID whereas GUIDs as we see them are representative of an underlying 128-bit value.

With exactly the same reasons you stated above one could also ask for the auth token to be case insensitive. Imagine some tool storing the token as 2 GUID for convenience, just like the guild and achievement group IDs; because hey, it is smart! more than 50% space saved, and it works. ... Needless to say that that app would be completely broken with the addition of subtokens, whereas every app treating the token as semantic-less string would continue to work perfectly.

First, API keys actually appear to be case-insensitive, from a test I just did. That aside, this isn't about saving space. This is about following standards to reduce inconsistencies between environments. If something is a GUID, then I should be able to treat it as such. If I generated a GUID (using the UUID library which is designed to meet RFC 4122) in Python, it also outputs a lowercase string from a GUID provided in uppercase - this isn't unique to Gw2Sharp or .NET. This behavior is standardized by the RFC: Accept either case (because it's hexadecimal and for robustness) and output in lowercase.

Another example: the API versioning is an ISO-formatted timestamp (normal ISO 8601 if I'm not mistaken?), but is treated as a simple string. I bet a lot of tools abuse that and just use "?v=latest" in the requests; imagine the API would follow the suggestions of this right issue here and actually treat the version as an ISO timestamp, all those tools would be broken, because "latest" is hardly parseable anymore.

I believe this example argues the opposite of what I think you're intending to argue. The API never provides us with "latest" as a possible v value in https://api.guildwars2.com/v2.json?v=2019-10-22T15:21:35 . The v attribute still accepts all compliant ISO-8601 date/timestamps (try it: I tested a couple of different ISO-8601 compliant values and it took them without issue such as 16:57:34Z indicating that it is parsing the value on the backend). Any value that fails to parse will return the latest (e.g. v=oldest is no different than passing v=latest). This is not the same issue we're having. Our GUIDs are very much valid per the definition of a GUID, but the API does not accept them.

EDIT: Darthmaim has filled me in on the details here. It's actually just string sorted, which I did not realize. Also means it does not properly handle all 8601 formats correctly since 2019-05-21T23 sorts before 2019-05-21T23:00:00.000Z. It ends up being almost the same exact issue we're having then with GUID where the API uses strings over the datatype and ends up missing alternative cases.

Also, it would not just be to make the param case sensitive, but to change it to the data type UUID. And that would also change the behavior regarding possible values. Right now "https://api.guildwars2.com/v2/achievements/groups?id=FooBar" returns a 404 with the message "ID not found", JSON frameworks that support data types usually throw a 400, because the request is malformed and the argument not parseable.

And I think this clues in on why this endpoint does not behave as it is expected to.

To add to the inconsistency, /v2/guild/:id:, despite accepting mixed-case GUIDs, also replies with a 404 when a malformed GUID is provided. My guess is that the API is getting GUIDs, converting them to a string, and then comparing them to the ID provided in the request after lowercasing (or uppercasing) both.

If that is the case, I'd argue that is the root-cause of this issue. Realistically, the provided ID would be parsed into a Guid type and compared to the values stored already as a Guid so that the comparison is done utilizing the proper types.

Regarding the HTTP response, I'd hope:

The simple reason is that my opinion regarding IDs is: IDs shouldn't have any semantics, they should be opaque values.

Opinions tend to lead to multiple different implementations because by nature, everyone has a different opinion. RFCs culminate the best opinionsโ„ข into a standard so that there aren't deviations like this. If you're worried about things being risky: don't deviate from the standard if one exists. Still assuming that these should be GUIDs, then we know that they aren't opaque and that they are representative of an ID and not the ID itself. Again, the GUID in the format we're working with is just a hex representation of the ID and hexadecimal is traditionally case-insensitive.

The ramification is, when the server tells me, an Object has ID xYz I use the value as it is, and don't manipulate it in any way, be it changing its case or adding a smiley to it.

This is a non-breaking change. We're not asking that the API only accept lowercase GUIDs. We're asking that it accept all properly formatted GUIDs. Such a change would have no consequence on what you're wanting to do here. If you provide the API an ID of any type that was provided previously by the API, it should still work just fine.

could you please elaborate my auth token example then. If Authorization: Bearer 25ed36c7-2032-4569-9d05-b59d615e355a25ed36c7-2032-4569-9d05-b59d615e355a and Authorization: Bearer 25ED36C7-2032-4569-9D05-B59D615E355A25ED36C7-2032-4569-9D05-B59D615E355A also have the same meaning then, and why.

This appears to just be another example of robustness where it isn't an issue to have it. Note that this would not work with subtokens, though, as they are Base64 encoded representations of much more than just an identifier.

That's beyond what we're asking, though. The primary issue isn't consistency within the API, it's consistency with the rest of the world. At a minimum, it should accept a valid GUID, but at this point it currently only accepts a non-recognized subset of the specification.

And second, if you prefer https://api.guildwars2.com/v2/account?v=current to return a 4xx error instead of returning the newest schema version.

I kind of mentioned this earlier, but I believe the way the v attribute works is that it tests to see if the value is a valid ISO-8601 representation of a timestamp. If it is, that is used, if it is not, then the latest schema version is used. See above edit. I was wrong about how this worked.

Many of your counterexamples are just manifestations of the robustness principle in action where input is liberal in what it accepts (where it can be without ambiguity). We're not asking for a liberal parsing of the GUIDs we're passing, we're asking for just proper parsing per RFC 4122 for this endpoint.

Ultimately, we want GUIDs to be accepted as GUIDs. If there are other endpoints that fail to comply with standards or introduce inconsistencies then by all means, I'd hope that they get resolved too. At this time, though, this particular endpoint is inconsistent with other endpoints that accept GUIDs and further doesn't follow the RFC when it comes to accepting GUIDs as a parameter.

Archomeda commented 4 years ago

Fun fact: API tokens are actually case insensitive (except the JWT of course).

Edit: @dlamkins already mentioned it above, I skimmed over it ๐Ÿ˜…

sliekens commented 4 years ago

I think I can best summarize my point like this: changing the casing of text does not change its identity. abc and ABC are both the same sequence, just different representations.