esi / esi-issues

Issue tracking and feature requests for ESI
https://esi.evetech.net/
208 stars 23 forks source link

POST on /fleets/{fleet_id}/members/ responds with 420 #314

Open SpeedProg opened 7 years ago

SpeedProg commented 7 years ago

POST on /fleets/{fleet_id}/members/ responds with 420 (undocumented) and no message when the character you are trying to invite is offline

I like having a different response for offline people but could this be documented or it should probably just be a different error message to the 422?

SpeedProg commented 7 years ago

apparently this also happens if a squad you are inviting to as squad_member is full

aquarhead commented 7 years ago

@a-tal 420 indicates a EVE monolith error.... should we just add this error code to every endpoint?

SpeedProg commented 7 years ago

I'd prefer to get a useful error message instead, like with old CREST fleet endpoint I could determine between an invite failing because squad is full (or is filled with invites, which you can't by just reading all members) or it failing because a player is offline. :/

DaneelTrevize commented 7 years ago

HTTP code 4xx is for Client errors, 5xx is for server errors. If the ESI front proxy/monolith gets an error from an internal endpoint implementation (through no fault of the true external ESI client caller), it should understand that while it is a client of those, it is a server to the outside world, and rather than proxy the 4xx it should probably translate it to a 5xx.

aquarhead commented 7 years ago

Well it's not always the case that something is wrong on the monolith side, it can also be you're giving invalid params in the request and we haven't dealt with that specific error case (yet). I remember we choose 420 because somewhere (unofficially, ofc) states this code basically says "Something is wrong but I'm not sure what exactly happened"

@gitAskur and/or @HakShak might remember where that come from?

DaneelTrevize commented 7 years ago

From https://en.wikipedia.org/wiki/List_of_HTTP_status_codes#Unofficial_codes

420 Method Failure (Spring Framework) A deprecated response used by the Spring Framework when a method has failed.[68] 420 Enhance Your Calm (Twitter) Returned by version 1 of the Twitter Search and Trends API when the client is being rate limited; versions 1.1 and later use the 429 Too Many Requests response code instead.[69]

500 Still seems more correct as an implementation abstraction if you aren't certain it's a client error, else 400.

gitAskur commented 7 years ago

These 420 errors are actually just the monolith action wrappers picking up a error being scattered back to the client making the call. We call them UserErrors and they are not errors or exceptions, just messaging about bad usage back to the client. But there´s no client making the call so without the wrapper picking it up things would just get lost, no client session bound to the call which is the nature of these external actions.

As such I don't really care if they get classified as 500s, 400s or a teapot. All I need to ensure is that ESI gets notified about having supplied parameters that would have caused an error message showing up in the client making the same request. I don't think any error code is going to fit what this actually is, but to be honest I can't claim full knowledge of all the error codes.

So.. whaaatever you want it to be guys. It's a matter of changing a constant in a base class.

gitAskur commented 7 years ago

But we totally know what went wrong, it´s a part of the message and ESI does get that message sent as a response to the action, marked as a 420 "error" so that it can process it differently from actual errors.

DaneelTrevize commented 7 years ago

it's not always the case that something is wrong on the monolith side, it can also be you're giving invalid params in the request and we haven't dealt with that specific error case (yet)

Sounds like this generic error should be a 500 or 502 then (or some unofficial, unreserved 5xx value), and when a more specific error is implemented for that endpoint, that captures that it's client-caused rather than server-side, and can be a 4xx that's passed down the line?

Also teapot would fall under 4xx, see 418 I'm a teapot (RFC 2324) ;-)

SpeedProg commented 7 years ago

Is there any chance the ESI code can actually see from the response it gets back from monolith what the reason it failed for is? Like I remember that CREST gave me back different error texts depending on if the invited player was offline or the squad was full. It would be really awesome to have that again, because the FC that executes the invite would maybe want to remove some one that is offline from the waitlist, while when the squad is full (which can happen w/o having actually 10 players in the squad by having slots filled with outstanding invites, so I can't check that with a members call) the waitlist actually wants to try inviting into an other squad instead. And yes I can't just send the invite for any squad because we want specific roles/ships/fits in specific squads.

gitAskur commented 7 years ago

ESI can actually see from the response it gets back from the monolith what the reason for the failure is.

I never really intended the 420 status codes to leave ESI to a client. I chose a non-code because I wanted the proper codes free for handling within monolith actions, i.e. if you supply an invalid acting characterID the action might have to handle that as it cannot even resolve that to a character node so the action itself has to raise the error, and at that point it makes sense to raise a 404 Not Found back to ESI.

But in the case where you are supplying a character ID as an "acted upon" ID, like you're sending a message or something to him, but that character has done something like block you, which we won't check for until very late in the action chain quite deep into the monolith, then the monolith will signal back to the calling client that the ID provided is wrong.

If you were doing this from a connected client, you'd get a little black notify box telling you something like: "Cannot send message! Scotty the Docking Manager has blacklisted you!" and since ESI is not a connected client this message will just be dropped if we do not handle them in some way.

But obviously you, the client, needs to get an error of some sort stating: Generic Error Received:

And in many cases we will have to actually handle some of these user errors before stepping too deep into the monolith because of incidental practicalities. But the UserError wrapper ensures that no messages get dropped inside of the monolith, as they did before I wrapped them into this 420 nonsense, just because the monolith is coded to only expect only eve client<=>server communications.

Now, what this specifics of should be, I don't know. I'll use the opportunity of next week being Fan Fest week and all of Team Tech Co being at the same place to discuss this with @aquarhead, @tsuthers-ccpgames and @a-tal. If you guys show up at Fan Fest then it would be pretty useful to get the end user perspective on this as well as I'll probably always be stuck on the backend monolith-specific perspective where sending out events coded as 420 makes perfect sense to quickly solve a problem, and make sure ESI doesn't have to differentiate between a Status Code 500 message being an actual 500 error, or a wrapped UserError response.

But to answer the question, yes you can totally get a better message. It might end up with you having to throw another request back to us to properly parse it from a message label + args to an actual message; but that]s all just implementation talk.

HAPPY FRIDAY!

a-tal commented 7 years ago

@aquarhead I'd rather not throw a 420 error in the spec of everything, but endpoints that can result in a UserError should declare that (many endpoints have no path to a UserError) -- and will convert to 500 errors if they don't.

SpeedProg commented 7 years ago

As a little update from the user point of view. After i figured out that the 420 responses actually contained further information (in some as i assume python related format?), and that this information is actually pretty detailed in what is going wrong. I am actually pretty happy with getting those.

But it would be nice to have it documented some where, in some way that you could get those error back and that the body of the request contains information related to the error.

The swagger client I was using just threw the body away since it couldn't find the error in the scheme and thereby didn't know how to parse it :). So for quite a bit I actually didn't know there is anything else being send till I started doing some requests by hand and saw that the body contained some info.

Would be nice to know what format the information send there by monolith are actually in, since they don't seem to be json at least :)

gitAskur commented 7 years ago

We could also just give you back the actual message.

Like, rather than receiving "MktRegionNotOpen", {"regionID:" 10000002} (or whatever format you get from ESI) you could just get the message "The Forge Market is not open yet!". Just a matter of us parsing it on our end before sending and now that I say this I think me and @a-tal talked about doing just that.

We can also expose a point to parse our localized strings to make it optional.

OH THE POSSIBILITIES!

DaneelTrevize commented 7 years ago

Rather than... data just get... text

Sounds terrible. This is a JSON API, not to be spitting out directly human-readable-only strings/HTML. Stick to giving the IDs and hrefs that the clients can then choose to use as they see fit. They can easily build a message about the region if given the data, they can't easily reverse a localized string, and having to depend upon another endpoint to reverse those into data sounds terrible for anything trying to work beyond human processing rates.

As for market specific endpoints & any "market not yet open" state error, either that needs to be queryable by clients and then a 4xx could be considered valid for them not having checked the other endpoint first, or that state isn't exposed and so that condition should result in a 5xx response so the client's aware they did nothing wrong/could do no better.

SpeedProg commented 7 years ago

Sounds terrible. This is a JSON API, not to be spitting out directly human-readable-only strings/HTML.

Pretty sure they mean, that they gonna return a body like {"error": "What ever the error string for the eve client is."}

They can easily build a message about the region if given the data, they can't easily reverse a localized string, and having to depend upon another endpoint to reverse those into data sounds terrible for anything trying to work beyond human processing rates.

Shouldn't you know what you requested in the first place?

But yeah, additional endpoint that you can pass the stuff to, to parse it to a text is a nice option too.

DaneelTrevize commented 7 years ago

Read Askur's full message again, it's unambiguous that instead of data it would return pretty text.
I'd be fine with returning both, or an endpoint/SDE to turn standard errors into standard eve client/ccp messages.

Yes you know what you requested, this whole issue is about knowing what about that is ESI unexpectedly having an issue with, in a technically correct & scalable manner.

gitAskur commented 7 years ago

@DaneelTrevize I meant "just" as in "simply", not as in "exactly/exclusively". But it would be a bit silly for all of us if I were to just change the basic error response format in a backwards incompatible fashion. That being said I'm much more leaning towards exposing an endpoint for people to translate the parts to a message if they want. Doing it on every error sent is wasteful unless someone is reading over all of them and nobody is.

@a-tal, @aquarhead - It can be somewhat difficult to assert that a call will not ever result in a user error. For example, any call that ever resolves a character_id to a character data struct can potentially end up as a CharNotHere error especially with the user supplied nature of ESI. Knowing whether or not a character_id is ever resolved to that structure, or just used as is, requires tracing every possible codepath and if you try to trace every single codepath in the monolith - you're going to have a bad time.

Not to mention all those other pesky developsies in this office and their penchant for adding code which often includes new UserErrors thrown in common program flow places. Pesky other developsies!

At any rate - did we want to add anything to the error response? Like I said above, I'm inclined to have users request the message with the params given to them if they are interested.

DaneelTrevize commented 6 years ago

Gunna tack this recent event on here, as there seems to be a potential conflict for the 420 error code:

Early this morning we logged 2 errors when calling GET /v1/fleets/{fleet_id}/ At 2018-04-09 00:40:21
[body] => {"error":"{'error_label': 'UnMachoDestination', 'error_dict': {'what': 'The specified proxy or server node (1423937) could not be reached'}}"} [response_code] => 420 At 2018-04-09 05:25:45
[body] => {"error":"{'error_label': 'UnMachoDestination', 'error_dict': {'what': 'The specified proxy or server node (1423886) could not be reached'}}"} [response_code] => 420
Headers logged but ommited for brevity.

However, when one POSTs to /v1/fleets/{fleet_id}/members/ involving a character_id for an offline character, you get an undocumented endpoint-specific 420 response:
"{'error_label': 'FleetCandidateOffline', 'error_dict': {'invitee': (2, 1637604641)}}"

So wouldn't it be possible that the 420 could come from .../members/ due to 2 wildly different issues?
And shouldn't these system-wide 420s be reclassed as 5xxs, as they aren't the client's fault but internal to ESI?
If not, can the FleetCandidateOffline response at least get a distinguishing response code?

DaneelTrevize commented 6 years ago

Same again early this morning, at 2018-04-11 00:18:47 [body] => {"error":"{'error_label': 'UnMachoDestination', 'error_dict': {'what': 'The specified proxy or server node (1423614) could not be reached'}}"} [response_code] => 420 No mention of A/B testing in header, just Via: 1.1 google which seems somewhat common

Edit: additionally At 2018-04-12 01:22:35, server node (1424552). At 2018-04-13 02:44:06, server node (1424826). All for the same characterID as fleet boss, if it helps. 1215247180 aka "RT Cancer".

DaneelTrevize commented 6 years ago

Following the ESI 0.8 changes, as per #805, will this endpoint also be getting a 520 documented response? ATM it seems to lack 420 or 520.

DaneelTrevize commented 6 years ago

Ok, now the endpoint-specific [body] => {"error":"{'error_label': 'FleetCandidateOffline', 'error_dict': {'invitee': (2, 93652455)}}"} [response_code] => 520 seems to have changed to again clash with the new 520 generic monolith error code.
Still undocumented, even though it's endpoint-specific ("FleetCandidateOffline").

SpeedProg commented 6 years ago

That is because it is a monolith error 🤕

jowrjowr commented 6 years ago

You got PRECISELY what you asked for and you are still fucking mad about it. Stop shitting up github.

DaneelTrevize commented 6 years ago

@SpeedProg , you yourself raised this issue because the endpoint-specific error cannot be disambiguated from the generic monolith error by status code. And is undocumented in both cases.

@jowrjowr no one's mad. Except maybe you doing precisely what you claim others to be doing.

SpeedProg commented 6 years ago

@DaneelTrevize Good that you know better then me why I raised this issue. But if you read my initial request I actually didn't know at that point that those are monolith errors. And I thought it was a response produced that was just forgotten to be documented which by ccps response it was not. Also I actually wanted the response to be merged with 422 | Errors in invitation

response.

But since then it was clarified that those 420 (now 520) responses are special responses that have nothing to do with the 422.

And actually having the 520 on monolith responses is exactly what I want.

And yes it can be disambiguated, by reading its body? (Well kind of not since it is a "generic" monolith error) But you can figure out which one of the fleet responses it is, which I am actually doing in my code.