OAI / OpenAPI-Specification

The OpenAPI Specification Repository
https://openapis.org
Apache License 2.0
29.09k stars 9.08k forks source link

How to disallow unknown query parameter? #2511

Open smikulcik opened 3 years ago

smikulcik commented 3 years ago

The operation object allows folks to specify parameters, but how can I express that no unnamed parameters may be accepted?

We've had issues where clients are passing in query parameters thinking that they're accepted when actually they are not.

How do folks feel about introducing something like additionalQueryParameters for parameters?

/foo:
  get:
    parameters:
      limit:
        in: query
      offset:
        in: query
    additionalQueryParameters: false
    ...

Then when folks do GET /foo?start=xxx-xxx-xxx they'll get an error instead of a silent failure.

Other links on the web:

savage-alex commented 3 years ago

This is an excellent question! Inside a schema the additionalProperties defaults to true as it (loosely in 3.0) based on JSON schema which will unless specified accept additional properties.

Re query params it is an interesting question and seems to vary per implementation. As these are explicitly defined and documented a client should be following the API definition to ensure the type, range, format etc of their request matches what the OAS definition aka "contract" prescribes. You would be well within your rights to return a 400 detailing the unsupported parameter in the same way you would report an out of range.

Where it gets a bit nuanced is interpretations of postels law on being conservative what you send and liberal in what you accept. Its not unusual to see APIs that would accept but disregard anything that is not in the definition silently and this is also acceptable.

Is it something you could cover in the descriptions and overall behaviour of your APIs stating your position and in your level of error verboseness?

hkosova commented 3 years ago

Possibly related: #568

g-t-a-ogle commented 3 years ago

This would be a very useful feature to have, as an added help to security. It also helps with the conditions mentioned above, especially where you would consider sending the wrong query variables concerning and you have a large number of paths.

MikeRalphson commented 3 years ago

If we moved to representing all in:query parameters as an object described by a single JSON schema, then we would get this functionality 'for free' with the existing additionalProperties keyword.

nfroidure commented 3 years ago

In my APIs I always default to disallow additional query parameters but I think it would be nice to have this option to explicitly say we do it.

I also made a proposal for the order of the query parameters here: https://github.com/OAI/OpenAPI-Specification/issues/883

aagranovExtend commented 2 years ago

This would be very useful. I was looking for this just now.

smikulcik commented 2 years ago

So where does this put us today.

@savage-alex you make a good point about what we actually should and shouldn't document in our specifications.

Perhaps we should ask, "when would it be desirable to permit undocumented query parameters?" If we have no good use cases for it, maybe we should by default specify that implementations should reject unknown query parameters in general.

Virtual-felix commented 2 years ago

Did someone found a workaround in the meantime that the feature might be integrated ? Or there is no way to catch unknown query parameter as an error currently ?

mikekistler commented 7 months ago

Reposting my comments from #3738 here for visibility:

I am wondering how this feature will be used in practice.

I don't think users of the API get any benefit from it. It does not "describe" any useful behavior of the API, unlike additionalProperties which can include a description that explains the meaning of additional properties. It specifies when the operation is required to fail but not when it actually will fail, since undefined query parameters can still cause failures when undefinedQueryParameters is true, the default. So users must be prepared for failures either way.

I suppose this clarifies to implementers how they should treat undefined query parameters, but this assumes that the API description is written by someone other than the team that implements the API, which in my experience is rather rare.

I do understand that, from an API governance perspective, rejecting undefined query parameters is a best practice and failures to do so should be rooted out and corrected. But I don't think adding undefinedQueryParameters helps much on this front -- and could actually hinder it, since it gives implementers an excuse for not following this best practice -- "Well, we didn't document that undefined query parameters must fail, so we don't need to fix our service".

In short, I don't see much practical value in this new keyword and would not recommend it's use in any of the APIs I work with, but if others feel strongly that it is needed I wouldn't oppose it.

handrews commented 7 months ago

I'll note that I've talked to several folks who specifically recommend allowing undefined query parameters due to 3rd-party entities appending them for various purposes (e.g. tracking). There was a brief discussion about it on the APIs You Won't Hate slack recently.

So that's a use case for allowing, and multiple people have spoken up here over the past two years saying that they would use the disallow.

I'm generally very skeptical of one person saying "I wouldn't use this" and making that an argument for not adding something. Particularly when multiple people over time have said they would use it. Why should someone's lack of interest in a feature invalidate the use cases of others? No one is forced ot use the keyword.

karenetheridge commented 7 months ago

If we moved to representing all in:query parameters as an object described by https://github.com/OAI/OpenAPI-Specification/issues/256#issuecomment-780989715, then we would get this functionality 'for free' with the existing additionalProperties keyword.

This is already possible: with style=form, explode=true, type=object, a single parameter will consume all values in the query string.

This functionality should be documented as an example in the specification, as it is quite useful but not immediately obvious to those unfamiliar with json schema (or those who have written an openapi validator ;) ).

I can see that it may still be useful to have an explicit config for what OP requested, however.

handrews commented 7 months ago

@OAI/tsc review request: As I've already submitted a PR for this, could we get a decision on whether to support it or not? If not, I'll drop the PR, if so, we can discuss in the PR whether it is the right way to solve this.

mikekistler commented 7 months ago

I already weighed in above, but I've been mulling this a bit more and it reminds me of the "Open World vs Closed World" discussion from way back.

https://github.com/OAI/OpenAPI-Specification/issues/568

I thought I'd post that here in case it helps inform the discussion.

handrews commented 7 months ago

@mikekistler yes, very relevant, thanks!

For those who don't want to slog through it (although it's interesting), the resolution of that issue is basically "yes, it's a contract, but it's an open-world contract." So if an API contradicts something that the description says (or is documented to mean as a default in the absence of a field), then that's an error on the part of either the description or the API. But if an API does something that is just not addressed by the description, then that's fine.

In this case, additional query parameters are not addressed by the specification, just like (in that linked open world/closed world issue), most API descriptions do not address all possible response codes. We would not want a "disallow unknown response codes" field, because entities outside of "the API"'s control (e.g. intermediaries) can produce unanticipated response codes.

But the syntax and semantics of the query string are the responsibility of the resource identified by the scheme + authority + path (possibly restricted by scheme-specific rules). So whether or not additional query parameters can be accepted is within the scope of API design. It may or may not be a good idea to forbid them (and it poses problems for API evolution), but there may be good reasons to do so.

earth2marsh commented 7 months ago

Like @mikekistler , I also was reminded of the open vs closed world. OpenAPI generally prefers to describe what you can do rather than what you cannot do. If you want to prevent additional parameters, that's a choice you can make in its implementation. Does a consumer need to know that such a thing is expressly disallowed? (They'll find out pretty fast if they just try it.)

So I think the potential benefit here may be for generating backend code or for configuring a gateway to enforce such restrictions? Is adding support in the OAS the best way to accomplish that? [This reminds me of how swagger-node (nee apigee-127) originally had a configuration option for strict validation that would reject requests that contained patterns not described in the description document, IIRC.]

handrews commented 7 months ago

@earth2marsh

OpenAPI generally prefers to describe what you can do rather than what you cannot do. If you want to prevent additional parameters, that's a choice you can make in its implementation.

That's not how schemas work, though. It's weird to have an asymmetry between data modeling in Schema Objects and data modeling in Parameter Objects. The use case is presumably the same as all validation, and might happen on the client or server side.

I really also don't understand the resistance here. What is the downside? I'm not upset about it, just a touch confused and seeing it differently. To me, this is data modeling, and all of the data modeling should work the same. Currently it doesn't, and this straightforward change would bring us closer to parity.

[EDIT: I do get the philosophical aspect, and I completely agree for things like response codes, headers, and (to some extent) HTTP methods. It's just that the query string is as much of a data model as the payload, and I feel like the same philosophy should apply to both the Schema Object and the Parameter Object, while the Responses Object, Path Item Object, and headers share a different philosophy.

handrews commented 7 months ago

Anyway, I'll typically argue a point for as long as it seems like I might win it. If the TSC comes to a collective decision to not do this, please just go ahead and close it, no need to convince me first.

notEthan commented 6 months ago

@earth2marsh

Does a consumer need to know that such a thing is expressly disallowed? (They'll find out pretty fast if they just try it.)

Isn't the point of documenting our APIs so that the consumer doesn't have to blindly try stuff to understand how the API will respond?

It's useful to the end consumer for constructing valid requests. It's useful for automated testing to ensure the API is behaving as expected.

@mikekistler

It does not "describe" any useful behavior of the API ... It specifies when the operation is required to fail but not when it actually will fail, since undefined query parameters can still cause failures when undefinedQueryParameters is true, the default. So users must be prepared for failures either way.

A condition where the operation is expected to fail is behavior that is useful to describe. I don't think that is contradicted by there being other conditions where failure is possible.

@handrews

I'm generally very skeptical of one person saying "I wouldn't use this" and making that an argument for not adding something. Particularly when multiple people over time have said they would use it.

Agreed.

@karenetheridge

This is already possible: with style=form, explode=true, type=object, a single parameter will consume all values in the query string. (and then additionalProperties: true or additionalProperties: false can be used).

This is interesting, and may obviate the philosophical questions above. It's been too long since I read RFC6570 to recall for my own understanding if this encompasses the effect of an allowUndefinedQueryParameters, but it looks like it ought to.

mikekistler commented 6 months ago

I already said above that I won't oppose adding this if others feel there is value in it.

But I wonder ... what does it mean to have additionalQueryParameters: true? What useful information does that give to users? Note that this is very different from additionalProperties: true, which (I think) means that additional properties are strictly allowed and cannot cause a failure.

Also, I don't find this argument convincing:

Isn't the point of documenting our APIs so that the consumer doesn't have to blindly try stuff to understand how the API will respond?

If I documented all the stuff the consumer should need to understand to use the API, do I really need to tell them "Oh and don't try anything else"? Only test programs and hackers try to pass things that are undocumented in hopes of triggering some extra behavior.

But again, if you and others think it adds value in some way, I'm happy to let you do it.

mikekistler commented 6 months ago

I posted above about the Open World vs Closed World discussion in this repo. But I just realized that even more relevant is @handrews blog post "JSON Schema is a constraint system" (yes Henry I read it):

https://modern-json-schema.com/json-schema-is-a-constraint-system

The point of that blog post (I think) is that you must approach JSON Schema not as a "definition/description" system but as a "constraint system" -- starting from "anything is possible", JSON schema adds constraints to what is possible. By contrast, definition/descriptions are additive: "You start from nothing and the more you specify, the more you can do."

I have always thought of OpenAPI as a "definition/description" of an API, though admittedly part of it is specified with JSON schema with is the opposite.

But I think the crux of this discussion is: Should OpenAPI be a "definition/description" or a "constraint system"? If we decide that it should be constraint system, then I think we need to acknowledge that it is woefully inadequate. In addition to constraints on query parameters as this topic is considering, I think we'd also need constraints for:

and there are probably others. Is this really the road we want to start down?

earth2marsh commented 6 months ago

I think you've framed it nicely, and I think this is not the road we want to head down, myself.

nfroidure commented 6 months ago

I think that the point of that additionalQueryParameters: true proposition is to keep both possible and tell users : 'I designed that API in a closed/open worl point of view.

About that point :

Does a consumer need to know that such a thing is expressly disallowed? (They'll find out pretty fast if they just try it.)

APIs ain't only designed for users but also bots and they may not work at all until a programmer actually fix it.

handrews commented 6 months ago

@mikekistler

But I wonder ... what does it mean to have additionalQueryParameters: true? What useful information does that give to users?

It means exactly what we have now, so I don't understand the question.

Per #3529, minor releases MUST be backwards-compatible, and additionalQueryParameters: true is the default value, so a 3.2 Parameter Object with additionalQueryParamters: true is the same as a 3.2 Parameter Object without additionalQueryParamters at all, which by definition MUST have the same meaning as it does in 3.1.

Note that this is very different from additionalProperties: true, which (I think) means that additional properties are strictly allowed and cannot cause a failure.

That's not really correct. additionalProperties just takes a schema and applies it to all properties that are not otherwise accounted-for in the same Schema Object. true and false are just shorthand schemas (for {} and {"not": {}} or any other unsatisfiable schema, respectively.

Within a single Schema Object, there is no difference whatsoever between additionalProperties: true, additionalProperties: {}, and not having additionalProperties at all. It's still possible for those properties to cause a failure by contradicting other constraints (There is a complex interaction with unevaluatedProperties but that's not relevant here because there is no analogue with the Parameter Object).

hudlow commented 6 months ago

@mikekistler

I suppose this clarifies to implementers how they should treat undefined query parameters, but this assumes that the API description is written by someone other than the team that implements the API, which in my experience is rather rare.

I was doing this at IBM for the entire time we worked together... I consider it fundamental to a healthy software engineering organization that this workflow is possible. You also seem to assume that APIs have exactly one implementation, which I find to be an exceedingly strange (and wrong) assumption.

Mike, it seems like you're mostly arguing that we don't need these things because it's incorrect for a service implementation to support things like undefined query parameters, and it's incorrect for client implementations to send extraneous information. But in practice, what you are arguing for is that ambiguity in an API definition is a good thing. I just think that is completely indefensible. Ambiguity is the mortal enemy of quality and robustness.

The reason I think this matters so much is that I think it is exceedingly common for implementations to ignore extraneous/undefined information in a request and exceedingly common for client implementations to depend on this. This is very bad!

But I think the crux of this discussion is: Should OpenAPI be a "definition/description" or a "constraint system"? If we decide that it should be constraint system, then I think we need to acknowledge that it is woefully inadequate.

YES!

handrews commented 6 months ago

@hudlow I've often been the person that writes the description while others (sometimes in a different country/time zone) implement it. I'd say that's a critically important software development process: Using an OpenAPI Description as a contract with a more junior remote team.

mikekistler commented 6 months ago

I'm going to bow out of this discussion now. I concede that there is value in having a "constraint system" for HTTP APIs and I'm fine to let OpenAPI be that as long as it does not impede using it for describing APIs, which is my primary use case.

handrews commented 6 months ago

Just to close the loop for those that were not on this week's Moonwalk call, related to @mikekistler constraint system vs description comment:

I emphatically made the point that this is not an either-or thing. It is a context-specific both-and.

We have JSON Schema as a constraint system, for better or worse, and that's not changing in 3.x. However, it is only used for request/response bodies and query parameters. And this particular issue is only about URL query string parameters, which is an important distinction. In particular, this is not about header parameters.

Headers and response codes (and methods, although we restrict those far beyond what the HTTP spec allows) are defined as open-ended spaces. The RFCs require that implementations MUST allow unknown values, and specifies how to treat them. Constraining those spaces with a "no undefined values" would be objectively incorrect. This is because the API designer does not control what all parties might do with headers and status codes.

Request and response bodies, as well as URLs, are owned by the resource. It is entirely within the right of the resource to put whatever constraints they want on these things. Some users think that's a good thing to do, some users think that's a bad thing to do. We should not care. It has use cases and more demand (shown by the 7 or so people who have asked for it here) than many other features.

The RFCs grant full control of bodies and URLs to resource owners (who are, in this case, the API description authtors), and our spec should reflect that.

handrews commented 6 months ago

At this point, I feel quite strongly that this should be solved, although possibly not the way I did in the PR I've posted.

@earth2marsh you've told me that you're more concerned with Moonwalk, and I agree that we should work out a philosophically consistent system there. But we have to deal with 3.x as it is, and so do users of the spec.

I've gone back through all of the objections and not a single one addresses the use cases brought up by the six different people who have argued in favor (four originally, plus @notEthan and @hudlow as well as one of the first four commenting recently):

Automated testing in particular is a core use case for the OAS. I don't think two people saying "I don't see the use of this" should overrule at least six people who have explained exactly how they would use it. If folks want to explain why those use cases are invalid (which is not the same as "I wouldn't do that"), then let's discuss that. Otherwise, we have a clear demand and clear use cases, so we should support it.


That said, I'm not sure allowUndefinedQueryParameters as proposed in PR #3738 is the way to go. I am now leaning towards just supporting issue #1502, which would give total parity with how application/x-www-form-urlencoded request bodies are handled. That would solve this issue, and several others (e.g. #2347, #2298 and possibly more).

It would also be impossible to complain about added complexity, because tools already have to support the exact same thing for request bodies.

earth2marsh commented 6 months ago

There's a lot I agree with in your post, Henry, but there is still one thing that concerns me. We identified a principle of supporting mechanical upgrades for Moonwalk, so if we continue down this path, mightn't we violate that? (Or else perhaps it means providing some affordance for a constraint system on top of the descriptive/open-world layer?)

handrews commented 6 months ago

@earth2marsh that's a good question, but if we take my solution of implementing #1502, the only thing we are "adding" is an exact analogue of what we already support for request bodies. It's the exact same syntax, and the exact same process, you just put it in the query string instead of in a single-line request body.

So how could that make Moonwalk upgrades any more difficult? That is not a rhetorical question, I just don't see where there would be a problem that doesn't already exist. The upgrade for describing application/x-ww-form-urlencoded should be the same no matter where it goes. In fact, Moonwalk should be an opportunity to make that more consistent, both philosophically and syntactically.

handrews commented 6 months ago

Actually, I'm pretty sure it makes upgrades easier, because you can already do @karenetheridge 's trick in many situations, so we'd have to upgrade that already. But we can migrate that trick to the #1502 solution, which then reduces it to a problem we have to solve anyway (form-urlencoded for request bodies). So it actually provides us with a smoother migration path. That's another reason to use the #1502 solution instead of the allowUnknownQueryStringParams boolean (which I'm increasingly against).

handrews commented 6 months ago

In today's TDC call, we decided to consider a proposal for #1502, which would solve this without adding new fields. @earth2marsh keep an eye out for the proposal and we can continue the discussion there - I have already closed the PR I opened specifically for this issue.