Closed nickzuber closed 6 years ago
Thanks for presenting both sides of the argument and providing links to documentation. 👍 to versioning request headers.
Sorry for not realizing this until now, but i'm starting to think it doesn't really make sense to introduce versioning into cluster runner just to support pagination. To introduce pagination it seems really easy to just add flags to the /v1/builds
endpoint, something like /v1/builds?start=n&range=m
. My issue with a versioned api is that it's effectively a promise that we're freezing the current behavior of cluster runner and we're gonna support it forever. Now everytime I make a change to cluster runner, I have to add an if statement "If version=1, do the old behavior, else do the new behavior". It sounds like it would be pain to maintain and lead to some complicated code. Correct me if i've misunderstood the point of a versioned api.
As an example, I was just talking to @josephharrington,@nadeem about changing the behavior of cluster runner so that when a slave gracefully disconnects, the master stops reporting that slave in the /slaves
endpoint. If we introduce versioning, we'd have to continue reporting the old dead slave in the v1 /slaves
endpoint. That check is inelegant and complicated. I also don't want to have to keep these old slaves around in memory just to support an old version. If cluster runner was a super mature service that we didn't expect to make many more changes to, then maybe i'd have a different opinion, but we're not even to the 0.9 milestone. I want to be free to move quickly and break things.
Another thing that makes versioning not really make sense imo is that we already do releases of cluster runner every time we make a commit, which is effectively a form of versioning. Other companies are already in complete control over what version they're using, because they can just build the rpm for the version they want. This is a different model most web companies (eg twitter). Twitter has to version its api because when they make a breaking change they will actually break every single consumer of twitter. Unlike twitter, we can tell users of cluster runner "if some breaking change we made messes you up, just don't upgrade to the most recent version".
@ethomas2 There are other changes that justify versioning the API beyond just pagination. One simple one is that I'd like API resource names to be plural (e.g., /builds/<id>
instead of build/<id>
). Another is #282.
Even for pagination, yes it is true that we could add it to the v1 API with query params, but that doesn't solve the problem that having an endpoint (GET /build/
) that can lock up the application for a long amount of time while it returns a huge amount of data is a design flaw. That endpoint should not allow unpaginated requests.
My issue with a versioned api is that it's effectively a promise that we're freezing the current behavior of cluster runner and we're gonna support it forever.
I don't agree with that. Versioning an API is a migration strategy. It allows us to make breaking changes without breaking clients. We can then deprecate older versions of the API.
when a slave gracefully disconnects, the master stops reporting that slave in the /slaves endpoint. If we introduce versioning, we'd have to continue reporting the old dead slave in the v1 /slaves endpoint.
A benefit of API versioning is that we get to define what we consider breaking API changes. In that example I don't consider that a breaking API change, I consider it a bug fix.
Another thing that makes versioning not really make sense imo is that we already do releases of cluster runner every time we make a commit, which is effectively a form of versioning.
The difference between an application version and an API version is important. I should be able to upgrade a service without breaking the automation that I've built on top of it. Users should be able to upgrade ClusterRunner to get things like bug fixes and new features without needing to update all their API client logic (at least until the point where we've deprecated an API).
@nickzuber Evan's comment actually leads into my main point of feedback -- this all looks great. I'd also love to see your ideas for how implementing endpoints in the new API version would look. For example, would we start out by mapping back to the old implementations and then replace those with new implementations as desired?
@ethomas2
To introduce pagination it seems really easy to just add flags to the /v1/builds endpoint, something like /v1/builds?start=n&range=m.
I originally brought up the same point saying that pagination isn't a breaking change if we just add query variables in the URI, however @nadeemahmad told me that we want to have pagination be a default behavior with default start/range values -- and that would be a breaking change.
Now everytime I make a change to cluster runner, I have to add an if statement "If version=1, do the old behavior, else do the new behavior".
I was thinking that the version would only really come into play if there was different behavior between versions specifically. So you'd only have to check for the version if you needed to. But yeah it would probably be more complicated to maintain without a good way of handling the different versions well.
Another thing that makes versioning not really make sense imo is that we already do releases of cluster runner every time we make a commit, which is effectively a form of versioning.
Isn't this more of ClusterRunner versioning rather than just the API? I don't think it'd be a good idea to tightly couple these two together -- if we make a breaking change with the API but also a huge improvement with ClusterRunner itself, consumers should be able to upgrade ClusterRunner without necessarily having to upgrade their usage of the API.
The other api fixes @josephharrington mentioned convinced me that versioning would be nice. I thought we were only doing versioning for pagination, my argument doesn't really work for those 2 changes. I'm also convinced that we should separate the concept of versioning the API and versioning the service. But I'm still not convinced that versioning is necessary for pagination. If the /builds
endpoint is unusable anyway, then it seems like we should just make a breaking change to paginate it by default on the v1 endpoint. Are we planning on making a change soon for which versioning is necessary? If so i'm all for it, otherwise this seems premature.
@josephharrington Something like that -- our goal should be to reuse as much code as possible and avoid repeating ourselves among the different API versions. I think the tricky part is gonna be finding that balance and an organized way to do it, maybe some sort of HoF or something.
Nice writeup!
@ethomas2, For your 3rd point, even if we took the approach of forcing clients to update to match versions (i.e. not being backward compatible), it still makes sense to provide at least some API versioning feature. For example, if a client plans to support v2 API, those changes can be integrated in the client and released before ClusterRunner master is upgraded. The client can then detect the master API version and behave accordingly. In other words, the CR design should not limit the client implementation if it wants to support multiple CR API versions, even if the CR releases will only support a single API version.
@nzuber
I think the tricky part is gonna be finding that balance and an organized way to do it
I think you are correct here. Because of that it might be good to include a bit more details around this in your proposal just to make sure everyone understands all the implications of the proposed design.
One good example to illustrate where it could get tricky is in renaming endpoints (e.g., from /build/<id>/
to /builds/<id>/
). In the versioning approach where the version is embedded in the url (/v1/build/<id>/
vs /v2/builds/<id>/
) then the API routes are easy to change between versions because they're literally different routes. But in the media type header approach, then each handler has to switch its behavior depending on the detected version. A few points that might be tricky:
I'd assume that /v1/build/<id>
would return a 404 if the v2 API is requested, and similarly /builds/<id>
might return a 404 under the v1 API. Like you said, we should share code wherever possible, but because of versioning the handler logic has to have different behaviors based on the requested version. So I'd be interested in what mechanism might handle switching behaviors between the two versions before entering shared logic.
We have some API discoverability implemented via the "child routes" field of responses. If we keep that in v2 API will there be changes necessary to keep it working correctly (i.e., only show child routes within the v1 API when getting responses for v1, and similarly for v2)?
@josephharrington Another tricky thing that I noticed when passing the version through as a header is that hitting an endpoint through the browser doesn't use our network class, rather it just writes from tornado RequestHandler, so it seems like we'd have to add headers in two different places within CR (in the network class and in the response header for our endpoints? I might be misunderstanding how some things work here, feel free to correct me).
For handling the route names, I think we could adjust the RouteNode
class to take a list of regex and version tuples. This way we can build our routes based on the version of the API being consumed. A route could default to "any" version of the API if no specific version is specified in the tuple.
Also, I was thinking again we could alter the RouteNode
class to have the option of marking routes as "optional" (each route won't be optional by default). We could implement this with regex as well.
I'll update my design doc with more specific implementation details
hitting an endpoint through the browser doesn't use our network class
That is true for any incoming API request to ClusterRunner whether it's through a browser request, curl, or from another service. The Network
class is only for requests that the service makes (in other words, outgoing requests).
The only reason we'd need to add headers to the Network
class is if we want communication between the master and slaves to use the v2 api. I believe in your design, if we don't change anything about the Network
class, then communication between master and slave will still use the v1 api because the routes would still include "/v1/".
@josephharrington I updated my doc to clarify some specific implementation details I had in mind
Can we close this?
ClusterRunner API Versioning
Overview
The ClusterRunner API currently doesn't implement any kind of versioning, which limits us from making any kinds of enhancements or improvements to any existing API features in the future. For example, one of the changes we'd like to make is to implement pagination by default. This kind of a change would cause issues for anyone currently consuming the API, which is why we would need to release this kind of update on a new version.
There are two main approaches for handling API versioning, which are basically just two different places we can specify the version:
In general, having the API version in the URI is a simpler approach to implement but using media types is more academically accepted.
Approach: URI
Using the URI to specify the API version is really simple and it's something that ClusterRunner currently simulates:
With the current way we build routes in our API, and considering that we already simulate a version in our URI, making this change would be really straightforward.
Our routes are structured like this:
The regex that corresponds to an endpoint is constructed from the parent routes' regex. Each time we have a capturing groups in the regex, the values captured will be passed in as parameters for the handler's HTTP method. This means if we wanted to add the version to the URI, we could capture the version in the route and handle it that way.
Pros
Cons
Approach: Media Types
Putting the API version in a media type in request headers is generally the more accepted way to handle versioning since a request URI is described as a Uniform Resource Identifier that identifies the resource upon which to apply the request.
Requests/reponses generally are structured something like this:
There are a few different ways to store the version number in the media type:
application/vnd.clusterrunner.v1+json
application/json; version=1
application/json; vnd.clusterrunner+v1
Most of the examples I've seen use the format
application/vnd.clusterrunner.v1+json
, and also it seems like this format is the best option since it best follows the JSON API specification guidelines.The
vnd
prefix is a media type standard for any media types associated with publically available products. For future reference, it seems like these media types might need to be officially registered, but I had a hard time finding specific and recent resources on this topic so it might be irrelevant now; just something to keep in mind.Capturing the version would be pretty straightforward as well. We would first just need to get the header and its value, then parse it for the version. It could look something like:
Pros
Cons
Other Things to Address
Default Version
When a version isn't specified, we need to decide which version of the API to provide. We really have two ways of handling this:
Serving the first version by default wouldn't be ideal since it makes more sense to only provide older versions of the API if specifically requested. Whenever somebody requests a resource, it's fair to expect the most recent version of that resource by default.
The downsides to this is for this initial change, anybody that's currently consuming the API would get updated automatically to the newest version after this is implemented (before they're able to react to the changes). However, this can be avoided if we keep the
v1
in the URI as an exception that routes to the first version of the API if a version isn't already specified in the media types.Handling the Existing URI
All of our ClusterRunner endpoints and routes already have a faux version number in their URI, so we'd need to handle that if we choose to use media types for our API versioning. We have a few different ways for handling this:
v1
from the URIs and use only media typesv1
from the URIs and make a permanent part of the URIv1
v1
from the URIs and make it optional to includev1
is found in the URI, only serve first version of APIv1
v1
in URI, serve first version of APITrying to remove all instances of
v1
from our URIs is effectively a huge breaking change. This means any process already consuming the API will effectively hit 404s and stop working correctly until they themselves manually remove all instances ofv1
from their endpoints. Doing this wouldn't be a great idea since our API would not be backwards compatible with anything that's already using it, and backwards compatibility is something we want to help mitigate the amount of breaking changes.Permanently keeping the
v1
in the URI wouldn't really make sense either since it would just cause confusion since it doesn't describe the resource that's being provided nor does it specify the version of the API. We should deprecate thev1
in the URI and effectively make it optional in our routes. That way, if somebody is already consuming the API and has the oldv1
in their endpoint, they'd still be able to specify the API version with media types and not have to worry about removing thev1
if they don't want to.We should use media types for versioning whenever they're provided. If the version is not provided through media types, then we should serve the most recent version of the API, unless there's
v1
in the URI, then we should serve the first version of the API. This is to help counter as many breaking changes as we can. Anyone currently consuming the API that hasv1
in the URI shouldn't get automatically upgraded to the most recent version of the API unless they want to and are ready to.Some examples of behavior we'd want:
Proposed Design
Even though our ClusterRunner API is currently designed with a faux version in the URI, it'd be worthwhile to switch to versioning in the request headers primarily because it's a more accurate way of describing the resource being provided. Media types are designed for describing data and formats in the response, whereas the URI should represent the actual data. We want to avoid mixing up the resource identification with the resource representation, which would contradict HTTP specifications for what a URI should represent.
We should control our API version through the Accept/Content-Type request/response headers and handle the version after we capture it from the header.
v1
in the URI we default to the most recent version of the API.v1
in the URI we default to the first version of the API.v1
in the URI we use the version specified in the media type of the API.v1
in the URI we use the version specified in the media type of the API.Update
For changing the name of a route path, we can have some dict that assigns a route name with the API version and we select the correct one before we build the route. We could probably come up with a nicer and more uniform way of doing this, but here's a very rough idea of what it could look like:
We shouldn't need to change anything here if we build the routes according to the API version, rather than accepting all possible routes (after we change names etc.) then try to figure out which route is valid per version.
Edit
It might be hard to be able to identify the version before we build the routes, since the application needs to have some routes already before any request can even be sent. So I propose we could adjust the
RouteNode
class to take in a list of dicts which define a path and the versions that are valid. This then can be passed into the handler (which knows the version of the API when it gets a request) and can be handled appropriately. It could possibly look something like:To deprecate the
v1
we can adjust theRouteNode
class to handle "optional" routes. This would just be a flag on aRouteNode
which makes the route's regex path a non capturing optional group. Something like: