Closed nkaretnikov closed 9 months ago
With this policy, the goal should be to have an answer to this immediately: "Is this a public API or not?"
xref: #399
This has been discussed with the team. @nkaretnikov will add notes from the discussion here.
Proposed conda-store backward- and forward-compatibility policy
Incompatible changes should be avoided if possible. Note that this only applies to client-facing code, such as HTTP routes or public Python APIs.
For example, changes to a database table can be done without versioning if data can be migrated in a backward- and forward-compatible way. Everything in the tests
directory can also be changed as we want.
When it's not possible to make a compatible change, client disruption needs to be minimized via versioning, see below.
The main goal: clients should be able to upgrade without any changes. When downgrading, they'll only lose access to certain functionality or APIs, but it should be a pretty simple update on the client side, not a major breakage. We're mainly concerned about clients being able to use new features, so the update process needs to be transparent. In practice, the downgrade process should only matter when attempting to update, but then rolling back to an existing version.
Database changes need to be forward- and backward-compatible to allow users to test new versions and rollback if needed.
If it's not possible to make a change to a table in a compatible way, a new table should be introduced, while keeping the old one around.
Route handlers need to be versioned.
As of writing this, we already have the v1
API.
New incompatible routes or incompatible changes to existing routes should be part of the new API.
Do not add more things to v1
if they are incompatible.
New features that are compatible can be added, though, like completely new functionality exposed through a new route.
Adding a new optional parameter doesn't require adding a new route.
Yes, it will be incompatible if a client uses this optional feature and then downgrades, but it's new functionality, so we are okay with this.
If a new route is added in v2
and it's compatible with v1
, it should be marked as supporting both: v1
and v2
, to allow clients using the v1
API to use it.
Incompatible changes need to be declared as part of the experimental
namespace.
Note: code can also skip the experimental
step and be added directly to the new versioned stable API, but in that case we'll be committing to supporting it, so it should be avoided if possible.
Experimental routes have no guarantees attached to them, we can remove them at any moment. This allows us to add features without committing to support, and test it with clients in real-world scenarios. There're also no guarantees whether all experimental routes are compatible with each other. Some of them might not be compatible, like two different implementations of the same idea.
Once we determine the routes are stable and want to support them, they need to move to the versioned API, like v2
or later.
Note: once a new API version is introduced, the rest of routes, which are compatible, need to add a new declaration, saying they are compatible with this new API.
The goal is to make it clear which routes work together. So all v1
routes are compatible, all v2
routes are compatible, but some v1
and v2
routes are not compatible.
For experimental
, only new routes needed to be marked as such, other routes don't need to be touched, because we make no guarantees about compatibility.
The above also applies to removing features, like parameters, this needs to be part of a new versioned API, as explained above.
/v1/example/route
/v2/example/route
Explanation: this route is compatible with all v1
and v2
routes.
We will not support routes with /latest
or an empty prefix because I consider it error-prone.
Clients are expected to depend on a particular stable version, which usually should be the latest in the release they use (so, v1
, v2
, v3
, etc.).
v2
/v2/example/route
Explanation: this is a new route added in v2
. It's compatible with all v2
routes, but is not compatible with v1
routes.
experimental
route/experimental/example/route
Explanation: this is an experimental route, it might or might not be compatible with stable versioned routes. We make no guarantees about it. It can be changed or removed at any moment without prior notice. It might also not be compatible with some other experimental routes.
When there's a newer API intended to replace a legacy API, the said legacy route needs to return a warning as part of the returned JSON, like:
"warning": "This API is deprecated [reason], please use [replacement in] v[N] API"
HTTP has no standard way to inform users about API deprecation.
It's up to the clients to look for this value in the returned JSON.
The actual value we use for this needs to be communicated in the website docs, as well as in release notes (mentioning which APIs were marked as deprecated and which are new replacements).
The current policy is to never remove deprecated APIs.
Later, we can switch to generation-based removal policy (only keeping the last N API versions) or time-based removal policy (remove deprecated routes after N releases or N months).
If we decide to remove HTTP routes later, the entire API generation/version is getting removed, not just a deprecated route.
So if we deprecate a route in v1
and it's time to remove it, we remove all v1
routes.
Python classes, functions, methods, and so on are considered public APIs. The only exception is names starting with a single underscore.
Private examples:
_private_func_or_method
_PrivateClass
_PRIVATE_VAR
.Public examples:
public_func_or_method
PublicClass
PUBLIC_VAR
.The highest-level entity determines the visibility level.
For example:
class _Private:
# everything is private here even without underscores
def this_is_also_private(self): pass
or
def _foo():
def inner():
# inner is also private - no way to call it without calling _foo, which is private.
This makes it clear which APIs are public and which are private.
Unless marked explicitly, everything is public, which is less error-prone as the other way around.
Changes should be done in a compatible way if possible. For example, by keeping them internal to a function or method. Like, if a new argument is added, it needs to be optional.
Clients should be able to upgrade without any additional changes.
Clients might lose access to some features added in a particular version if they downgrade and rely on those APIs, but it shouldn't cause a major disruption or require a huge rewrite on the client side.
We are not planning to backport features to older releases, though.
When an incompatible change to the public Python API needs to be made, a new function, method, variable, etc. is introduced. Unlike with the HTTP routes, the name doesn't need to be numbered. Instead, the naming scheme should reflect the object semantics and intended use. Also, do not rename the legacy name (it's a breaking change on the client side) or use suffixes such as "new" (these are not meaningful after a few updates).
This applies not just to function signatures, but also to the the way things are called, for example, async vs. non-async code. Because this requires changes on the client side.
So, if a public Python function needs to be ported from sync to async, a new API should be introduced, the old API should stay in place and be marked as deprecated.
For example, if we have a function def list_envs
, which is sync, and we want to add an async variant, we add async def list_envs_async
. Later, if we chose to remove legacy code, to keep the code readable, it's possible to introduce a new async API without the async suffix, by going through the deprecation process again: first by adding async functions without the async suffix and marking the functions with the async suffix as deprecated. Then eventually removing the deprecated functions, depending on the removal policy.
Changes to returned values or types are also considered incompatible unless client code can migrate without changes and breakage.
The deprecated version needs to be marked as such by throwing a warning, for example:
"This function is deprecated [reason/details], use [replacement] instead"
This also needs to be communicated on the website and as part of release notes.
We maintain this policy because we don't know which features clients may or may not use. For example, a client could be defining a custom Authentication class and might rely on any Python code that is not prefixed with a single underscore.
The only exception to this is code in tests
. Tests can be changed however we want because they don't affect clients.
As with the HTTP APIs, there are currently no plans to remove deprecated Python APIs. We might revise this policy later, which will be communicated though the release notes and via the website.
A few comments/opinions.
Database changes need to be forward- and backward-compatible to allow users to test new versions and rollback if needed. If it's not possible to make a change to a table in a compatible way, a new table should be introduced, while keeping the old one around.
I would say I agree but also if it is possible to reconstruct the tables from scratch e.g. the packages/channels we should also also delete the data/reconstruct (guess this would still fit under backwards/forward compatible changes). I think the database out of all the components should not tolerate breaking changes.
I am fine and agree with the need for experimental to promote development without locking us into a particular api too early. At the same time I want to make sure that this approach/work is not too burdensome on the developer
Yeah, I have some thoughts as well after trying to implement role mappings using this new formula. I’ll write down my comments soon, but the main idea: I don’t like sticking to things for too long because it creates problems on the dev side. What we need is to just deprecate things for at least 1 release (so 2 weeks) and communicate things in release notes. Because we don’t use semver, we cannot rely on version numbers. So just need to communicate better, that’s the main idea.
@dcmcand needs to comment on this.
My 2c:
_internal
namespace might help signal this. Hopefully clients are only using the HTTP API.@nkaretnikov Thanks for putting this together.
Here are my proposed changes :
Noteworthy changes,
api/user/v1
for a v1 route and api/user/experimental
for an experimental route. I would welcome discussion on this, but I think it simplifies versioning and lets individual endpoints evolve independently.I very much welcome discussion about all of the above. I also think there is some discussion to be had about if we want to commit to supporting backwards compatibility for Python API's as well as REST API endpoints. It seems to me that conda-store is an application that is intended to be used through the public REST API and it would be reasonable to only apply the backwards compatibility policy to that. Is there a use case where we expect users to interact with conda-store as a library? If so, then it definitely belongs in here. If not, I personally think we can remove that from this policy.
In software development, it is essential to strike a balance between progress and maintaining a stable environment for existing users. This policy will provide guidance on how to the Conda-Store project will handle changes to software and services, ensuring that they do not disrupt or invalidate the experiences of current users, while still enabling innovation and forward progress.
Breaking changes in code refer to modifications or updates made to software that have the potential to disrupt the functionality of existing applications, integrations, or systems. Breaking changes involve removing existing functionality, altering existing functionality, or adding new requirements. These changes can lead to compatibility issues, causing frustration for end-users, higher maintenance costs, and even system downtime, thus undermining the trust and reputation of the software or service provider.
In contrast non-Breaking changes can add functionality or reduce requirements. Previously working code will continue to work with non-breaking changes. These changes allow software to evolve and grow without impacting existing users negatively.
Note: the term breaking changes within this policy only refers to REST API endpoints, Database changes and public Python APIs. As long as no breaking changes are introduced to REST API endpoints, Database, or public Python APIs, changes will not be considered breaking.
To summarize: Users of Conda-Store should be able to upgrade to newer versions without worrying that this will break existing integrations. Newer features will be available for users to adopt when they need to, but all existing code should continue to work.
Databases are one of the most critical areas to ensure there are no breaking changes. Databases hold state for the application and breaking changes to the Database can be destructive to data and prevent rolling back to earlier versions of Conda-Store. To avoid breaking older features, discipline needs to be exercised around database migrations.
In practice this means:
REST API endpoints need to be versioned this versioning should be done on a per endpoint basis. This will allow individual endpoints to be versioned independently of each other.
Endpoint routes without an explicit version will be assumed to use the latest version. For example:
If you have a example.com/api/user/v1
route and an example.com/api/user/v2
route, then example.com/api/user
will use the v2
handler for that route.
Non-breaking changes do not require a new version of an endpoint. Adding a parameter to the return value of an endpoint or making a previously mandatory input optional can be done without a new endpoint version. However, removing a parameter from the return value, altering the meaning of a value, or making a formerly optional parameter mandatory are breaking changes and would require a new endpoint version.
It is not necessary to backport nonbreaking changes to previous versions of endpoints.
Conda-Store will expose experimental features within the experimental
namespace.
For example, if a new version of the example.com/api/user
endpoint is being tested, but not yet considered stable, it can be made available at the example.com/api/user/experimental
route.
This allows Conda-Store contributers to test new changes and get community feedback without commiting to supporting a new version of an API endpoint.
Note: using the experimental
namespace is not mandatory. New endpoints and features can be deployed to existing endpoint versions for non-breaking changes and to new versions for breaking changes. However, deploying a versioned endpoint does mean a commitment to support that code going forward, so it is highly encouraged that developers use the experimental
namespace to test new endpoints and features before deploying them as stable.
Experimental routes have no guarantees attached to them, they can removed or changed at any time without warning. This allows testing features with users in real-world scenarios without needing to commit to support that feature as is.
Once we determine the routes are stable and want to support them, they will be moved into the existing latest version of the API endpoint for nonbreaking changes or a new version for breaking changes. If the route is an entirely new endpoint, it will start at v1
https://example.com/api/user/v1
https://example.com/api/user/v2
Explanation: This route has breaking changes between v1
and v2
. v1
will be stable, but clients must upgrade to v2
to benefit from new features.
https://example.com/api/user/v1
Explanation: This route has never had breaking changes introduced.
experimental
routehttps://example.com/api/user/experimental
Explanation: this is an experimental route. It can be changed or removed at any moment without prior notice. This route should be used to test new features and get community feedback.
It is not recommended to remove versions of API endpoints generally. Removing API endpoints, or versions of endpoints breaks backwards compatibility and should only be done under exceptional circumstances such as in case of a security vulnerability.
In the case of a removed endpoint, or endpoint version, Conda-Store should return a status code of 410 Gone
to indicate the endpoint has been removed along with a json object stating when and why the endpoint was removed and what version of the endpoint is available currently (if any).
{
"removalDate": "2021-06-24"
"removalReason": "Removed to address CVE-2021-32677 (https://nvd.nist.gov/vuln/detail/CVE-2021-32677)"
"currentEndpointVersion" : "v3"
}
Public Python classes, functions, methods, and so on are considered public APIs and subject to the same considerations as REST API endpoints. The only exception is names starting with a single underscore. This convention is used in the Python community to designate a class, function or method as private.
Private examples:
_private_func_or_method
_PrivateClass
_PRIVATE_VAR
.Public examples:
public_func_or_method
PublicClass
PUBLIC_VAR
.The highest-level entity determines the visibility level.
For example:
class _Private:
# everything is private here even without underscores
def this_is_also_private(self): pass
or
def _foo():
def inner():
# inner is also private - no way to call it without calling _foo, which is private.
This makes it clear which APIs are public and which are private.
Additionally, any classes, methods, or functions within the _internal
namespace are considered private by default. It is encouraged that developers use this for any internal logic.
The only exception to this is code in tests
. Tests are subject to change or deletion at any time and are never considered to be public code.
Any Class, function or method not prepended with a single underscore will be considered public. Developers are encouraged to make all classes, methods, and functions private by default and only expose them as public if there is a legitimate use case. Keeping Classes, functions and methods private by default limits the public APIs that the Conda-Store project developers are commiting to supporting, and
For all public classes, functions, and methods, breaking changes would be anything that changes the class, function, or method's signature or the meaning of a return value.
For a function or a method, this means that the arguements (inputs) and return values (outputs) must be the same. Internal logic may be changed as long as it does not change return values. Extra care should be taken when making these changes however. Changing the way a return value is calculated may result in subtle changes which are not obvious. For example, rounding a decimal versus truncating it may return different results even though the function signature remains the same.
The function signature also includes whether the function is a async function. Changing this is a breaking change.For example, if there is a function def list_envs
, which is syncronous, and it should be asynchronous, a new function called async def list_envs_async
should be added and list_envs
should be kept as a synchronous call.
Optional arguements may be added as long as they have a specified default value and additional fields may be added to return types if you are returning an object like a dict. These are considered non-breaking.
For a public class, this means that attributes and methods cannot be removed or changed. Rather than changing methods or attributes, new methods or attributes should be added. For example, if you wanted to change a user id from an int, to a uuid, a new attribute of User.uuid should be added, and all new code should use User.uuid. Existing code can use the new attribute as well as long as that doesn't introduce breaking changes into their signatures.
Clients should be able to upgrade without any additional changes.
Depreciated Classes, methods, and functions should have a comment in the code stating why they are depreciated and what to use instead. This will encourage developers not to use them without breaking existing code.
"This function is deprecated [reason/details], use [replacement] instead"
This also shall be communicated on the website and as part of release notes.
Chris, Chuck and Nikita will meet to discuss this and define a single path forward and report back the decision
--
Having a _internal namespace might help signal this.
I don't think this is flexible enough to be practical. conda-store-server uses modules but it's also structured as an application. Like, when I'm defining a route, I want it to be in the same file where all the routes are. If I need a helper, same deal. I think what I proposed is better: it's still explicit and works on any level of the hierarchy: modules, classes, functions, variables.
Chris shared a nice article on database schema versioning. TLDR: Add but never remove.
This article is from Datomic. Their whole thing is to never remove, which is why they are selling this. Again, I understand the value here, but again it seems extreme and not flexible. I think our policy should be to minimize changes and never remove, but in case we want to remove for some reason (too much space wasted, for instance), we want to have a policy.
I am proposing that rather than versioning the API as a whole, we version endpoints individually.
I dislike this for the same reason this is listed as an advantage. It doesn't help me understand what is compatible with what. See below where I talk about routes for an example.
I included the assumption that a "naked" api route will default to the latest version.
I'm against this. Clients will depend on this. Once we update, their stuff will break. There's no point of having this at all, they can just depend on the latest versioned API that's available during deployment. I don't understand the benefit of naked routes and it seems like they only create problems.
conda-store is an application that is intended to be used through the public REST API and it would be reasonable to only apply the backwards compatibility policy to that. Is there a use case where we expect users to interact with conda-store as a library?
conda-store's config can include arbitrary Python code. We already had an accident where a change in conda-store Python API (sync to async) broke Nebari Authentication class, which is what prompted this policy discussion:
The relevant issue is linked in the first message in this issue.
Conda-Store
FYI: the style guide says it's always lowercase.
If the route is an entirely new endpoint, it will start at v1
This is a problem I mention above, which is why I want to have versioned generations (AKA all compatible routes have the same version), not individual routes. After a while, we'll have a bunch of these routes using different versions, some of them might be incompatible.
Imagine I'm a client using the original v1 APIs that were added 5 years ago. Now I need to use some new route for some reason, which was added 1 month ago. It's also marked v1 because it's a new route. As a client, it's not possible for me to know that my code will continue working fine if I call that new API, because there's no info on what's compatible with what. Now, you could say that our code will never break client code, but I don't believe this. The whole point of this discussion is to have breaking changes and flexibility.
This also prevents us from removing routes, ever. I'm not saying we should remove routes, but I want to have this option to make the code maintainable long-term.
api/user/v1
Not sure if this is a mistake or deliberate. But we have a different versioning scheme right now:
api/v1/...
I like the current one better because it allows you to have any structure behind the version. Also, it doesn't require us redesigning our API scheme, which would be a breaking change, so this is a no-go.
Removing API endpoints, or versions of endpoints breaks backwards compatibility and should only be done under exceptional circumstances
Again, I don't agree with this. It seems not flexible enough for a real-world project. I don't want to remove anything, but I want to have an option to remove routes if I need to (for instance, when there's too much old code from 5 years ago that's likely no one is using -- I can just deprecate that code and move on).
Keeping things around is convenient and safe, but it does have a maintenance cost, too. If you have too much code, refactoring and updating it takes more time, etc.
Because this talks about removing routes in the context of vulnerabilities, this also doesn't address the question of deprecation (we need to inform the user before we remove the route). Since I want to have the ability to remove at will, I want deprecation to be documented as well.
Chuck, Chris, and I had a chat. We have agreed on a way forward.
I'm only going to address the things that we disagreed on before, based on my comment above.
/api/v1/something
. There will be no "naked/non-versioned" API. To see what's compatible with what, users will consult the release notes. But in most cases, routes should just be atomic. Things should work no matter what API you use.Give me a thumbs up below if you agree. Otherwise, add a comment.
Status update:
Tania says this also needs to be on Docusaurus.
@dcmcand Will take over to do the writing part, I'll review. Was discussed during the meeting today.
Status update: reviewed the initial draft of the policy submitted by Chuck in #687. The direction is good, but it needs some fixes. The file needs to be moved to the right place.
Status update: it was discussed during yesterday's meeting that we want to split public and private Python APIs to make things easier to maintain. To determine what will be public, we'll see what Nebari uses. Other things can be exposed as public later. @pavithraes wanted to link to an issue to track this. The BC policy needs to be updated to reflect this. CC @dcmcand
UPD: The BC policy doesn't need to be updated to reflect this because we already talk about private/public there. It's just that we might want to make code changes to not expose too much APIs as public on the Python side.
(Apologies for the notification noise, I closed this issue accidentally. I've re-opened it and reset the status to what it was before.)
Context
We need to have a backward compatibility policy.
This policy needs to specify what our public interfaces are. Example interfaces:
This policy needs to document how we make changes to those interfaces. Examples:
As an example, here's BC/FC policy for PyTorch: https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy
An example where a Python API change broke nebari: https://github.com/conda-incubator/conda-store/issues/593#issuecomment-1732244137.
Value and/or benefit
We don't break other software that depends on conda-store, like nebari. When we introduce changes, things are easier to debug because an error/warning is raised.
Anything else?
No response