eosnetworkfoundation / engineering

A workspace for documentation by Engineering primarily regarding process
MIT License

Versioned HTTP APIs #25

Closed kj4ezj closed 1 year ago

kj4ezj commented 1 year ago

See also: Leap issue 730.

Leap or nodeos HTTP APIs have always been a second-class citizen. We put time and money into them, indicating we believe they meet a customer need and are important. However, we do not treat the HTTP API as a product with stakeholders, a release schedule, and versioning, which is what would give it reliability and stability.

For example, we recently commissioned work to change the type of compression used for one API call, and another piece of work to change the HTTP status code in the response from another API call. These were changed in-place, potentially breaking the applications of downstream consumers.

Compare this to the strategy of a normal public-facing product API such as the AWS API, the Ethereum JSON RPC APIs, or really everyone else. These service providers plan their API interfaces ahead of time, develop them according to this strategy, freeze them, and release them. In-place changes to the HTTP response code, or to the data types or fields used to deliver responses, are not allowed. Such changes must be planned ahead of time and require the release of a new API version.

Eventually, older API versions are deprecated, customers are encouraged to upgrade their clients, and the old version is removed according to a schedule. Some API providers do allow new API "routes" to be added to an existing version because this does not break backwards-compatibility with clients. Most allow bug-fixes that are backwards-compatible. However, changes to existing routes are essentially non-existent.

We should follow a strategy like this to improve reliability, stability, documentation, and enable downstream consumers of our API to plan ahead.

ericpassmore commented 1 year ago

The APIs are ok. Keep in mind most of the blockchain commands are basic: sending transactions or getting system information. In addition, the Leap surface area for APIs does not include other API products like graph queries or caching. Breaking updates to APIs should be done in a thoughtful way with the support of versions. Since we don’t have versions for APIs, this can be a problem. We do have versions for schemas. Schema versioning should include serialization/deserialization.

The place to start would be inventorying all the calls, the HTTP return codes, and the schema for each response, along with a description of the code location for each API call. Looking across those APIs, we could start to ensure consistency and build API testing to enforce the consistency we desire. As part of the consistency effort, we should seize any opportunities to better organize the client code for speed of development and increased quality of development.
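As a rough illustration of what such an inventory and a consistency test might look like, here is a minimal sketch. The `ApiCall` record, the `find_inconsistencies` helper, and every status code, schema name, and code location in the table are placeholders made up for the example, not the actual Leap values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiCall:
    path: str              # URL path of the call
    success_status: int    # HTTP status returned on success
    error_statuses: tuple  # HTTP statuses returned on failure
    response_schema: str   # name of the response schema
    code_location: str     # where the handler lives


# Hypothetical inventory rows; statuses, schema names, and locations are
# illustrative only and would be filled in from the real code base.
INVENTORY = [
    ApiCall("/v1/chain/get_info", 200, (400, 500), "get_info_result", "<source file>"),
    ApiCall("/v1/chain/get_block", 200, (400, 500), "get_block_result", "<source file>"),
    ApiCall("/v1/chain/push_transaction", 202, (400, 500), "push_transaction_result", "<source file>"),
]


def find_inconsistencies(inventory, expected_success=200):
    """Return the paths whose success status differs from the agreed convention."""
    return [call.path for call in inventory if call.success_status != expected_success]


print(find_inconsistencies(INVENTORY))
```

A check like this could run in CI so that a new call cannot quietly introduce a different convention.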

ericpassmore commented 1 year ago

Shifting to a product mindset for APIs (standards, versions, and release cycles) would create a firm foundation for future upgrades. For example, adding a graph query API or a raw block API, or moving table lifecycle to an API, would benefit from a product mindset.

Currently the biggest hurdle for development is serialization/deserialization. First, developers are trying to implement clients in different languages (https://github.com/AntelopeIO/abieos/issues/14), and they often reverse engineer the deserialization. Second, transactions often need to be relayed to a different device for signing, and the routing of the request may require inspection of the transaction. An example is signing a transaction on a mobile wallet for an action initiated in a web application. To make this “relaying” work, clients deserialize, tokenize, inspect, and pass along the transaction. The burden of deserialization and tokenization is significant in developer time, package size, and CPU cycles.

How can we align on the problem statement and impact of deserialization, tokenization, and serialization? Does this problem warrant a major rework of our APIs (e.g. a new API with backward and forward compatibility for ABIEOS and Protobuf)? What is the best way to support deserialization, tokenization, and serialization (layer 2 API, client libraries, embedded duplicated responses)? Would we add a Raw Transaction, Raw Header, and Raw Block API as well?

ericpassmore commented 1 year ago

Last area of impact is developer tooling. This would include advanced query APIs (example graph query), caching support, or encapsulating standard contracts into API calls. Most likely this would start as standards on how these APIs should be structured, with example code to help 3rd parties build APIs covering new areas.

wanderingbort commented 1 year ago

As a point of process, I'd like to note that the affirmative outcome of @kj4ezj's proposal would be a recommendation to product from engineering that they productize the API rather than leaving it in its de facto state of being an "implementation detail".

I suspect product would be very receptive and this should not be construed as me pushing against this effort. Rather, I just want to acknowledge that in this context we don't have complete authority to make this move and we will need them to buy in.

ericpassmore commented 1 year ago

Started documenting options for version management in separate branch https://github.com/eosnetworkfoundation/engineering/blob/ehp/proposals/proposals/HTTP-Version.md

ericpassmore commented 1 year ago

Decisions

#Rec Switch to JSON RPC.

Resolves the identified problems. It is what other blockchains are using.

No Versions

#Rec Retain a leading directory name of api to support URL namespace reorganization

Drop the version id. API is HTTP/2.0.

URL Organization

Needs additional research to match the needs. Would the organization need additional information in the URL to make layer 7 routing easy? Specific cases would be routing to types of nodes, and caching.

Schema Version

#Rec Use JSON RPC. The JSON payload includes the method name. This is the schema version. If we want to make breaking changes, we change the method name.
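To make the recommendation concrete, here is a minimal sketch of a JSON RPC dispatcher in which a breaking change ships under a new method name while the old name keeps its behavior. The method names `get_block` and `get_block_v2`, their parameters, and their response fields are invented for the example; only the envelope fields and the -32601 "Method not found" code come from the JSON-RPC 2.0 specification.

```python
import json


def get_block_v1(params):
    # Original response shape (hypothetical).
    return {"block_num": params["block_num"]}


def get_block_v2(params):
    # Breaking change: the field is renamed, so it ships under a new method name.
    return {"height": params["block_num"]}


# The method name doubles as the schema version.
METHODS = {"get_block": get_block_v1, "get_block_v2": get_block_v2}


def handle(raw_request: str) -> str:
    """Dispatch one JSON RPC request to the handler registered for its method."""
    request = json.loads(raw_request)
    handler = METHODS.get(request.get("method"))
    if handler is None:
        # -32601 is the JSON-RPC 2.0 "Method not found" error code.
        response = {
            "jsonrpc": "2.0",
            "error": {"code": -32601, "message": "Method not found"},
            "id": request.get("id"),
        }
    else:
        response = {
            "jsonrpc": "2.0",
            "result": handler(request.get("params", {})),
            "id": request.get("id"),
        }
    return json.dumps(response)
```

Existing clients keep calling `get_block` unchanged, and new clients opt in to `get_block_v2` when they are ready.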

Serialization Version

No Change.

Decision, should we open this as a future engineering issue for discussion?

Supporting If-Match

#Rec Support E-Tag and If-Match for producer configuration.
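A minimal sketch of how an If-Match check could guard a configuration update, assuming the E-Tag is a hash of the current configuration. The helper names and the `max_transaction_time` field are hypothetical, and a real handler would also need to accept a comma-separated list of tags in If-Match.

```python
import hashlib
import json
from typing import Optional


def etag_for(config: dict) -> str:
    """Derive a strong E-Tag from a canonical JSON encoding of the config."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return '"' + hashlib.sha256(canonical.encode()).hexdigest() + '"'


def update_config(current: dict, if_match: Optional[str], changes: dict):
    """Return (http_status, resulting_config) for a conditional update."""
    if if_match is None:
        # 428 Precondition Required: this sketch refuses unconditional writes.
        return 428, current
    if if_match != "*" and if_match != etag_for(current):
        # 412 Precondition Failed: the config changed since the client read it.
        return 412, current
    return 200, {**current, **changes}
```

The point of the check is that two operators editing the producer configuration at the same time cannot silently overwrite each other: the second write fails with 412 and has to re-read first.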

ericpassmore commented 1 year ago

Next steps: come back with a survey of APIs from other crypto chains and common API services

ericpassmore commented 1 year ago

SURVEY

The blockchains I looked at do not have an HTTP version. They do have a schema/RPC version. They do not support E-Tags. Here are references for the JSON RPC used by the blockchains.

Amazon suggests using the host name of the DNS entry as the version. This enables folks to route at a high level, and consolidates mappings into the fewest locations. Example https://apiv1.example.com to https://apiv2.example.com. Blockchains don't control DNS and don't have this option. Amazon Docs

Looking at blockchains they use jsonrpc instead. This is not an HTTP API Version. Don't be confused by the version inside each JSON RPC call. That is actually the version of the schema, and indicates the version of both the fields in the schema and the related behaviors.

See the examples below for blockchains. In all cases, the JSON schema version is jsonrpc: 2.0.
https://ethereum.github.io/execution-apis/api-documentation/
https://docs.near.org/api/rpc/introduction
https://docs.avax.network/apis/avalanchego/apis/issuing-api-calls
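For reference, a JSON-RPC 2.0 request is a small envelope with a fixed `jsonrpc` field, a `method`, optional `params`, and an `id`. The sketch below builds one; `make_request` is a helper invented for the example, while `eth_blockNumber` is a method from the Ethereum JSON RPC API.

```python
import json
from itertools import count

_ids = count(1)  # simple per-process request id generator


def make_request(method, params=None):
    """Build a JSON-RPC 2.0 request body as a JSON string."""
    request = {"jsonrpc": "2.0", "method": method, "id": next(_ids)}
    if params is not None:
        request["params"] = params
    return json.dumps(request)


print(make_request("eth_blockNumber", []))
```

Note that the `jsonrpc` value stays the same for every call, while the `method` is what varies between calls.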

HTTP API

The HTTP API should be used for the URL organization, standardization of return codes, and standardization for rarely used methods like PATCH. The relationship between the HTTP API and the json schema is limited. It is true to say the HTTP API requires the schema to have a version.

Looking at AVAX's complex organizational structure, and lack of version, they have a significant problem if they want to re-organize or reshape their URLs, as they lack a distinguishing version key. Having a jsonrpc value does not make sense if the URL patterns overlap, across 2 versions, while the schemas are significantly different. Having the same URL with different schemas leads to bad coding patterns for clients as they put if-statements deep in the code. You can't set the version at the beginning of a session because the version is schema specific and changes depending on what you want to do. On the other hand, having an HTTP API version enables setting a value at the start of the HTTP session, and driving logic off that key.

Schema Version

Blockchains use jsonrpc as the schema version. ETH, AVAX, and NEAR all use the same version string in all cases, jsonrpc: 2.0. Personally I question the usefulness of a version tag when it is always the same (aka a static value). This indicates the method name is changing, and the jsonrpc field isn't used. On a separate point, if strict version management was needed, then SOAP is the way to accomplish this. SOAP died because it was so hard to use. In practice, schemas have a looser treatment.

We tend to care only about the Mandatory changes. Clients do not validate schemas. Mandatory changes are rare, and they tend to have a clear purpose. For these reasons, recommend changing our method names to support breaking changes.

JSON RPC uses a method field to manage this.

Schema naming is personal preference. In this conversation we have introduced the following options for schema naming

Content Versioning

If you really care about optional fields and content validation, use E-Tags and Directives. The E-Tag along with the Directives can tell you if the content has changed. For example if you request information about a block and get an E-Tag that E-Tag will change when new optional fields are returned. Comparing E-Tags allows clients to understand optional changes in returned content.
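A minimal sketch of that comparison, assuming the E-Tag is a hash of the response body and the client sends its last known tag in If-None-Match. The helper names and the block fields are hypothetical.

```python
import hashlib
import json


def make_etag(body: dict) -> str:
    """Hash the canonical JSON body so any change in content changes the tag."""
    payload = json.dumps(body, sort_keys=True).encode()
    return '"' + hashlib.sha256(payload).hexdigest()[:16] + '"'


def conditional_get(body: dict, if_none_match=None):
    """Return (status, etag, body); 304 with no body when the tag still matches."""
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, etag, None
    return 200, etag, body


old = {"block_num": 1, "producer": "a"}
status, tag, _ = conditional_get(old)

# Same content: the server can answer 304 Not Modified.
print(conditional_get(old, tag)[0])

# A new optional field appears: the tag no longer matches and the client gets 200.
new = dict(old, new_optional_field=True)
print(conditional_get(new, tag)[0])
```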

In closing, breaking changes from schema naming are the most important. E-Tag and Directives are the solution for optional changes. HTTP API Version should be limited to URL structure, standards for HTTP Response codes, and standardizing behaviors for rarely used HTTP Methods (e.g. PATCH).

kj4ezj commented 1 year ago

In my experience, the E-tag is usually a hash specific to one session which enables a client to request paginated data in chunks over a period of time instead of having to ingest a large amount of data all at once. By including the E-tag in the page requests, the client can ensure they are able to eventually parse the entire body of content as of the snapshot in time where the first request was made without being served duplicates or having missing pieces because the content has changed between queries.

Since our content is changed atomically using discrete, batched steps (blocks) and the state is deterministic rather than being changed continuously over time, I think it would save server resources to associate this e-tag with a specific block hash (ID) as opposed to the e-tag being a hash the server has to associate with a specific client's session and remember for a period of time. Stated another way, this "snapshot in time" is already built-in to the concept of a blockchain and we should not implement it again in another layer. This provides several advantages.

For one, we do not have to cache any queries associated with specific sessions. The page of a request for a specific query at a specific block height is deterministic and can be generated instead of stored. Another benefit is that a client could begin a paginated query with one service provider, then finish it with another service provider if the first goes down. This also makes load-balancing and highly available API endpoints much easier to implement, despite the fact that public clouds already offer patterns to solve some of these problems in a session-aware context. Finally, there is no time-limit or timeout for a client to resume queries with a specific e-tag.

Hopefully this makes sense.
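To sketch the idea in code: if the e-tag is derived only from the block ID and the query, any server holding the state at that block can recompute it and serve any page without remembering the session. Everything here is hypothetical, including the function names and the use of a plain list to stand in for the deterministic query result at a block.

```python
import hashlib


def pagination_etag(block_id: str, query: str) -> str:
    """Derive the e-tag from the block ID and query; no per-session state needed."""
    digest = hashlib.sha256(f"{block_id}:{query}".encode()).hexdigest()
    return f'"{digest}"'


def get_page(rows_at_block, block_id, query, page, page_size, etag=None):
    """Serve one page of a query pinned to a block; return (status, etag, rows)."""
    expected = pagination_etag(block_id, query)
    if etag is not None and etag != expected:
        # The client's tag refers to a different block or query.
        return 412, expected, []
    start = page * page_size
    return 200, expected, rows_at_block[start:start + page_size]


# rows stands in for the deterministic result of the query at block "abc".
rows = list(range(10))
status, tag, first = get_page(rows, "abc", "q", 0, 4)

# A different provider with the same state at "abc" can serve the next page.
print(get_page(rows, "abc", "q", 1, 4, etag=tag))
```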

ericpassmore commented 1 year ago

The E-Tag recommendation is scoped to the producer plugin configuration. You're right that E-Tags don't make sense on a moving target of constantly updating values. There might be some chain API methods that would benefit from E-Tags, like get_account.

E-Tags and Directives are last on the list and play a supporting role in the versioning picture.

ericpassmore commented 1 year ago

Decision: api is the leading path dir in the URL
Next Steps: Eric to update documentation for clarity on API version is HTTP/2.0
Next Steps: Eric to work with product to identify http routing needs

ericpassmore commented 1 year ago

archiving older issue