elastic / elasticsearch-rs

Official Elasticsearch Rust Client
https://www.elastic.co/guide/en/elasticsearch/client/rust-api/current/index.html
Apache License 2.0
702 stars 72 forks source link

[ENHANCEMENT] Request and Response types #75

Open Veetaha opened 4 years ago

Veetaha commented 4 years ago

Is your feature request related to a problem? Please describe. For some time I was using the TypeScript client for Elasticsearch. The major source of bugs and inconvenience is that the exposed client API is mostly untyped. When moving to Rust I was hoping that the strong statically-typed nature of the language will enforce creating a strictly-typed client in it. But as I see the client methods just use T: serde::Serialize which is frustrating.

The problem with this approach is that the API is so-called stringy, i.e. it is super-easy to mistype the JSON object keys, pass the JSON value of the wrong type or if the value is a string of a fixed set of possible values it is very easy to misspell the string enum. Besides that, you get poor developer experience from IDE, since you get no hints about the object shape that is expected, no completions and go-to-definitions to inspect the possible set of properties you can write, etc..

At the end of the day, such a stringy API just means that all the type-checks are moved from the compile-time to run-time, thus requiring extensive testing of your application, but it always happens that you forget to test some rare code-path where you e.g. misspelled the query object key and it gets to production and boom...

I know you understand these concerns and the decision on using the stringy JSON API in Rust was deliberate. Maybe, because it allows you to bootstrap the client with much less time and effort. I agree that it is a good short-term decision, it did let you create the crate very rapidly, didn't it? So maybe it's time to do more long-term design improvements?...

Describe the solution you'd like I don't have the ideal API proposal here. I'd like to hear your thoughts on that. I saw that you were interested in rs-es crate that does provide a good strongly-typed API, but unfortunately, this crate is likely unmaintained... Also, I'd like to note that it is important to not only define the input types to the client but also strongly type the response objects.

One thing that I'd like to warn you is the downside of static typing, anyway. I noticed it in diesel-rs (which has a very cool strongly-typed API which I'd like this crate to aspire).

E.g. If we implement the query builder it is necessary to ensure that it will allow for dynamic query building, i.e. let boxing the query builder like this is done in diesel in into_boxed()

Example of the problem:

impl BoolQueryBuilder {
    must() -> MustQueryBuilder;
    should() -> ShouldQueryBuilder;
}

let query = if cond {
    bool().must()
} else {
    bool().should() // this will fail because the types of branches are different
}

Describe alternatives you've considered As a crazy alternative, we can use macros for DSL so that we preserve the JSON-like syntax for building queries, but the macros will validate the query shape and raise a compile-time error if something invalid is encountered.

However, as to me, the API should expose something like a document object model (DOM) or abstract syntax tree (AST) which lets you safely create your requests. By now this crate already uses an AST, but this is the AST of the JSON language as a whole.

So at the high level, the goal is to narrow down the JSON language to Elasticsearch DSL JSON subset, which doesn't allow all the possible object shapes that are defined by JSON, but only the shapes that Elasticsearch server understands and works with.

Additional context Option::<AdditionalContext>::None

mwilliammyers commented 4 years ago

I would love to see a more strongly typed API layer on top of the one currently exposed. I am not convinced that I would want it to be the only API exposed though.

Something that we were trying to integrate in the elastic crate (which has now been effectively replaced by this crate, which is awesome :)) was a query/search builder API—something like this. I think that it is a good place to start.

Are you talking/thinking about something in addition to a strongly typed query builder as well?



Option::<AdditionalContext>::None

😄

Veetaha commented 4 years ago

@mwilliammyers

Are you talking/thinking about something in addition to a strongly typed query builder as well?

Yeah, the query builder part is, in fact, the most complicated one. It requires quite some clever design thinking in order to make it convenient in simple cases, but also powerful enough to support dynamicity. Maybe it's possible that we can study from diesel-rs experience in this sphere (see a nice talk about its type system utilization)

Nevertheless, the small step to start with is adding Response types for each API method. This is mostly just writing out the types of the objects returned from Elasticsearch server as specified in its API protocol (e.g. see the response types declarations for already deprecated old JavaScript client (the new TypeScript client does lack them for some reason, but there is an already existing PR to add them)). The work needed to do this is mostly routine and very simple, but also quite time-consuming taking into account the size of Elasticsearch API.

I'd like to emphasize the word mostly very simple. The response types for some methods can return different shapes of objects depending on the data stored in Elasticsearch itself (e.g. the documents the user wrote there), so such types should be generic over the user document types.

Going further, there are cases, where the response shape can depend on the input to the API method, e.g. passing the scroll parameter to _search returns a response with guaranteed _scroll_id property. To encode such a dependency in the type-system such that the user is not required to use unwrap for the response this method should return not Search but the new (say) SearchScroll builder in this function: https://github.com/elastic/elasticsearch-rs/blob/6785d5f45df1d56aa6efef8b78d25a2f113b5909/elasticsearch/src/generated/root.rs#L1497-L1501 Such that we get the following static consistency:

// generic T for the type of returned document
// If this crate was an elaborated ODM (object-document mapper) like diesel, this could be
// inferred, but let's not go that far as I understand that this is not what this crate is
struct SearchResponse<T> {
    /* search response fields without _scroll_id */
}
struct SearchScrollResponse<T> {
    #[serde(flatten)]
    search: SearchResponse;
    scroll_id: String,
}
impl Search       { async fn send<T>() -> SearchResponse<T>; }
impl SearchScroll { async fn send<T>() -> SearchScrollResponse<T>; }

// such that we get:

client.search().send().await?.scroll_id // compile-time error, SearchResponse doesn't have scroll_id
client.search().scroll("2m").await?.scroll_id // alrigth, SearchScrollResponse has scroll_id

I understand if you are reluctant to adopt the proposal present in this example, but at l least giving some more general type (with scroll_id: Option<String> like in this case) would already be a big jump from the current state of untypedness. Also, the builder you referred is an already awesome starting point, it will be great if it was copied over to this crate!

russcam commented 4 years ago

Thanks for opening, @Veetaha. Just wanted to drop a quick note to say that I've read the issue; It warrants a much longer answer than I can provide at this moment, but I am looking to respond later this week.

russcam commented 4 years ago

Here's the longer answer 😃

We would like to eventually provide types for request and responses, but don't have a definitive timeline mapped out for it right now. This issue can serve as the basis for coordinating thoughts and actions around the topic.

I genuinely appreciate that a lack of request and response types make the client more onerous to work with. There is much complexity involved in building requests and response types in certain places however; Amongst other things, I also work on the C# client, which exposes request and response types for all APIs for the high level client, and can confidently say that request and response types is a large part of the client work involved, because they can't currently be generated. There is effort going in to improving this, which I think might be a good starting point for tackling this issue once its in a good position.

In answer to the points raised

I know you understand these concerns and the decision on using the stringy JSON API in Rust was deliberate. Maybe, because it allows you to bootstrap the client with much less time and effort. I agree that it is a good short-term decision, it did let you create the crate very rapidly, didn't it? So maybe it's time to do more long-term design improvements?...

I understand the point made, although I'm not sure I would classify it as a stringy API - it's more a serde::Serialize API, so you can use json!{} or in the absence of request types provided by the client currently, your own structs for the requests and APIs that you use.

The common convention that official clients follow (for languages where it makes sense) is to have a low-level client and a high level client. Typically, the high level client may build on top of functionality in the low level client. What we're focused on at the moment is building out that low level client part in order to get the client in a position for a GA release, providing a good foundation on which to build high level client components like request and response types .

I saw that you were interested in rs-es crate that does provide a good strongly-typed API, but unfortunately, this crate is likely unmaintained...

rs-es looks like a reasonable start on providing types, but I don't think they're complete sadly.

Yeah, the query builder part is, in fact, the most complicated one. It requires quite some clever design thinking in order to make it convenient in simple cases, but also powerful enough to support dynamicity. Maybe it's possible that we can study from diesel-rs experience in this sphere (see a nice talk about its type system utilization)

Queries and aggregations are definitely some of the more complicated parts, in addition to being the most often used! Thanks for the link on diesel-rs, really good talk 👍

Nevertheless, the small step to start with is adding Response types for each API method.

I think that the approach we might want to take here is to allow response types to be implemented gradually, with the ability to use a response type where it exists, otherwise use a more generic response type. One way would be to tie an API's request and response types to specific request/response struct implementations, but this might be too limited in terms of flexibility if

  1. mistakes happen and incorrect types are used in request/responses
  2. things like filter_path are used, which filter returned JSON to a subset

An alternative approach might be through the generic response.json::<SpecificResponse>() for a given API, with some trickiness as to how SpecificResponse is tied to the response of that given API.

mwilliammyers commented 4 years ago

I wonder if it would make sense to just add some types from elastic/elastic-rs for now, e.g.:

and users can optionally use them?

We could also just pull those types into a separate repo/crate and iterate on them there.

In the end though, I think it would be best to automatically generate them. @russcam Is that possible? I have noticed things like this in the elastic docs and wonder if those are coming from some source (JSON/YAML etc.) we could use? (I haven't looked into it too deeply...)

Edit: looks like I found the REST API specs: https://github.com/elastic/elasticsearch/tree/master/rest-api-spec and it doesn't look like they include response type definitions

Veetaha commented 4 years ago

Totally agree with mwilliammyers, adding at least some basic response types would be very helpful to prevent users manually writing them :sweat_smile:. They try to do the same thing at TypeScript client repo.

russcam commented 4 years ago

In the end though, I think it would be best to automatically generate them. @russcam Is that possible? I have noticed things like this in the elastic docs and wonder if those are coming from some source (JSON/YAML etc.) we could use? (I haven't looked into it too deeply...)

Being able to generate them would be the ultimate goal that we should aim to get to. There is effort going in to this, which I think might be a good starting point for generation, once the generator project is in a good position.

I have noticed things like this in the elastic docs and wonder if those are coming from some source (JSON/YAML etc.) we could use? (I haven't looked into it too deeply...)

As far as I am aware, these are manually created. These could be another artifact generated from the generator project in the future.

Edit: looks like I found the REST API specs: https://github.com/elastic/elasticsearch/tree/master/rest-api-spec and it doesn't look like they include response type definitions

The api_generator package in this repository uses the REST API specs to

  1. generate the parts of the client that can be generated, from the specs
  2. generate integration tests for the Elasticsearch YAML tests
mwilliammyers commented 4 years ago

If the community would benefit from it, I can open source the code my company uses for queries, aggregations and request/response types. It is fairly complete but a good portion of it is slightly different from the exact structure from Elasticsearch to suit our needs/we use graphql.

Mostly these slight differences show up in aggregation responses and the search DSL where instead of:

{ “term”: { “somefield”: { “value”: “asdf” } } }

We have:

{ “term”: { “field”: “somefield”, “value”: “asdf” } }

Because graphql doesn’t really support dynamic keys/arbitrary JSON.

I have been thinking about trying to refactor that code out into a library for a while now...

mwilliammyers commented 4 years ago

I am working on https://github.com/mwilliammyers/elastiql ... some of it is a little bespoke but it should be pretty handy. It is fairly well fleshed out, but still missing some things and is a little rough around the edges...

GopherJ commented 3 years ago

Hi, I think we shouldn't mix QueryBuilder with strongly typed Response, query builder is a much larger topic and in many cases it's not so useful. For example when the query is created dynamically on client side using UI.

However, strongly typed Response is pretty necessary or even we can say a must for most of the users, it's pretty important for elasticsearch-rs.

mwilliammyers commented 3 years ago

I agree, the repo I posted also has a strongly typed response. I think something very similar to that should be added to this crate, before tackling a QueryBuilder.

russcam commented 3 years ago

Hi, I think we shouldn't mix QueryBuilder with strongly typed Response, query builder is a much larger topic and in many cases it's not so useful. For example when the query is created dynamically on client side using UI.

There are many queries that cannot be expressed simply from the client side, where an ability to build a query is crucial for a great developer experience.

I completely understand the need for request/response types, but I'm very keen to avoid manually writing them. Not only is it a large amount of effort to write and maintain them, but there's a chance that a manually written type would have a different signature to one generated from a spec, meaning that once types are generated from a spec, there could be breaking changes moving from manual types to generated ones. This is perhaps acceptable for an alpha crate, but then it does place additional constraints on when it'd be possible to make the crate a GA release. Client major versioning aligns with Elasticsearch major versioning so were a manual type introduced, the crate made GA, and then a generated type later created with a different signature, it creates additional constraints on how types align to API functions and possibly when a manual type can be removed and superseded by a generated type i.e. it may not be possible to do this until the next major version of the crate, which would be dictated by the next major version of Elasticsearch.

The generator project is progressing well, with prototypes of generators for Java, C# and Typescript. It should be possible to start looking at a generator for Rust types in the near future.