edgedb / edgedb-rust

The official Rust binding for EdgeDB
https://edgedb.com
Apache License 2.0

pgvector support in static type mode #254

Open tailhook opened 1 year ago

tailhook commented 1 year ago

The Problem

In Rust, the same type, Vec<f32>, can be used both for Array<f32> and for pgvector. But our current code assumes that a single Rust type corresponds to a single EdgeDB type.

More technically, the Queryable trait has two methods:

  1. check_descriptor that works on the type descriptor and checks whether the type returned from EdgeDB is of the expected type
  2. decode that actually decodes the value

There is currently no way to pass information about the type actually in use from check_descriptor to decode. This keeps decoding fast, but it is a problem for this case, because decode cannot tell whether the bytes it receives are an array or a pgvector.
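To make the gap concrete, here is a minimal sketch of the two-step contract. Everything in it is a simplified stand-in rather than the real edgedb-protocol API: the `Descriptor` enum, the method signatures, and the toy wire format (packed big-endian f32 values) are assumptions made for illustration only.

```rust
// Simplified stand-ins, NOT the real edgedb-protocol definitions.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Descriptor {
    ArrayF32,
    PgVector,
    Str,
}

trait Queryable: Sized {
    // Runs once per query: validates the type the server will send.
    fn check_descriptor(desc: Descriptor) -> Result<(), String>;
    // Runs once per value: sees only the bytes, never the descriptor.
    fn decode(buf: &[u8]) -> Result<Self, String>;
}

impl Queryable for Vec<f32> {
    fn check_descriptor(desc: Descriptor) -> Result<(), String> {
        match desc {
            // Both server types could reasonably map to Vec<f32>...
            Descriptor::ArrayF32 | Descriptor::PgVector => Ok(()),
            other => Err(format!("unexpected descriptor {:?}", other)),
        }
    }

    fn decode(buf: &[u8]) -> Result<Self, String> {
        // ...but nothing here says which of the two was accepted, so only
        // one layout can be handled. Toy layout: packed big-endian f32.
        if buf.len() % 4 != 0 {
            return Err(format!("bad length {}", buf.len()));
        }
        Ok(buf
            .chunks_exact(4)
            .map(|c| f32::from_be_bytes([c[0], c[1], c[2], c[3]]))
            .collect())
    }
}
```

The `Ok(())` return value is the crux: the check succeeds, but its result carries nothing that decode could later branch on.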

Background

Dynamic in Static Type

We have the same issue when a dynamic type is used within a static type:

#[derive(Queryable)]
struct MyResult {
  name: String,
  link_to_a_dynamic_object: Value,  // Dynamic type here
}

This doesn't work in the current implementation either, and we want it to work.

How the Dynamic Type Works

We have a QueryResult trait that allocates Arc<dyn Codec> objects and passes codecs around. This only works at the top level, though, and using it throughout the normal code is costly.

Possible Solutions

(1) Allow Passing Data

Perhaps check_descriptor should return an associated type that stores state. This means all the decoders and the derive macro would have to be updated.

Also, I'm not sure what a sufficiently performant way of doing that is, since passing around allocated structures would add a lot of overhead.

But this is probably the best solution in the long term, especially given that it potentially fixes the dynamic-type-in-static-type problem (and may also help get rid of the QueryResult vs Queryable distinction).
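A rough sketch of what option (1) could look like, under stated assumptions: the `State` associated type, the `VecLayout` enum, the simplified `Descriptor`, and the toy wire formats (arrays as packed big-endian f32, vectors with a two-byte big-endian length prefix) are all hypothetical and do not reflect the real crate or protocol. It uses a small `Copy` state to show one way of avoiding allocated structures.

```rust
// Hypothetical shapes for illustration; not the real edgedb-protocol API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Descriptor {
    ArrayF32,
    PgVector,
    Str,
}

trait Queryable: Sized {
    // Whatever check_descriptor learned and decode needs later.
    type State: Copy;
    fn check_descriptor(desc: Descriptor) -> Result<Self::State, String>;
    fn decode(state: Self::State, buf: &[u8]) -> Result<Self, String>;
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum VecLayout {
    Array,
    PgVector,
}

impl Queryable for Vec<f32> {
    type State = VecLayout;

    fn check_descriptor(desc: Descriptor) -> Result<VecLayout, String> {
        match desc {
            Descriptor::ArrayF32 => Ok(VecLayout::Array),
            Descriptor::PgVector => Ok(VecLayout::PgVector),
            other => Err(format!("unexpected descriptor {:?}", other)),
        }
    }

    fn decode(state: VecLayout, buf: &[u8]) -> Result<Self, String> {
        // Pick the body according to the layout recorded at check time.
        let body = match state {
            VecLayout::Array => buf,
            VecLayout::PgVector => {
                if buf.len() < 2 {
                    return Err("missing length prefix".to_string());
                }
                let n = u16::from_be_bytes([buf[0], buf[1]]) as usize;
                if buf.len() - 2 != n * 4 {
                    return Err("length prefix does not match body".to_string());
                }
                &buf[2..]
            }
        };
        if body.len() % 4 != 0 {
            return Err(format!("bad body length {}", body.len()));
        }
        Ok(body
            .chunks_exact(4)
            .map(|c| f32::from_be_bytes([c[0], c[1], c[2], c[3]]))
            .collect())
    }
}
```

For types that map to a single EdgeDB type, `State` could simply be `()`, so the extra parameter should cost nothing there. The per-value `match` in decode is the "checking some state at each value decoding" cost that option (2) trades for a virtual call.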

(2) Return Dynamic Type

Make check_descriptor() return some Arc<dyn StaticCodec> value that decodes the value, instead of returning state. This is pretty similar to the solution above, but adds the overhead of virtual calls, which are usually considered slower. On the other hand, it probably makes dispatching straightforward: either the pgvector codec or the array codec is returned at the check stage, instead of some state being checked each time a value is decoded. It also inherits all the other downsides of the solution above.
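A minimal sketch of option (2), again with hypothetical names throughout: the issue only names `Arc<dyn StaticCodec>`, so the generic parameter on `StaticCodec`, the two codec structs, the free-standing `check_descriptor` function, and the toy wire formats are assumptions for illustration.

```rust
use std::sync::Arc;

// Hypothetical stand-ins; not the real edgedb-protocol API.
enum Descriptor {
    ArrayF32,
    PgVector,
    Str,
}

// Assumed shape of the trait object returned by the check stage.
trait StaticCodec<T>: Send + Sync {
    fn decode(&self, buf: &[u8]) -> Result<T, String>;
}

struct ArrayCodec;
struct PgVectorCodec;

// Shared helper: packed big-endian f32 values (toy layout).
fn read_f32s(body: &[u8]) -> Result<Vec<f32>, String> {
    if body.len() % 4 != 0 {
        return Err(format!("bad body length {}", body.len()));
    }
    Ok(body
        .chunks_exact(4)
        .map(|c| f32::from_be_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}

impl StaticCodec<Vec<f32>> for ArrayCodec {
    fn decode(&self, buf: &[u8]) -> Result<Vec<f32>, String> {
        read_f32s(buf)
    }
}

impl StaticCodec<Vec<f32>> for PgVectorCodec {
    fn decode(&self, buf: &[u8]) -> Result<Vec<f32>, String> {
        // Toy layout: two-byte prefix, then the packed values.
        if buf.len() < 2 {
            return Err("missing length prefix".to_string());
        }
        read_f32s(&buf[2..])
    }
}

// The codec is chosen once, at check time; decoding is one virtual call.
fn check_descriptor(desc: Descriptor) -> Result<Arc<dyn StaticCodec<Vec<f32>>>, String> {
    match desc {
        Descriptor::ArrayF32 => Ok(Arc::new(ArrayCodec)),
        Descriptor::PgVector => Ok(Arc::new(PgVectorCodec)),
        Descriptor::Str => Err("expected array or vector".to_string()),
    }
}
```

Compared with a state value, there is no per-value branching on layout, but every decode goes through dynamic dispatch and each checked field needs an `Arc` allocation.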

(3) Use New Type to Disambiguate

I.e. for vectors the user has to use:

#[derive(Queryable)]
struct MyResult {
  vec: edgedb_protocol::model::Vector,
}

This might not be very convenient for some use cases, although implementing Deref<Target=Vec<f32>> might help.

The upside is probably that we can write a special fmt::Debug impl (one that prints an ellipsis instead of dumping 1000 numbers).
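A small sketch of how such a newtype might look. The local `Vector` struct here is a stand-in for the proposed edgedb_protocol::model::Vector, and the exact Debug format and the `SHOWN` cutoff are arbitrary choices made for illustration.

```rust
use std::fmt;
use std::ops::Deref;

// Stand-in for the proposed newtype; wraps the raw components.
#[derive(Clone, PartialEq)]
pub struct Vector(pub Vec<f32>);

impl Deref for Vector {
    type Target = Vec<f32>;

    // Lets callers use Vec and slice methods (len, iter, indexing) directly.
    fn deref(&self) -> &Vec<f32> {
        &self.0
    }
}

impl fmt::Debug for Vector {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Print only the first few components, then a summary.
        const SHOWN: usize = 3;
        write!(f, "Vector([")?;
        for (i, x) in self.0.iter().take(SHOWN).enumerate() {
            if i > 0 {
                write!(f, ", ")?;
            }
            write!(f, "{:?}", x)?;
        }
        if self.0.len() > SHOWN {
            write!(f, ", ... ({} total)", self.0.len())?;
        }
        write!(f, "])")
    }
}
```

With this, `Vector(vec![1.0, 2.0, 3.0, 4.0, 5.0])` formats as `Vector([1.0, 2.0, 3.0, ... (5 total)])`, while `v.len()` and `v.iter()` keep working through Deref.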

(4) Use Attribute to Disambiguate

Example:

#[derive(Queryable)]
struct MyResult {
  #[queryable(vector)]
  vec: Vec<f32>,
}

The downside of this is that it works only for shapes. Returning just a vector from a query will not work, and arrays or tuples of vectors will be a problem as well.

Conclusion

The options are not mutually exclusive. But different combinations of them have their own downsides, including introducing multiple ways to do such a routinely simple task.