edgedb / edgedb-go

The official Go client library for EdgeDB
https://pkg.go.dev/github.com/edgedb/edgedb-go
Apache License 2.0
167 stars 11 forks source link

Concurrent queries have to reconnect #314

Closed snoozedDev closed 1 month ago

snoozedDev commented 1 month ago

Describe the bug I'm running into this issue on a real world application where I have an API which processes requests concurrently and it's often the case that the endpoints will query data from my edgedb connection.

Reproduction I've created a minimal repository where I can easily repro the unwanted behavior: https://github.com/snoozedDev/edgedb-concurrency-repro

In the code, I spawn 8 go routines sequentially in groups of 2 with one second wait between groups. The amount I set the edgedb.Options.Concurrency to doesn't affect my test.

It can be reproduced with just 2 go routines, but I spawn 4 groups of 2 to show that it will keep happening during the lifetime of the program.

$ go run main.go

    Connected in 328.3618ms
    Client didn't have to reconnect (0s)
    Connected had to reconnect (320.3006ms)
    Client didn't have to reconnect (0s)
    Connected had to reconnect (321.6555ms)
    Client didn't have to reconnect (0s)
    Connected had to reconnect (318.9663ms)
    Client didn't have to reconnect (0s)
    Connected had to reconnect (316.6645ms)

Expected behavior

$ go run main.go

    Connected in 328.3618ms
    Client didn't have to reconnect (0s)
    Client didn't have to reconnect (0s)
    Client didn't have to reconnect (0s)
    Client didn't have to reconnect (0s)
    Client didn't have to reconnect (0s)
    Client didn't have to reconnect (0s)
    Client didn't have to reconnect (0s)
    Client didn't have to reconnect (0s)

Versions:

Additional context The aforementioned real world application is my first golang project (and first time using edgedb) so I might be just misunderstanding how things are supposed to work.

fmoor commented 1 month ago

The edgedb-go client currently only maintains at most 1 idle connection. In your example the first query in each pair uses the idle connection and the second query creates a new connection and then that second connection is dropped when the query finishes.

Have you deployed this somewhere or are you only running on your laptop? It seems the reconnect time is very hardware dependent:

$ go run main.go
Connected in 10.822818ms
Client didn't have to reconnect (49.382µs)
Connected had to reconnect (2.83533ms)
Client didn't have to reconnect (28.944µs)
Connected had to reconnect (3.408283ms)
Client didn't have to reconnect (31.398µs)
Connected had to reconnect (2.744962ms)
Client didn't have to reconnect (28.203µs)
Connected had to reconnect (3.151005ms)

edgedb.Options.Concurrency is the maximum number of connections the client will open. Since your example only runs 2 queries at a time it wouldn't have an effect unless you set it to 1. Ironically, for your example limiting the concurrency to 1 makes the query execution times faster because the one connection is reused for all queries.

I expect that the reconnect latency will become a much smaller fraction of your overall response time once you have production hardware and real queries and data. select 1 doesn't require any processing at all. If the reconnect latency continues to be an issue we can talk about making the minimum number of idle connections configurable.

snoozedDev commented 1 month ago

I'm just running this on my local development machine, which is a 12-core modern desktop computer.

I am very curious about that quick 10ms initial connection and those 3ms reconnects you're seeing. Are you running edgedb from the compose file? I wonder if having edgedb on a docker container is causing me some overhead because I'm seeing a very consistent ~315ms delay when connecting and reconnecting to it. I wouldn't have even noticed this behavior if my connection times were that low, so I'll be trying to figure out what my bottleneck is.

As for the amount of idle connections, It'd be great to have the ability to configure it. I don't see a downside of having it tied to whatever edgedb.Options.Concurrency is by default other than maybe saturating the edgedb server with multiple instances of the application.

fmoor commented 1 month ago

I am using the compose file in the provided repo. I am using linux instead of windows though. The first time I ran the example program it took ~340ms to connect the first time. Subsequent reconnects that first run only took ~5ms. Since that first time it only takes ~10ms to connect the first time and then ~5ms for reconnects.

If I use edgedb project init instead of docker I don't get the ~300ms at all.

saturating the edgedb server with multiple instances of the application

This is a very real problem at scale and one of the motivations for not allowing this to be configured. It is easy to shoot your self in the foot.

snoozedDev commented 1 month ago

Just booted into Linux to test this and I'm getting numbers very similar to yours, so It's most likely just be an issue with WSL/Docker/Windows. Probably Window's filesystem, as I suspect I'd be seeing high latency from just Postgres as well.

It is easy to shoot your self in the foot.

That's a fair concern, so maybe tying it to edgedb.Options.Concurrency is a bad default. I still think that having the option to increase the default number of minimum free connections can be valuable. However, how worthwhile this is depends on how catastrophic exceeding the maximum backend connections to edgedb is and how much work implementing such a configuration requires.

Knowing now how quick these temporary connections to edgedb are on a Unix system makes me feel less strongly about having such a configuration available to me. I'm fine with keeping it the way it is now until someone else comes in with a compelling use case for having more idle connections.

Thanks for looking into my issue! Feel free to close it unless you plan on adding this config.

fmoor commented 1 month ago

I'll close this for now. If there is more demand in the future we can reevaluate.