Ericsson / ered

An Erlang client library for Valkey/Redis Cluster
MIT License

Avoid copying between processes #3

Open zuiderkwast opened 2 years ago

zuiderkwast commented 2 years ago

Minimize the copying of data between processes. Do as much as possible in the calling process.

drmull commented 2 years ago

One idea is to make a separate benchmark repo where we test ered against eredis_cluster, and maybe include other Erlang Redis Cluster clients.

As we have discussed before, it would be interesting to try out a more optimized way of doing things. One thing we could do is remove the ered and ered_client processes and instead use atomics for the slot map lookup, persistent term for the connection lookup, and the counters module for keeping track of the queue size.

slot -> connection index (atomics)
connection index -> queue size (counters)
connection index -> connection pid (persistent term, local pid fits in a word so no global GC to update)
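A minimal sketch of the first two mappings, assuming one atomics array with an entry per hash slot and one counters array with an entry per connection. The module name, function names and the convention that 0 means "no connection assigned" are hypothetical and not part of ered:

```erlang
-module(slot_lookup_sketch).
-export([new/1, set_slot/3, conn_index/2, enqueue/2, dequeue/2, queue_size/2]).

-define(SLOTS, 16384).  % number of hash slots in a Redis/Valkey cluster

%% Hypothetical state: an atomics array for slot -> connection index and
%% a counters array for connection index -> queue size.
new(NumConnections) ->
    SlotMap = atomics:new(?SLOTS, [{signed, false}]),
    Queues = counters:new(NumConnections, [atomics]),
    {SlotMap, Queues}.

%% Slots are 0-based, atomics indices are 1-based.
%% A stored value of 0 is taken to mean "no connection assigned" (assumption).
set_slot({SlotMap, _Queues}, Slot, ConnIx) ->
    atomics:put(SlotMap, Slot + 1, ConnIx).

%% Can be called directly from the user's process; no message passing.
conn_index({SlotMap, _Queues}, Slot) ->
    atomics:get(SlotMap, Slot + 1).

enqueue({_SlotMap, Queues}, ConnIx) ->
    counters:add(Queues, ConnIx, 1).

dequeue({_SlotMap, Queues}, ConnIx) ->
    counters:sub(Queues, ConnIx, 1).

queue_size({_SlotMap, Queues}, ConnIx) ->
    counters:get(Queues, ConnIx).
```

Both references can be shared between processes, so the lookup and the queue-size check can run in the calling process without copying the slot map.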

The connection module would have to handle reconnects and status reporting to the cluster module. The queue would be the message queue of the connection's send process. Avoid gen_server:call since setting up the link is expensive, rely on a timeout instead.

Not sure if it will work, there might be a catch, but if it works I think it would be quite efficient.

zuiderkwast commented 2 years ago

Benchmarking is a good idea. We should include ecredis in the comparison.

Atomics and counters are probably good, but I'm not sure about persistent term. It's true that replacing a pid doesn't trigger a global GC, but it still rewrites the whole persistent term table, which may contain stuff out of control of this lib. Perhaps an ETS table is an acceptable choice for connection index -> pid lookup?
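A rough sketch of what the ETS alternative could look like, assuming a public table with read_concurrency enabled so that callers can read it directly. The module name, table label and function names are made up for illustration:

```erlang
-module(conn_table_sketch).
-export([new/0, put_conn/3, conn_pid/2]).

%% The table is owned by the process calling new/0 and dies with it.
%% It is not a named table; the atom is only a label (hypothetical).
new() ->
    ets:new(ered_conn_sketch, [set, public, {read_concurrency, true}]).

%% Called by the owner when a connection process is (re)started.
put_conn(Tab, ConnIx, Pid) when is_pid(Pid) ->
    true = ets:insert(Tab, {ConnIx, Pid}),
    ok.

%% Called from the user's process; only the pid is copied out of the table.
conn_pid(Tab, ConnIx) ->
    case ets:lookup(Tab, ConnIx) of
        [{_, Pid}] -> {ok, Pid};
        [] -> error
    end.
```

Unlike persistent term, replacing an entry here only touches this table, so terms stored by other applications are unaffected.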

> Avoid gen_server:call since setting up the link is expensive, rely on a timeout instead.

You mean gen_server:call's monitor is expensive? By timeout, do you mean we use cast + receive ... after?

drmull commented 2 years ago

> but it still rewrites the whole persistent term table, which may contain stuff out of control of this lib.

Yes you are right, I did not realize the persistent term table was global. Better not go that way.

> You mean gen_server:call's monitor is expensive?

Yes, I meant monitor. I remember it showed up when I did some profiling, and bang/cast + receive performed better. It is hackish, but it might be worth it if we are going all in for speed. At least we could profile it and see if it makes any difference.
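For illustration, the bang + receive pattern could look roughly like the sketch below. The message format and the echo process are invented for the example and are not how ered_client works:

```erlang
-module(call_sketch).
-export([call/3, start_echo/0]).

%% Send a request without monitoring the server and wait for a tagged reply.
%% If the server is dead or slow, the only signal is the timeout.
call(Pid, Request, Timeout) ->
    Ref = make_ref(),
    Pid ! {request, self(), Ref, Request},
    receive
        {Ref, Reply} -> {ok, Reply}
    after Timeout ->
        {error, timeout}
    end.

%% Hypothetical stand-in for a connection process: echoes each request back.
start_echo() ->
    spawn(fun Loop() ->
        receive
            {request, From, Ref, Request} ->
                From ! {Ref, Request},
                Loop()
        end
    end).
```

The trade-off compared to gen_server:call is that a crashed server is only noticed when the timeout fires, and a reply that arrives after the timeout stays in the caller's mailbox until something removes it.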

ghost commented 2 years ago

> Perhaps an ETS table is an acceptable choice for connection index -> pid lookup?

The process dictionary might also be an option. It has the same lifetime as an ETS table (it dies with the owner process). For simple key lookups it should be faster than ETS, I guess.

zuiderkwast commented 2 years ago

Ideally, the lookup should happen in the caller's (user's) process before the first gen_server call. We don't want to pollute the process dictionary of the user's process.