benashford / redis-async-rs

A Rust client for Redis, using Tokio
Apache License 2.0
253 stars 30 forks

What is our reconnect mechanism like? #90

Open leenstx opened 1 year ago

leenstx commented 1 year ago

Hi, I have some questions. I initialized a global connection using the once_cell crate, and I used it in two ways.

My test steps are as follows:

  1. Start the Redis service.
  2. Run the test.
  3. View the console log.
  4. Stop the Redis service.
  5. View the console log.
  6. Start Redis.
  7. View the console log.

The first way is like this (reconnect succeeds):

#[cfg(test)]
mod test {
    use once_cell::sync::OnceCell;
    use redis_async::client::PairedConnection;
    use redis_async::error::Error;
    use redis_async::resp_array;
    use std::thread;
    use std::time::Duration;
    use tokio::join;

    static GLOBAL_CONNECTION: OnceCell<PairedConnection> = OnceCell::new();

    #[tokio::test]
    async fn test_redis() {
        let connection = redis_async::client::paired_connect("127.0.0.1", 6379)
            .await
            .unwrap();
        GLOBAL_CONNECTION.set(connection).unwrap();

        loop {
            thread::sleep(Duration::from_secs(1));
            let handle = tokio::spawn(async {
                let res: Result<String, Error> = GLOBAL_CONNECTION
                    .get()
                    .unwrap()
                    .send(resp_array!["SET", "test", "1"])
                    .await;
                if let Ok(_) = res {
                    println!("ok");
                } else {
                    println!("error");
                }
            });
            let _ = join!(handle);
        }
    }
}

The second way is like this (reconnect fails):


#[cfg(test)]
mod test {
    use std::thread;
    use std::time::Duration;

    use once_cell::sync::OnceCell;
    use redis_async::client::PairedConnection;
    use redis_async::error::Error;
    use redis_async::resp_array;

    static GLOBAL_CONNECTION: OnceCell<PairedConnection> = OnceCell::new();

    #[tokio::test]
    async fn test_redis() {
        let connection = redis_async::client::paired_connect("127.0.0.1", 6379)
            .await
            .unwrap();
        GLOBAL_CONNECTION.set(connection).unwrap();

        loop {
            thread::sleep(Duration::from_secs(1));
            let res: Result<String, Error> = GLOBAL_CONNECTION
                .get()
                .unwrap()
                .send(resp_array!["SET", "test", "1"])
                .await;
            if let Ok(_) = res {
                println!("ok");
            } else {
                println!("error");
            }
        }
    }
}

I roughly understand that it's probably caused by multithreading. I'm new to Rust and I'm not sure if it's due to once_cell or redis_async. Can you provide some guidance? I'd also like to make some small contributions to redis_async.

benashford commented 1 year ago

I would expect you to see roughly similar behaviour in both examples.

The library takes a "fail fast" approach to lost connections because it was initially conceived with caching use-cases in mind. If Redis is unavailable for any reason, calls return an error immediately so that the calling code can choose what to do. Attempting to use a connection that has been dropped starts a reconnection attempt as a spawned task; errors are still returned while that is happening rather than waiting. After a short while the connection will be re-established and everything should continue working as before.

Calling code has two main options in this case: a) retry with a sensible back-off until the connection is re-established; or b) treat it as a cache miss and carry on. Which of the two is preferable depends on the nature of the calling code.
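To make the two options concrete, here is a minimal sketch using only the standard library. None of it is part of redis-async: `backoff_delay`, `retry_with_backoff` and `as_cache_miss` are hypothetical helper names, and the doubling schedule with a cap is just one reasonable choice of "sensible back-off". It is written synchronously so that it stands alone; in async code the same loop would presumably await the operation and pause with something like `tokio::time::sleep(delay).await` instead of a blocking sleep.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): `base * 2^attempt`, capped at `max`.
/// (Hypothetical helper, not part of redis-async.)
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // Saturate instead of overflowing for very large attempt numbers.
    let factor = 2u32.saturating_pow(attempt);
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}

/// Option (a): call `op` until it succeeds or `max_attempts` calls have failed,
/// pausing via `sleep` between failed attempts. `op` is always called at least once.
/// `sleep` is passed in so the caller decides how to wait (and tests can record delays).
fn retry_with_backoff<T, E>(
    max_attempts: u32,
    base: Duration,
    max: Duration,
    mut op: impl FnMut() -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: hand the last error back to the caller.
            Err(err) if attempt + 1 >= max_attempts => return Err(err),
            Err(_) => {
                sleep(backoff_delay(attempt, base, max));
                attempt += 1;
            }
        }
    }
}

/// Option (b): treat any failure as a cache miss and carry on.
fn as_cache_miss<T, E>(result: Result<T, E>) -> Option<T> {
    result.ok()
}
```

With a base of 100 ms and a cap of 1 s, the waits between attempts would be 100 ms, 200 ms, 400 ms, 800 ms and then 1 s from there on. Option (b) needs no loop at all: the error is dropped and the caller falls through to whatever it would do when the key is not cached.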

The error returned from .send(...) will explain the situation, so it's worth logging it to check the specific error. You'll get one of the errors defined in https://github.com/benashford/redis-async-rs/blob/master/src/error.rs, in particular the Connection(connection_reason) variant if the connection is disconnected and the library is trying to reconnect. If the reconnection attempt fails (because Redis isn't running, for example), you'll get the IO variant. From that you should be able to see what was happening at that particular moment.
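As a rough illustration of logging and branching on those two cases, the sketch below uses a stand-in `Error` enum so that it compiles on its own. Only the `Connection` and `IO` variant names come from the description above; the `String` payload, the `Other` catch-all, the `Action` enum and the `classify` function are assumptions made for this example, so check error.rs for the real definitions before matching on the library's type.

```rust
use std::io;

/// Stand-in for the library's error type, for illustration only.
/// The real enum lives in redis-async's src/error.rs and has more variants.
#[derive(Debug)]
enum Error {
    /// Disconnected and a reconnection is in progress.
    /// (The real payload type is not reproduced here; String is a simplification.)
    Connection(String),
    /// The reconnection attempt itself failed, e.g. Redis isn't running.
    IO(io::Error),
    /// Hypothetical catch-all standing in for every other variant.
    Other(String),
}

/// What the calling code decides to do next (hypothetical).
#[derive(Debug, PartialEq)]
enum Action {
    RetryLater,
    RedisUnreachable,
    Fail,
}

/// Log the specific error, then map it to a next step.
fn classify(err: &Error) -> Action {
    match err {
        Error::Connection(reason) => {
            eprintln!("redis: not connected, reconnecting ({})", reason);
            Action::RetryLater
        }
        Error::IO(e) => {
            eprintln!("redis: reconnection attempt failed ({})", e);
            Action::RedisUnreachable
        }
        Error::Other(msg) => {
            eprintln!("redis: unexpected error ({})", msg);
            Action::Fail
        }
    }
}
```

In the test loops from the original post, the same idea would replace the bare println!("error") with a match on the error value (or simply printing it with {:?}), which makes it visible whether a given iteration hit a reconnect in progress or a failed reconnect.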