bikeshedder / deadpool

Dead simple pool implementation for rust with async-await
Apache License 2.0
1.08k stars 137 forks source link

deadpool-sync::SyncWrapper is broken #325

Closed maan2003 closed 5 months ago

maan2003 commented 6 months ago

https://github.com/bikeshedder/deadpool/blob/2a584a99fa6d80921f6428b7c5e7e6dc06b8e0a9/sync/src/lib.rs#L111-L122

Line 122 can panic because drop's task can get the lock first and make the value None

https://github.com/bikeshedder/deadpool/blob/2a584a99fa6d80921f6428b7c5e7e6dc06b8e0a9/sync/src/lib.rs#L155-L162

maan2003 commented 6 months ago

for searchablity: panicked at deadpool-sync-0.1.2/src/lib.rs:122:43: called Option::unwrap() on a None Value

maan2003 commented 6 months ago

correct behavior would be to acquire lock before the spawn_blocking in interact(), but that requires async mutex which breaks the public api

bikeshedder commented 5 months ago

The InteractError::Aborted is currently unused. I don't remember exactly why I added this variant to begin with.

It should be a great way to fix this issue in a way that doesn't result in a breaking change:

--- a/sync/src/lib.rs
+++ b/sync/src/lib.rs
@@ -119,13 +119,13 @@ where
         self.runtime
             .spawn_blocking(move || {
                 let mut guard = arc.lock().unwrap();
-                let conn = guard.as_mut().unwrap();
+                let conn: &mut T = guard.as_mut().ok_or(InteractError::Aborted)?;
                 #[cfg(feature = "tracing")]
                 let _span = span.enter();
-                f(conn)
+                Ok(f(conn))
             })
             .await
-            .map_err(|SpawnBlockingError::Panic(p)| InteractError::Panic(p))
+            .map_err(|SpawnBlockingError::Panic(p)| InteractError::Panic(p))?
     }

This is less invasive, doesn't change the API but should still resolve the issue, doesn't it? :thinking:

Were you able to write a unit test that triggers the panic? I'd really like to make sure this code stays panic free in the future.

maan2003 commented 5 months ago

I think upstream code will just do on interact().unwrap()

example:

https://github.com/matrix-org/matrix-rust-sdk/blob/5d549f17146907980bea8fe89a5664dc6365a73f/crates/matrix-sdk-sqlite/src/utils.rs#L118

Were you able to write a unit test that triggers the panic?

No, this is very racy and hard to repro consistently.

bikeshedder commented 5 months ago

I think upstream code will just do on interact().unwrap()

It's actually interact(...).await.unwrap().

The await is the important bit here. As long as the future which called SyncWrapper::interact is around it is also guaranteed that the SyncWrapper will not be dropped.

If I'm not mistaken the only way this error can actually be triggered is this situation:

  1. Future A calls SyncWrapper::interact which returns Future B
  2. Future B spawns task C
  3. Future A is aborted which aborts Future B as well but task C keeps running
  4. The object is returned to the pool but ends up being dropped (e.g. during a Pool::resize or Pool::close)
  5. SyncWrapper::Drop is called.
  6. Task C starts running and finds a None stored inside the SyncWrapper and panics.

If task C were to return a InteractError::Aborted it would just end the Task and as Future A and B are no longer around they wouldn't care.

Unless my reasoning is completely off it should be totally fine to return Err(InteractError::Aborted) as it is guaranteed to never reach the caller of the method. The existence of the caller is preventing this error to happen in the first place.

I tried reproducing this behavior in a unit test by using the SyncWrapper directly but it is a tricky thing as it needs to be polled until the task is spawned and dropped before it actually runs.

maan2003 commented 5 months ago

agreed

bikeshedder commented 5 months ago

I just release deadpool-sync 0.1.4 including this patch:

maan2003 commented 5 months ago

BTW this error happen in 0.1.2 not in 0.1.3

bikeshedder commented 5 months ago

BTW this error happen in 0.1.2 not in 0.1.3

Errr... that's even more weird as the only difference between those two versions is the inclusion of the LICENSE files. :clown_face:

maan2003 commented 5 months ago

I don't mean the issue didn't happen in 0.1.3, we just haven't used 0.1.3 in our client