Burgestrand / spotify

Low-level Ruby bindings for libspotify, the official Spotify C API
https://rubygems.org/gems/spotify
Other
146 stars 17 forks source link

Garbage collection can cause deadlocks #23

Closed Burgestrand closed 10 years ago

Burgestrand commented 10 years ago

@ike18t reported an issue via mail about spotify causing deadlock, and found a fix:

I think I finally figured out why spotify kept freezing for my app. I noticed all of the (track|artist|user)_release getting logged from your gem. Little did I know that there is a free method on the objects to release the memory and using it once I'm done with the object seems to prevent all of the auto releases which seemed to occasionally cause a freeze if the player tried to start before the releases finished.

Now, there must be a bug in the Spotify gem here, so here’s what I speculate:

It has never occurred to me that this kind of race condition might happen, but now that I think on it I’m fairly certain I know what happens. Ruby (MRI) runs garbage collection in a separate thread, but it’s probably so that no other Ruby code runs while garbage collection is happening, so all code is halted.

In another area, I have a lock around the Spotify functions. It will be locked prior to calling any Spotify function, so no more than one thread at a time may call a Spotify function because libspotify is not thread-safe.

Now, assuming you call a Spotify function (any function, but the longer it takes the higher chances of the freeze happening), and garbage collection happens to trigger right after your program acquires the Spotify lock, the lock will still be locked while garbage collection is running. Now, when garbage collection tries to free a Spotify pointer, it will attempt to acquire the Spotify lock, but it cannot because another thread is already holding the lock.

After this we have our main thread that holds the lock which will not release the lock until it runs again. Unfortunately, Ruby will not let our main thread run until garbage collection is finished. Garbage collection cannot finish because it is waiting for the lock hold by the main thread.

I believe I can fix this by creating a reaper thread. It will spin in the background forever, just waiting for a queue of pointers to be freed. In the finalizers for each pointer I will add the pointer address to the reaper queue, and they will be scheduled for release after garbage collection has successfully run. This should allow garbage collection to finish, which would yield control back to the main thread, which in turn will release the lock which allows the reaper thread to perform garbage collection!

Burgestrand commented 10 years ago

Just noticed that this also applies to Subscribers structs, as they are automatically freed on garbage collection as well!

Burgestrand commented 10 years ago

Reaper thread will have one downside: the thread will be forcefully killed when the ruby VM quits, but there might still be objects around that need to be garbage collected that no reaper can take care of. Torquebox for example, boots several VMs, and shuts them down when doing a redeploy which would cause us to leak currently live spotify objects.

However, thinking about this again the Spotify gem cannot support this use-case either way. libspotify does not properly support closing sessions and creating new ones. I’m not sure it’s possible to support the use-case of multiple ruby VMs in the same process either way… and it’s an edge-case.

Burgestrand commented 10 years ago

Doing this work in the reaper branch at the moment. It’ll be deleted once merged with master.

Burgestrand commented 10 years ago

I think what I have should work well. I’m a bit worried about Atomic using the fallback implementation, in which case there’s a chance for deadlocks on GC again.

I’ve been investigating doing a lockless implementation without Atomic, which should be possible consider I have only one consumer, and only one (I believe) producer.

Burgestrand commented 10 years ago

A lockless implementation should be possible with a linked list. Head is consumed from by the Reaper, and the tail is appended to by the GC thread. This will only work if finalizers are never run concurrently, but this needs to be examined or it will not be safe.

Burgestrand commented 10 years ago

I have an implementation using Atomic now in master, but I’m going to try and replace it with a lockless linked list instead before pushing a new gem version.

Burgestrand commented 10 years ago

Note to self: make Reader#mark private, so nobody can call it by accident. That would be bad!

Burgestrand commented 10 years ago

Another note: remove restriction on Reaper#terminate being zero.

Burgestrand commented 10 years ago

9b4304aef4225ba1d6227d6269dd5501bd02a196 should be good.

ike18t commented 10 years ago

Good job man!

Burgestrand commented 10 years ago

It does however look like object finalizers, who are responsible for queuing objects for GC, can be run concurrently. In MRI, finalizers appear to run in the same thread as they were created, but in JRuby and Rubinius finalizers will run in a thread designated specifically for garbage collection.

In effect, Reaper#mark needs to be thread-safe, but it is not at the moment.

I also was not able to produce a case in either of the three implementations where the deadlock occurs using just locks. It’s probably a thread-safety issue with the interaction with libspotify that causes the deadlock. I’ll reopen this until I can find the deadlock properly, so I may confirm that it is gone.

Burgestrand commented 10 years ago

@ike18t this issue was a long while ago, but I believe I have another way of tackling this problem instead of using the reaper thing that I wasn't very happy with. If you're still using your custom player, I would be very happy if you could try your code with the latest change from the master branch and see if the deadlocks have disappeared, without freeing pointers manually!

I would test it myself, but I have not been able to trigger the deadlock myself.

ike18t commented 10 years ago

Sure dude! Pulling it now.

Quick question: Is the monkey patch you sent me a while back to get plaything to work better with the raspberry pi still required? If so, that's cool and I'll just leave it.

On Wed, Jun 4, 2014 at 2:51 AM, Kim Burgestrand notifications@github.com wrote:

@ike18t https://github.com/ike18t this issue was a long while ago, but I believe I have another way of tackling this problem instead of using the reaper thing that I wasn't very happy with. If you're still using your custom player, I would be very happy if you could try your code with the latest change from the master branch and see if the deadlocks have disappeared, without freeing pointers manually!

— Reply to this email directly or view it on GitHub https://github.com/Burgestrand/spotify/issues/23#issuecomment-45056901.

Burgestrand commented 10 years ago

Yes I think so. Thanks for the help!

On Wednesday, June 4, 2014, Isaac Datlof notifications@github.com wrote:

Sure dude! Pulling it now.

Quick question: Is the monkey patch you sent me a while back to get plaything to work better with the raspberry pi still required? If so, that's cool and I'll just leave it.

On Wed, Jun 4, 2014 at 2:51 AM, Kim Burgestrand <notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');> wrote:

@ike18t https://github.com/ike18t this issue was a long while ago, but I believe I have another way of tackling this problem instead of using the reaper thing that I wasn't very happy with. If you're still using your custom player, I would be very happy if you could try your code with the latest change from the master branch and see if the deadlocks have disappeared, without freeing pointers manually!

— Reply to this email directly or view it on GitHub https://github.com/Burgestrand/spotify/issues/23#issuecomment-45056901.

— Reply to this email directly or view it on GitHub https://github.com/Burgestrand/spotify/issues/23#issuecomment-45086381.

— Kim Burgestrand

ike18t commented 10 years ago

Whatever you changed in master appears to have worked! The jukebox played all day without crashing with the master gem and no frees. On Jun 4, 2014 9:22 AM, "Kim Burgestrand" notifications@github.com wrote:

Yes I think so. Thanks for the help!

On Wednesday, June 4, 2014, Isaac Datlof notifications@github.com wrote:

Sure dude! Pulling it now.

Quick question: Is the monkey patch you sent me a while back to get plaything to work better with the raspberry pi still required? If so, that's cool and I'll just leave it.

On Wed, Jun 4, 2014 at 2:51 AM, Kim Burgestrand < notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');> wrote:

@ike18t https://github.com/ike18t this issue was a long while ago, but I believe I have another way of tackling this problem instead of using the reaper thing that I wasn't very happy with. If you're still using your custom player, I would be very happy if you could try your code with the latest change from the master branch and see if the deadlocks have disappeared, without freeing pointers manually!

— Reply to this email directly or view it on GitHub < https://github.com/Burgestrand/spotify/issues/23#issuecomment-45056901>.

— Reply to this email directly or view it on GitHub https://github.com/Burgestrand/spotify/issues/23#issuecomment-45086381.

— Kim Burgestrand

— Reply to this email directly or view it on GitHub https://github.com/Burgestrand/spotify/issues/23#issuecomment-45089219.

Burgestrand commented 10 years ago

Awesome Happy banana!!

Burgestrand commented 10 years ago

I think I'll close this one now then. Woopee.