ankane / faiss-ruby

Efficient similarity search and clustering for Ruby
MIT License
128 stars 5 forks source link

[WIP] GVL release #7

Open atesgoral opened 2 weeks ago

atesgoral commented 2 weeks ago

Release GVL while performing CPU-intensive operations to benefit multi-threaded Ruby applications.

ankane commented 1 week ago

Hi @atesgoral, thanks for the PR! However, this doesn't seem safe to do without more locking (as add can now be called while search is running). I'm not sure I'd like to maintain the additional complexity, but happy to take another look if you decide to implement it.

https://github.com/facebookresearch/faiss/wiki/Threads-and-asynchronous-calls

tenderlove commented 1 week ago

@ankane are you sure that's true? According to the link you sent:

Python interface releases the Global Interpreter Lock for all calls, so using python multithreading will effectively use several cores.

I'm not familiar with the Python extension, but it seems like we should be in the same boat? Am I missing something?

ankane commented 1 week ago

Will check out the Python code, but this causes the test to consume all available CPU and hang (which doesn't happen when the GVL is enabled).

--- a/test/index_test.rb
+++ b/test/index_test.rb
@@ -279,17 +279,18 @@ class IndexTest < Minitest::Test
       [1, 1, 2, 1],
       [5, 4, 6, 5],
       [1, 2, 1, 2]
-    ]
+    ] * 100
     index = Faiss::IndexFlatL2.new(4)
     index.add(objects)

     concurrency = 0
     max_concurrency = 0

-    threads = 2.times.map {
+    threads = 100.times.map {
       Thread.new {
         concurrency += 1
         max_concurrency = [max_concurrency, concurrency].max
+        index.add(objects)
         index.search(objects, 3)
         concurrency -= 1
       }
atesgoral commented 1 week ago

@ankane Thanks for having a look.

I didn't need more data or 100 iters to get it to lock. I haven't looked at faiss in detail yet but I see the word "lock" all over the repo. This is enough to cause a deadlock(?):

diff --git a/test/index_test.rb b/test/index_test.rb
index 64e093b..095b8ea 100644
--- a/test/index_test.rb
+++ b/test/index_test.rb
@@ -290,7 +290,13 @@ class IndexTest < Minitest::Test
       Thread.new {
         concurrency += 1
         max_concurrency = [max_concurrency, concurrency].max
+        puts "adding"
+        index.add(objects)
+        puts "added"
+        sleep(10)
+        puts "searching"
         index.search(objects, 3)
+        puts "searched"
         concurrency -= 1
       }
     }

I say "deadlock(?)" because it doesn't act like an forever deadlock, but something gives eventually and the tests ends with a success. 🤔

obie commented 1 week ago

keeping 👀 on this