datatonic / duke

Automatically exported from code.google.com/p/duke

Support for multithreaded processing #35

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
We should be able to use threads to make use of all the processor cores in 
modern machines. Below is an outline of how it might be done.

One thread runs the data source and collects records from it into
a queue.

A second set of threads takes records from the queue and indexes
them. It seems that indexing from multiple threads should work; see
http://darksleep.com/lucene/. Once indexed, the records are put
into a second queue.

A pool of threads picks records from the second queue and does the
matching on them.
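
The outline above could be sketched roughly as follows. This is only an illustration of the three stages connected by two queues, not Duke's actual code: the class name PipelineSketch, the use of plain strings as records, the queue capacity of 1000, and the placeholder indexing and matching steps are all assumptions made for the example. An empty Optional is used as an end-of-input marker so that every worker knows when to stop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: records are plain strings; indexers and matchers must both be >= 1.
class PipelineSketch {
    static List<String> run(List<String> source, int indexers, int matchers) {
        BlockingQueue<Optional<String>> toIndex = new LinkedBlockingQueue<>(1000);
        BlockingQueue<Optional<String>> toMatch = new LinkedBlockingQueue<>(1000);
        Queue<String> matched = new ConcurrentLinkedQueue<>();
        AtomicInteger indexersLeft = new AtomicInteger(indexers);
        ExecutorService pool = Executors.newFixedThreadPool(1 + indexers + matchers);

        // Stage 1: one thread reads the data source into the first queue,
        // then posts one end marker per indexing thread.
        pool.execute(() -> {
            try {
                for (String record : source) toIndex.put(Optional.of(record));
                for (int i = 0; i < indexers; i++) toIndex.put(Optional.empty());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Stage 2: indexing threads move records from the first queue to the second.
        for (int i = 0; i < indexers; i++) {
            pool.execute(() -> {
                try {
                    Optional<String> item;
                    while ((item = toIndex.take()).isPresent()) {
                        // Placeholder: a real indexer would add the record to the index here.
                        toMatch.put(item);
                    }
                    // The last indexer to finish posts one end marker per matching thread.
                    if (indexersLeft.decrementAndGet() == 0) {
                        for (int j = 0; j < matchers; j++) toMatch.put(Optional.empty());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Stage 3: matching threads consume the second queue.
        for (int i = 0; i < matchers; i++) {
            pool.execute(() -> {
                try {
                    Optional<String> item;
                    while ((item = toMatch.take()).isPresent()) {
                        // Placeholder: a real matcher would search and compare here.
                        matched.add(item.get());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(matched);
    }
}
```

The bounded queues keep the reader from running far ahead of the slower stages. The order of the returned records is not deterministic, since several matching threads write to the result queue concurrently.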

Original issue reported on code.google.com by lar...@gmail.com on 4 Sep 2011 at 1:22

GoogleCodeExporter commented 8 years ago

Original comment by lar...@gmail.com on 4 Nov 2011 at 10:15

GoogleCodeExporter commented 8 years ago
Made an attempt in revision 533fe8427a, but it turns out that using 2-10 
threads gives almost exactly the same performance as 1 thread. Not sure why this is so, 
but it might mean that having more Lucene searchers is not a good idea.

Need to rework the patch so that there's only one Lucene searcher active, and 
instead just run the string matching in parallel.
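
A rough sketch of that reworked idea, for illustration only: the candidate lookup stays on a single thread, and only the string comparisons are handed to a thread pool. Everything named here is an assumption for the example and not taken from the patch: the class ParallelCompareSketch, the findCandidates stand-in for the single Lucene searcher (it just filters by first character), and Levenshtein distance as the comparison.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelCompareSketch {
    // Plain Levenshtein edit distance, standing in for the expensive string comparison.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int subst = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                cur[j] = Math.min(subst, Math.min(prev[j], cur[j - 1]) + 1);
            }
            prev = cur;
        }
        return prev[b.length()];
    }

    // Hypothetical stand-in for the single searcher: only the calling thread uses it.
    static List<String> findCandidates(String record, List<String> indexed) {
        List<String> out = new ArrayList<>();
        for (String s : indexed) {
            if (!s.isEmpty() && !record.isEmpty() && s.charAt(0) == record.charAt(0)) {
                out.add(s);
            }
        }
        return out;
    }

    static List<String> match(String record, List<String> indexed, int maxDistance, int threads) {
        List<String> candidates = findCandidates(record, indexed); // single-threaded lookup
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // One comparison job per candidate; invokeAll returns futures in job order.
            List<Callable<Integer>> jobs = new ArrayList<>();
            for (String c : candidates) jobs.add(() -> distance(record, c));
            List<Future<Integer>> scores = pool.invokeAll(jobs);

            List<String> matches = new ArrayList<>();
            for (int i = 0; i < candidates.size(); i++) {
                if (scores.get(i).get() <= maxDistance) matches.add(candidates.get(i));
            }
            return matches;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

In a long-running process the pool would normally be created once and reused rather than built per record; it is created inside match here only to keep the example self-contained.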

I'm also uncertain how fast the queueing and dequeueing is. It might be possible to 
gain something there. Maybe.

Original comment by lar...@gmail.com on 6 Nov 2011 at 3:27

GoogleCodeExporter commented 8 years ago
This was added a long time ago; I should have closed the issue then.

Original comment by lar...@gmail.com on 30 May 2013 at 10:47