What does this actually mean? Is it several (Lucene index, record set) pairs
that you want to run in parallel? Or one Lucene index and several record sets?
If the latter, would it be enough to simply be able to start the process and
then push in records as you get them, and then have the Processor run
internally in parallel for better performance?
Original comment by lar...@gmail.com
on 7 Nov 2011 at 12:46
I'll stick to our example: I'm currently implementing a feature for importing
'friends' from other sites. We use duke to index our user database and whenever
a user imports a list of 'friends', we link them to our user base with duke's
record linkage feature. As we have thousands of users, it's possible that two
users independently import 'friends' at the same time.
I'd like to have a singleton instance of duke's 'Processor' which is shared
across all threads (and therefore a single IndexReader for the lucene index).
Currently the MatchListeners are given during initialization of the Processor.
In our setting it would be necessary to pass the MatchListeners as an argument
to the 'link(...)' method.
Does this somehow fit into duke's overall design? (I could implement a proposal
today for further clarification).
Original comment by FMitzl...@googlemail.com
on 8 Nov 2011 at 5:09
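The per-call listener idea proposed above might look roughly like the following sketch. It does not use duke's actual classes: `PerCallLinker` and its `link` signature are hypothetical stand-ins, and the case-insensitive name comparison is only a placeholder for real record matching.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical stand-in, NOT duke's real Processor: the listener is passed
// on every call instead of being stored on the shared instance.
final class PerCallLinker {
    // Read-only after construction, so one instance can be shared by all threads.
    private final List<String> index;

    PerCallLinker(List<String> index) {
        this.index = Collections.unmodifiableList(new ArrayList<>(index));
    }

    // Each caller supplies its own listener, so no listener state is shared
    // between concurrent imports.
    void link(List<String> records, BiConsumer<String, String> onMatch) {
        for (String record : records) {
            for (String candidate : index) {
                // Placeholder comparison (assumption); real matching is far richer.
                if (candidate.equalsIgnoreCase(record)) {
                    onMatch.accept(record, candidate);
                }
            }
        }
    }
}
```

With this shape, two imports running at the same time each call `link` on the same instance with their own callback, and neither sees the other's matches.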
Yes, this fits well into the design. I haven't worked much on making it
thread-safe yet, but on the other hand I think most of the code already is
thread-safe. If you look at the MultithreadProcessor I added on Sunday you can
see some work toward this, but that was meant to be used to speed up processing.
As far as I can see, if you modify the API so that you can pass in the
MatchListeners you should have what you need. However, I'm not sure you really
need that. Perhaps you should have a single Database instance instead, and
multiple Processor instances, since it's the Database which really represents
the Lucene index (and not the Processor), and this way you don't get into
difficulties with the MatchListeners etc.
Original comment by lar...@gmail.com
on 8 Nov 2011 at 5:23
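The alternative suggested above (one shared Database, one Processor per thread) could be sketched as follows. The classes `SharedDatabase` and `LinkProcessor` are hypothetical stand-ins, not duke's real API, and the lowercase name lookup is only a placeholder for a Lucene-backed index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical stand-in for the shared index; NOT duke's real Database class.
final class SharedDatabase {
    // Concurrent set, so many processors can read while records are added.
    private final Set<String> names = ConcurrentHashMap.newKeySet();

    void index(String name) {
        names.add(name.toLowerCase(Locale.ROOT));
    }

    boolean contains(String name) {
        return names.contains(name.toLowerCase(Locale.ROOT));
    }
}

// Hypothetical stand-in for a processor: cheap to create, used by one thread,
// and owning its own listeners. The database is injected via the constructor.
final class LinkProcessor {
    private final SharedDatabase database;
    // Not thread-safe on purpose: each processor is confined to a single thread.
    private final List<Consumer<String>> listeners = new ArrayList<>();

    LinkProcessor(SharedDatabase database) {
        this.database = database;
    }

    void addListener(Consumer<String> listener) {
        listeners.add(listener);
    }

    // Notifies this processor's listeners for every record found in the shared database.
    void link(List<String> records) {
        for (String record : records) {
            if (database.contains(record)) {
                for (Consumer<String> listener : listeners) {
                    listener.accept(record);
                }
            }
        }
    }
}
```

Because the listeners live on the per-thread processor and only the database is shared, concurrent imports do not need any coordination around listener registration.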
Worked perfectly as you proposed (single Database) - thank you! I only had to
add an appropriate constructor that allows a Database to be injected (pushed to
my clone).
Original comment by FMitzl...@googlemail.com
on 8 Nov 2011 at 8:37
Excellent! I'll pull the revision over to the official code ASAP.
Original comment by lar...@gmail.com
on 8 Nov 2011 at 9:00
Added and committed now. Seems to solve the problem, so I'm closing the issue.
Original comment by lar...@gmail.com
on 11 Nov 2011 at 8:35
Original issue reported on code.google.com by
FMitzl...@googlemail.com
on 7 Nov 2011 at 12:32