br1ghtyang / asterixdb

Automatically exported from code.google.com/p/asterixdb

Returning duplicate answer when there is a concurrent flush #575

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
The following incorrect scenario is possible in the current codebase:
A flush of an in-memory component is underway, and a new searcher is about to 
start. The searcher collects the components that it needs to search. It is 
possible that the searcher grabs the in-memory component and, before it grabs 
the disk components, the flusher finishes and marks the flushed component as 
valid. The searcher therefore also grabs the flushed component as a disk 
component, so it sees the content of the in-memory component (that was just 
flushed) twice. 
The current LSM-BTree search cursor naturally prevents seeing a duplicate key, 
but that is specific to the LSM-BTree implementation. For secondary indexes 
such as the R-tree or the inverted index, the search will return duplicate 
answers.
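
The interleaving described above can be replayed step by step on a single thread. This is only a minimal sketch with a toy model: ToyLsmIndex, addFlushedComponent and searchDuringFlush are hypothetical names made up for illustration and are not AsterixDB classes or methods. The cursor here simply concatenates components, which is the assumption for an index without built-in deduplication.

```java
import java.util.ArrayList;
import java.util.List;

public class FlushRaceSketch {
    // Hypothetical stand-in for an LSM index: one in-memory component and a
    // list of disk components. These are NOT AsterixDB's real classes.
    static final class ToyLsmIndex {
        final List<String> memoryComponent = new ArrayList<>();
        final List<List<String>> diskComponents = new ArrayList<>();

        // Models the flusher publishing the flushed data as a valid disk component.
        void addFlushedComponent() {
            diskComponents.add(new ArrayList<>(memoryComponent));
        }
    }

    // Replays the problematic interleaving in a fixed order.
    static List<String> searchDuringFlush() {
        ToyLsmIndex index = new ToyLsmIndex();
        index.memoryComponent.add("k1");
        index.memoryComponent.add("k2");

        // Searcher: grabs the in-memory component first.
        List<String> grabbedMemory = index.memoryComponent;

        // Flusher: finishes and marks the new disk component as valid.
        index.addFlushedComponent();

        // Searcher: only now grabs the disk components, including the new one.
        List<List<String>> grabbedDisk = new ArrayList<>(index.diskComponents);

        // A cursor without deduplication just concatenates what it grabbed.
        List<String> results = new ArrayList<>(grabbedMemory);
        for (List<String> component : grabbedDisk) {
            results.addAll(component);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(searchDuringFlush()); // prints [k1, k2, k1, k2]
    }
}
```

Every key appears twice because the same data was reachable through both the in-memory component and the freshly flushed disk component.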

You can verify this behavior by inserting a sleep statement 
(Thread.sleep(100000)) at line 198 of LSMHarness.flush(), right after 
lsmIndex.addComponent(newComponent);, and then submitting search queries once 
the flush has started.
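
The effect of the injected pause can also be modeled outside the codebase. The sketch below is a toy and assumes nothing about the real LSMHarness beyond what is stated above: FlushPauseSketch and its fields are hypothetical, and a CountDownLatch stands in for the long Thread.sleep so that the run is deterministic. It also assumes that the in-memory component remains searchable until the flush completes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

public class FlushPauseSketch {
    // Hypothetical toy state, not AsterixDB's real classes.
    private final List<List<String>> diskComponents = new CopyOnWriteArrayList<>();
    private volatile List<String> memoryComponent = List.of("k1", "k2");
    private final CountDownLatch componentAdded = new CountDownLatch(1);
    private final CountDownLatch searchDone = new CountDownLatch(1);

    void flush() throws InterruptedException {
        diskComponents.add(memoryComponent); // stands in for addComponent(newComponent)
        componentAdded.countDown();
        searchDone.await();                  // stands in for the injected Thread.sleep(...)
        memoryComponent = List.of();         // flush completes, memory component is emptied
    }

    List<String> search() throws InterruptedException {
        componentAdded.await();              // query is submitted while the flush is paused
        List<String> results = new ArrayList<>(memoryComponent);
        for (List<String> component : diskComponents) {
            results.addAll(component);
        }
        searchDone.countDown();
        return results;
    }

    static List<String> run() {
        FlushPauseSketch index = new FlushPauseSketch();
        Thread flusher = new Thread(() -> {
            try {
                index.flush();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        flusher.start();
        try {
            List<String> results = index.search();
            flusher.join();
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [k1, k2, k1, k2]
    }
}
```

Any search that runs while the flusher is paused sees both the in-memory component and its flushed copy, which is why a long sleep at that point makes the duplicates easy to observe.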

Original issue reported on code.google.com by salsuba...@gmail.com on 24 Jul 2013 at 6:22

GoogleCodeExporter commented 8 years ago

Original comment by salsuba...@gmail.com on 27 Jul 2013 at 8:16

GoogleCodeExporter commented 8 years ago

Original comment by salsuba...@gmail.com on 23 Aug 2013 at 5:33