I did port some missing feature and fix some issues, I hope to be able to contribute back to main repo.
I know it's bigger pull request than anticipated(for that I am sorry), but hope to be useful to everybody.
Most this work was done following original implementation without to much refactoring to simplify comparison.
Work in PR
Removed some JVM global lock
Removed synchronized block done on class object. A synchronized block like
synchronized (MMapTable.class) {...} will use a mutex that is shared across
the entire JVM, and any other database instance would block on same mutex instance.
More concurrent friendly read/write
Previous implementation was based on full locking during read/write.
Now, only one writer can write to database at a time, but the first to enter the
write lock may help some other writers to finish at the same time it write its own data.
Read will release the lock when accessing immutable table and files to enable other
reads to enter.
Added bloom filter support
Added bloom filter policy implementation as the one implemented in original implementation.
In particular, we add a new FilterPolicy class. An instance of this class can be supplied in Options when opening a database. If supplied, the instance is used to generate summaries of keys (e.g., a bloom filter) which are placed in sstables. These summaries are consulted by DB::Get() so we can avoid reading sstable blocks that are guaranteed to not contain the key we are looking for. This change provides one implementation of FilterPolicy based on bloom filters.
The issue #90 also was corrected in this process.
Added LRU block cache to Tables
Added an LRU block cache to cache uncompressed read blocks. Existing, but
unused, Options.cacheSize can now be used to configure the cache size.
Improved abstraction over file access
Tables and log files read/write classes had no abstraction over file access, this lead to
a confusing code duplication. Interface to abstract file access where added and
duplicated classes where merged. Besides a cleaned design, it enabled easier
testing.
As memory mapped files in JVM can have there issues, Options.allowMmapWrites and
Options.allowMmapWrites where added to configure if memory mapped files
are to be used for read or/and write.
Compaction
Implement DbImpl#compactRange(byte[] begin, byte[] end) requested in by issue #30, with support
for full compaction by using compactRange(null, null).
Fix issue with VersionSet#getOverlappingInputs(int level, InternalKey begin, InternalKey end)
that could return wrong file set when used with level = 0 (ranges may overlap in level 0).
Fix issue #85 correcting Level$findFile(InternalKey) by enabling to return index
greater that last possible index.
Fix SnapshotSeekingIterator
Fix SnapshotIterator that could, in some case, return deleted entries.
A simple test to show this incorrect behavior would be:
DbStringWrapper db = new DbStringWrapper(new Options().compressionType(NONE), databaseDir);
db.put("b", "v");
db.delete("b");
db.delete("a");
assertEquals(Iterators.toString(db.iterator()), "[]");
Failing with:
java.lang.AssertionError: expected [[]] but found [[a=]]
Expected :[]
Actual :[a=]
Other notes
DbimplTest was modified to enable to run all test using distinct configuration set
and more tests where ported and created.
DbBenchmark was modified to enable concurrent benchmark and set more configurations. Added histogram support.
I did port some missing feature and fix some issues, I hope to be able to contribute back to main repo. I know it's bigger pull request than anticipated(for that I am sorry), but hope to be useful to everybody. Most this work was done following original implementation without to much refactoring to simplify comparison.
Work in PR
Removed some JVM global lock
Removed synchronized block done on class object. A synchronized block like
synchronized (MMapTable.class) {...}
will use a mutex that is shared across the entire JVM, and any other database instance would block on same mutex instance.More concurrent friendly read/write
Previous implementation was based on full locking during read/write. Now, only one writer can write to database at a time, but the first to enter the write lock may help some other writers to finish at the same time it write its own data. Read will release the lock when accessing immutable table and files to enable other reads to enter.
Added bloom filter support
Added bloom filter policy implementation as the one implemented in original implementation.
Added LRU block cache to Tables
Added an LRU block cache to cache uncompressed read blocks. Existing, but unused, Options.cacheSize can now be used to configure the cache size.
Improved abstraction over file access
Tables and log files read/write classes had no abstraction over file access, this lead to a confusing code duplication. Interface to abstract file access where added and duplicated classes where merged. Besides a cleaned design, it enabled easier testing. As memory mapped files in JVM can have there issues, Options.allowMmapWrites and Options.allowMmapWrites where added to configure if memory mapped files are to be used for read or/and write.
Compaction
Implement
DbImpl#compactRange(byte[] begin, byte[] end)
requested in by issue #30, with support for full compaction by usingcompactRange(null, null)
. Fix issue withVersionSet#getOverlappingInputs(int level, InternalKey begin, InternalKey end)
that could return wrong file set when used with level = 0 (ranges may overlap in level 0). Fix issue #85 correcting Level$findFile(InternalKey) by enabling to return index greater that last possible index.Fix SnapshotSeekingIterator
Fix SnapshotIterator that could, in some case, return deleted entries. A simple test to show this incorrect behavior would be:
Failing with:
Other notes
DbimplTest was modified to enable to run all test using distinct configuration set and more tests where ported and created.
DbBenchmark was modified to enable concurrent benchmark and set more configurations. Added histogram support.