Entries in the persisted storage units, and in the PersistentStorage are sorted according to the Hash value of the entry's key.
fixes #214
Measurement code:
MovablePersistentStorage storage = new MovablePersistentStorage(Paths.get("E:/alma/"));
long start = System.currentTimeMillis();
for (int i = 0; i < 10000; ++i) {
storage.putEntry(new KVEntry(String.valueOf(i), String.valueOf(i)));
}
long end = System.currentTimeMillis();
System.out.println("INSERT: " + (end - start));
start = System.currentTimeMillis();
System.out.println(storage.getValue(String.valueOf(500)));
end = System.currentTimeMillis();
System.out.println("GET: " + (end - start));
start = System.currentTimeMillis();
HashRange filter = new HashRange.Builder().begin(HashingUtils.getHash(String.valueOf(500)))
.end(HashingUtils.getHash(String.valueOf(650))).build();
Set<MovableStorageUnit> filtered = storage.filterEntries(filter);
int sum = 0;
for (MovableStorageUnit unit : filtered) {
sum += unit.entries().size();
}
System.out.println(sum);
end = System.currentTimeMillis();
System.out.println("QUERY: " + (end - start));
start = System.currentTimeMillis();
for (int i = 0; i < 10000; ++i) {
storage.putEntry(new KVEntry(String.valueOf(i), String.valueOf(i)));
}
end = System.currentTimeMillis();
System.out.println("UPDATE: " + (end - start));
start = System.currentTimeMillis();
for (int i = 0; i < 10000000; ++i) {
storage.removeEntry(String.valueOf(i));
}
end = System.currentTimeMillis();
System.out.println("DELETE: " + (end - start));
Measurement results (latency):
operation
avg time (ms) [unsorted keys]
deviation (ms) [unsorted keys]
avg time (ms) [sorted keys]
deviation (ms) [sorted keys]
INSERT
8800
102.684
10948.2
855.5362
GET
0
0
0
0
QUERY
110.8
13.1225
53.2
6.300794
UPDATE
10763.6
409.147
14250.8
767.6185
DELETE
9705.6
558.1812
15469.4
827.6583
Comments:
Insert, update, delete: 10k entries
Get: only for one value
Query: 7325 number of entries were retrieved
Conclusion:
It is not really worth to keep the entries sorted, according to the (MD5) hash values of entry's key. Except from the range query (53.2 ms vs 110.8 ms) every latency was worse than in the unsorted cases. Since INSERT/UPDATE, DELETE is more common than range query (which only occurs if we want to move entries in bulk from one storage to another), their latencies have to be lower, than the QUERY's latency (tradeoff).
Note: The latency for GET is not measurable in milliseconds.
Before merge, please have a look at #217.
Measurement code:
Measurement results (latency):
Comments:
Conclusion:
It is not really worth to keep the entries sorted, according to the (MD5) hash values of entry's key. Except from the range query (53.2 ms vs 110.8 ms) every latency was worse than in the unsorted cases. Since INSERT/UPDATE, DELETE is more common than range query (which only occurs if we want to move entries in bulk from one storage to another), their latencies have to be lower, than the QUERY's latency (tradeoff). Note: The latency for GET is not measurable in milliseconds.