benedekh / WeLoveClouds

Repository for submissions on the CloudDB course.
0 stars 1 forks source link

feature/Sorted entries #216

Closed benedekh closed 7 years ago

benedekh commented 7 years ago

Before merge, please have a look at #217.

Measurement code:

MovablePersistentStorage storage = new MovablePersistentStorage(Paths.get("E:/alma/"));

long start = System.currentTimeMillis();
for (int i = 0; i < 10000; ++i) {
    storage.putEntry(new KVEntry(String.valueOf(i), String.valueOf(i)));
}
long end = System.currentTimeMillis();
System.out.println("INSERT: " + (end - start));

start = System.currentTimeMillis();
System.out.println(storage.getValue(String.valueOf(500)));
end = System.currentTimeMillis();
System.out.println("GET: " + (end - start));

start = System.currentTimeMillis();
HashRange filter = new HashRange.Builder().begin(HashingUtils.getHash(String.valueOf(500)))
        .end(HashingUtils.getHash(String.valueOf(650))).build();
Set<MovableStorageUnit> filtered = storage.filterEntries(filter);
int sum = 0;
for (MovableStorageUnit unit : filtered) {
    sum += unit.entries().size();
}
System.out.println(sum);
end = System.currentTimeMillis();
System.out.println("QUERY: " + (end - start));

start = System.currentTimeMillis();
for (int i = 0; i < 10000; ++i) {
    storage.putEntry(new KVEntry(String.valueOf(i), String.valueOf(i)));
}
end = System.currentTimeMillis();
System.out.println("UPDATE: " + (end - start));

start = System.currentTimeMillis();
for (int i = 0; i < 10000000; ++i) {
    storage.removeEntry(String.valueOf(i));
}
end = System.currentTimeMillis();
System.out.println("DELETE: " + (end - start));

Measurement results (latency):

operation avg time (ms) [unsorted keys] deviation (ms) [unsorted keys] avg time (ms) [sorted keys] deviation (ms) [sorted keys]
INSERT 8800 102.684 10948.2 855.5362
GET 0 0 0 0
QUERY 110.8 13.1225 53.2 6.300794
UPDATE 10763.6 409.147 14250.8 767.6185
DELETE 9705.6 558.1812 15469.4 827.6583

Comments:

Conclusion:

It is not really worth to keep the entries sorted, according to the (MD5) hash values of entry's key. Except from the range query (53.2 ms vs 110.8 ms) every latency was worse than in the unsorted cases. Since INSERT/UPDATE, DELETE is more common than range query (which only occurs if we want to move entries in bulk from one storage to another), their latencies have to be lower, than the QUERY's latency (tradeoff). Note: The latency for GET is not measurable in milliseconds.

benedekh commented 7 years ago

won't merge due to performance issues