cinchapi / concourse

Distributed database warehouse for transactions, search and analytics across time.
http://concoursedb.com
Apache License 2.0
315 stars 235 forks source link

Add option to make Record accumulation of revisions (from Segment), asynchronous #510

Open jtnelson opened 4 months ago

jtnelson commented 4 months ago

Right now, reading the revisions for a record is done synchronously. This means that operations can be slow if the relevant revisions for the record are not in memory (or the Manifest for each Segment must be loaded from disk).

This really affects search performance, especially if the search cache isn't enabled.

Add an experimental preference to ENABLE_ASYNC_DATA_READS. If it is enabled, in the getCorpusRecord, getIndexRecord, getLookupRecord, getTableRecord, etc change the logic

/**
     * Return the CorpusRecord identified by {@code key}.
     * 
     * @param key
     * @param query
     * @param toks {@code query} split by whitespace
     * @return the CorpusRecord
     */
    private CorpusRecord getCorpusRecord(Text key, Text infix) {
        masterLock.readLock().lock();
        try {
            Composite composite = Composite.create(key, infix);
            Cache<Composite, CorpusRecord> cache = ENABLE_SEARCH_CACHE
                    ? corpusCaches.computeIfAbsent(key, $ -> buildCache())
                    : DISABLED_CORPUS_CACHE;
            return cache.get(composite, () -> {
                CorpusRecord $ = CorpusRecord.createPartial(key, infix);
                for (Segment segment : segments) {
                    segment.corpus().seek(composite, $);
                }
                return $;
            });
        }
        catch (ExecutionException e) {
            throw CheckedExceptions.wrapAsRuntimeException(e);
        }
        finally {
            masterLock.readLock().unlock();
        }
    }

would transform into something like

AtomicInteger count = new AtomicInteger();
List<SkinnyRecord> records // come up with a better name. This should be a special Record type that does not do any locking and overrides the append() method to just add revisions to a list. Can't use AmnesiaRecord because still need to keep track of the history
List<CompletableFuture> futures
for(Segment segment: segments) {
  int c = count.getAndIncrmenet();
  SkinnyRecord $ = 
  records.add(c, $) // ensures that the revisions get sorted in the correct order eventually
  futures.add(CompletableFuture.runAsync(() -> segment.corpus().seek(composite, $)))
}
CompletableFuture.allOf(futures).get() // gotta make it an array
return CorpusRecord.merge(records) // create new factories to merge a bunch of records in order into one record that has all the revisions. See what we can do about locking so that write locks aren't grabbed when writing

// Another options is to gather each Runnable in a list and use AwaitableExecutorService.await()

Run some tests (particular with search without cahce) to see if performance is faster...especially reading from disk

jtnelson commented 4 months ago

Will need to keep in mind how the record caching works. So will probably need to create the master Record that will be added to the cache and the accumulate some smaller records and then do a

record.merge(records...) that will go through each of the records in order and append all their revisions.