Comcast / sirius

A distributed system library for managing application reference data
http://comcast.github.io/sirius/
Apache License 2.0
298 stars 49 forks source link

Reduce file I/O ops when reading uberstore files #49

Closed weggert closed 5 years ago

weggert commented 10 years ago

This change updates UberStoreFileOps interface to pass in the current offset and length when reading an entry from an UberStore file rather than calling the RandomAccessFile getFilePointer() and length() methods.

It also amends the returned value to also return the nextOffset along with the entry.

Profiling in JProfiler showed that these operations were expensive.

The repeated calls are unnecessary because the offset can be accounted for with simple math, and the length of the file should only grow while while it is open for reading.

Uberstore replay time with this change is reduced 14% (1029s -> 886s) using a production Uberstore file on production-like hardware.

mbyron01 commented 10 years ago

This looks good but I want another engineer to also review this.

clinedome-work commented 10 years ago

So I'm liking this. 14% increase is huge.

The one thing this does assume, which I'm not sure is explicitly stated anywhere, is that if a file is written to after the foldLeftUntil begins, those updates will not be seen by the fold. I don't think this is necessarily wrong, but it does mean that we have to be very careful not to write to a file during replay, and we may want to add some safety code that ensures this.

As it is, writes could happen either via normal consensus (not worried here, consensus is disabled during replay) or via catchup. As the code stands now, we're actually starting up the PaxosStateBridge immediately, and allowing it to begin the catchup process during replay. We'd certainly have to modify that behavior as a part of this change (probably ok, but we should talk it out), or add some synchrony between folding and writing (probably not what we want to do).

So, :-1: for now, pending that update -- or if I'm wrong about what I've said, pending an update in my understanding. :)

CLAassistant commented 5 years ago

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Wally Eggert seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

weggert commented 5 years ago

This has been superseded by the recent change to use BufferedInputStream for read operations.