Closed matthewvon closed 7 years ago
I believe all issues are now addressed within leveldb and leveldb_ee repositories. eleveldb repository work is next.
Note: asserts() are not an error management framework. From Code Complete, First Edition: "An assertion is a function or macro that complains loudly if an assumption isn't true. use assertions to document assumptions made in the code and to flush out unexpected conditions. ... Even if you don't want your users to see assertion messages in production code, assertions are handy during development and maintenance. During development, assertions flush out contradictory assumptions, unexpected condition, bad values passed to routines, and so on."
I have added an explicit comment to the Makefile to point out assert() being disabled in production builds. Similarly I have added an explicit comment within util/prop_cache.h to point out default constructor of PropertyCache is disabled. And comment at top of util/prop_cache.cc to point out synchronization objects used to address race condition on shutdown.
+1
Looks good to me!
This is one of three related PRS:
https://github.com/basho/leveldb/pull/219 https://github.com/basho/leveldb_ee/pull/5 *https://github.com/basho/eleveldb/pull/235
The first is mainly base-class infrastructure code to handle the bucket-level extension of global expiry, the second is the implementation of the bucket-level expiry code (an enterprise-only feature, implemented in classes that inherit from the base classes in leveldb), and the third is the eleveldb interface to the new bucket-level expiry code, mainly the addition of a routing function that allows for bucket properties to be communicated to leveldb.
My biggest overall concern is about maintainability, mainly around code that stores, manipulates and compares time:
In
leveldb/include/expiry.h
there's a magic number used to signal unlimited expiry:static const uint64_t kExpiryUnlimited = ULLONG_MAX-2
. I'm not a big fan of magic numbers, particularly when they are used in a context where they're compared to other numbers of the same type which aren't magic numbers. (I realize that expiry_minutes is not likely to be intentionally set to ULLONG_MAX-2, but what if in future, expiry_minutes is changed to expiry_nanoseconds? An unrealistic example, I realize, but my point is it's a robustness issue)Then there's a lot of code that uses ExpiryTime, which is not a time-aware class but a simple
typedef to a basic type (
typedef uint64_t ExpiryTime;
), which means that the units are implicit and ambiguous.Then there's code that does time manipulation, also in implicit units that are hard to determine without comprehensive tracing of the code (and therefore easy to get wrong from place to place):
aged=Now - expiry_minutes*60*port::UINT64_ONE_SECOND
(what are the units of
aged
?Now
? Even the macros don't advertise what they are --port::UINT64_ONE_SECOND
turns out to be microseconds per second, from which we infer thataged
is meant to be microseconds. Then we have to verify thatNow
is also in microseconds...)Then there are ambiguous function names:
GetTimeMinutes()
, for example, which one might naively guess returns time in minutes, but actually returns time in microseconds from a clock that is only updated on minute-timescales.All of this leads to a codebase that is only maintainable by the original author, and is error-prone even if that author is very very careful. I think a lot of these issues could be avoided by wrapping up all time handling in a proper object-oriented way, so that you could do things like:
Other issues that touch on robustness/maintainability: in expiry_os.h/cc (and other places), there are class members that are not in any way identified as class members. I see the note about wanting to match the style of options within include/leveldb/options.h, but to my mind if you're matching a style that's problematic to begin with, you're just propagating problems deeper into the codebase. It's just another stumbling block to anyone else parsing/modifying the code.
(Likewise with the
assert()
statement in prop_cache.cc. I realize that leveldb is chock full of assert statements in lieu of an error-handling framework, so one more isn't going to change anything, but assert isn't good practice for error handling in general, and certainly not in embedded libraries, where a failing assert will simply kill the entire process, leaving riak with no ability to shut down gracefully, store state, save anything to disk, etc. Do we want to propagate that practice?)Random question about added code in
#if 0
block in sst_scan.cc: First of all, why has code been added in a commented-out block? Second, if we're leaving commented-out blocks in production code, could we add some comments as to why they've been commented out/left?In prop_cache.cc
PropertyCache::LookupInternal
considersret_handle == NULL
to be an error (i.e.,gPerfCounters->Inc(ePerfPropCacheError)
) butret_handle
also gets set to NULL ifmod_ptr->ExpiryModuleExpiry()<now
(so it's not necessarily an error). Is this right?Other random questions that I haven't been able to answer without actively testing the code more than I have:
PropertyCache::GetCache()
returns a dereferenced pointer without checking if it is non-NULL. I believe this shouldn't happen in practice, since thePropertyCache(EleveldbRouter_t)
constructor used in the code always allocates m_Cache, but without explicitly disallowing a default constructor, is there no possibility that a default constructor could get called that leaves m_Cache uninitialized? (std::vector
for example, does it all the time)Can a
PropertyCache
object be destroyed while another thread is waiting on the condvar? (what happens anyway ifpthread_cond_destroy
is called while another thread is blocked inpthread_cond_wait
?)