Integrate a better malloc (jemalloc or tcmalloc) into fdbserver

A significant portion of our code is largely malloc-agnostic, as we never release the memory that arenas grab -- we just hang onto it for the next time an arena wishes to grow. We still have chunks of code that use std::map in a tight loop, which would strongly benefit from a faster system malloc.
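To make the std::map point concrete, here is a minimal sketch (not taken from the fdbserver code) that counts how often a std::map goes to the allocator. `CountingAllocator`, `g_allocations`, and `countMapAllocations` are hypothetical names made up for illustration. The idea is just that every inserted element is a separate node allocation, so a tight insert loop is bounded by how fast malloc is.

```cpp
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <map>
#include <new>
#include <utility>

// Hypothetical global counter, for illustration only (not thread-safe).
static std::size_t g_allocations = 0;

// Minimal allocator that forwards to malloc/free and counts each request.
template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++g_allocations;
        void* p = std::malloc(n * sizeof(T));
        if (!p) throw std::bad_alloc();
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return false; }

// Inserts n distinct keys and returns how many allocator calls that took.
std::size_t countMapAllocations(int n) {
    g_allocations = 0;
    std::map<int, int, std::less<int>,
             CountingAllocator<std::pair<const int, int>>> m;
    for (int i = 0; i < n; ++i) {
        m.emplace(i, i);  // each new element needs its own tree node
    }
    return g_allocations;
}
```

Calling `countMapAllocations(1000)` should report at least 1000 allocator calls. Some standard library implementations add one more for a sentinel node.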

I'm not aware of any opinions that we have about jemalloc vs tcmalloc, so whoever picks this up would need to form one. Brief work that @satherton once did showed that our custom dl_iterate_phdr implementation causes jemalloc's allocation profiling to deadlock while initializing malloc, as dl_iterate_phdr isn't expected to call malloc.
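For reference, a sketch of what "doesn't call malloc" looks like on the callback side, using the stock glibc dl_iterate_phdr on Linux rather than our custom implementation. `PhdrStats`, `countSegments`, and `collectPhdrStats` are hypothetical names for illustration. All state lives in a caller-owned struct on the stack, so nothing on this path allocates. Presumably a replacement dl_iterate_phdr would need the same discipline to be safe to call during allocator initialization, though that is an assumption and not something verified against jemalloc.

```cpp
#include <link.h>   // dl_iterate_phdr, struct dl_phdr_info, ElfW (glibc/Linux)
#include <cstddef>

// Hypothetical result type: plain counters, no heap-backed members.
struct PhdrStats {
    std::size_t objects = 0;       // loaded ELF objects visited
    std::size_t loadSegments = 0;  // PT_LOAD program headers seen
};

// Callback invoked once per loaded object. It only reads the header
// table and bumps counters in caller-provided storage; no allocation.
static int countSegments(struct dl_phdr_info* info, std::size_t /*size*/, void* data) {
    auto* stats = static_cast<PhdrStats*>(data);
    ++stats->objects;
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; ++i) {
        if (info->dlpi_phdr[i].p_type == PT_LOAD) {
            ++stats->loadSegments;
        }
    }
    return 0;  // returning 0 continues the iteration
}

PhdrStats collectPhdrStats() {
    PhdrStats stats;  // stack storage, passed through the void* parameter
    dl_iterate_phdr(countSegments, &stats);
    return stats;
}
```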

Both jemalloc and tcmalloc have memory profiling support that would be useful from time to time, particularly if we could integrate the arena memory allocation into it as well...

apple / foundationdb

Integrate a better malloc (jemalloc or tcmalloc) into fdbserver #619