Should allow allocating on one thread and freeing on another. The allocator is global state shared by all threads, so it must serialize requests like a server. The simplest way would be a global mutex.
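A minimal sketch of the global mutex approach, assuming power-of-two bins from 16 to 2048 bytes and that the caller passes the size back when releasing. All names (`sketch`, `findBin`, `allocate`, `release`) are hypothetical and only illustrate the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <vector>

namespace sketch {

constexpr std::size_t smallestBin = 16; // assumed: bin 0 holds 16-byte blocks
constexpr std::size_t binCount = 8;     // assumed: bins of 16, 32, ..., 2048 bytes

// One mutex guards every free list, so any thread may allocate or release.
inline std::mutex &poolMutex() { static std::mutex m; return m; }
inline std::vector<void*> &freeList(std::size_t bin) {
    static std::vector<void*> lists[binCount];
    return lists[bin];
}

// Returns the smallest bin that fits `size`, or binCount if it is too large.
inline std::size_t findBin(std::size_t size) {
    std::size_t capacity = smallestBin;
    for (std::size_t bin = 0; bin < binCount; bin++) {
        if (size <= capacity) return bin;
        capacity *= 2;
    }
    return binCount;
}

inline void *allocate(std::size_t size) {
    assert(size > 0);
    std::size_t bin = findBin(size);
    if (bin == binCount) return std::malloc(size); // too large to recycle
    {
        std::lock_guard<std::mutex> lock(poolMutex());
        std::vector<void*> &list = freeList(bin);
        if (!list.empty()) {
            void *reused = list.back(); // recycle a block of the same bin
            list.pop_back();
            return reused;
        }
    }
    // Nothing to recycle: allocate the full bin capacity so the block fits
    // any later request that maps to the same bin.
    return std::malloc(smallestBin << bin);
}

inline void release(void *block, std::size_t size) {
    assert(block != nullptr);
    std::size_t bin = findBin(size);
    if (bin == binCount) { std::free(block); return; }
    std::lock_guard<std::mutex> lock(poolMutex());
    freeList(bin).push_back(block); // keep the block for later reuse
}

} // namespace sketch
```

The lock is held only while touching a free list, not during `std::malloc`, which is already thread-safe. Blocks still in the free lists at exit are never returned to the system in this sketch.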
Should be easy to turn off using a macro, in case memory recycling hides a leak. It can also have its own tools for reporting allocation statistics.
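One possible shape for the switch and the statistics, as a sketch with a single assumed block size. The macro name `SKETCH_DISABLE_RECYCLING` and everything in `toggle_sketch` are hypothetical. With the macro defined, every block goes straight back to `std::free`, so external leak checkers see the real allocation pattern.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <vector>

// Hypothetical switch: define before this point to bypass recycling.
// #define SKETCH_DISABLE_RECYCLING

namespace toggle_sketch {

constexpr std::size_t blockSize = 64; // assumed: one block size keeps the sketch short

struct Statistics {
    std::atomic<std::size_t> allocations{0}; // calls to allocate()
    std::atomic<std::size_t> releases{0};    // calls to release()
    std::atomic<std::size_t> reuses{0};      // allocations served from the pool
};
inline Statistics &statistics() { static Statistics s; return s; }

#ifndef SKETCH_DISABLE_RECYCLING
inline std::mutex &poolMutex() { static std::mutex m; return m; }
inline std::vector<void*> &pool() { static std::vector<void*> p; return p; }
#endif

inline void *allocate() {
    statistics().allocations++;
#ifndef SKETCH_DISABLE_RECYCLING
    {
        std::lock_guard<std::mutex> lock(poolMutex());
        if (!pool().empty()) {
            void *reused = pool().back();
            pool().pop_back();
            statistics().reuses++;
            return reused;
        }
    }
#endif
    return std::malloc(blockSize);
}

inline void release(void *block) {
    assert(block != nullptr);
    statistics().releases++;
#ifdef SKETCH_DISABLE_RECYCLING
    std::free(block); // recycling off: memory is really freed
#else
    std::lock_guard<std::mutex> lock(poolMutex());
    pool().push_back(block);
#endif
}

// Blocks currently held by callers. A nonzero value at shutdown suggests a leak.
inline std::size_t liveBlocks() {
    return statistics().allocations.load() - statistics().releases.load();
}

} // namespace toggle_sketch
```

The counters are updated in both modes, so `liveBlocks()` can be compared with and without recycling.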
Should at least work with string content and image heads.
Fixed-size heads can find their allocation bin at compile time using a constexpr function of the size. Dynamic allocations have to find their bin at run time.
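A sketch of how one constexpr function could serve both cases, assuming power-of-two bins starting at 16 bytes. `binForSize`, `FixedBin` and the `ImageHead` layout are hypothetical. The function is written as a single return expression so that it also compiles as C++11.

```cpp
#include <cassert>
#include <cstddef>

namespace bin_sketch {

constexpr std::size_t binCount = 8; // assumed: bins of 16, 32, ..., 2048 bytes

// Smallest bin that fits `size`, or binCount when the size is too large.
// Usable in constant expressions and with run-time sizes alike.
constexpr std::size_t binForSize(std::size_t size, std::size_t bin = 0, std::size_t capacity = 16) {
    return bin == binCount ? binCount
         : size <= capacity ? bin
         : binForSize(size, bin + 1, capacity * 2);
}

// Hypothetical fixed-size head, standing in for an image head.
struct ImageHead {
    int width;
    int height;
    int stride;
    void *pixels;
};

// The bin of a fixed-size type is resolved by the compiler.
template <typename T>
struct FixedBin {
    static constexpr std::size_t value = binForSize(sizeof(T));
    static_assert(value < binCount, "Type is too large for the recycling bins");
};

static_assert(binForSize(16) == 0, "16 bytes fit in the first bin");
static_assert(binForSize(17) == 1, "17 bytes need the 32-byte bin");

// Dynamic content, such as string buffers, looks up the bin when the size is known.
inline std::size_t binForString(std::size_t characterCount) {
    assert(characterCount > 0);
    return binForSize(characterCount);
}

} // namespace bin_sketch
```

`FixedBin<ImageHead>::value` costs nothing at run time, while `binForString` performs at most `binCount` comparisons per call.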
Plain non-virtual structs can safely share bins with other types of similar size, so simplify as many virtual classes as possible to keep more of the active memory within cache.
A custom implementation of reference-counted pointers for handles might make it easier to control allocation and recycling. Custom reference-counted pointers could also allow making a C interface for core parts of the library, but the library itself would still have to be compiled as C++.
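A rough sketch of such a handle, assuming the counter is stored in the same allocation as the object so that one block covers both. `Handle`, `Counted` and `create` are hypothetical names, and `std::malloc` and `std::free` mark the two places where a recycling allocator could be plugged in.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <utility>

namespace handle_sketch {

// Counter and object share one allocation, so one bin lookup covers both.
template <typename T>
struct Counted {
    std::atomic<std::size_t> useCount;
    T value;
    template <typename... Args>
    explicit Counted(Args&&... args) : useCount(1), value(std::forward<Args>(args)...) {}
};

template <typename T>
class Handle {
    Counted<T> *block = nullptr;
public:
    Handle() = default;
    template <typename... Args>
    static Handle create(Args&&... args) {
        void *memory = std::malloc(sizeof(Counted<T>)); // a recycling allocator could supply this
        if (memory == nullptr) throw std::bad_alloc();
        Handle result;
        try {
            result.block = new (memory) Counted<T>(std::forward<Args>(args)...);
        } catch (...) {
            std::free(memory); // construction failed, give the memory back
            throw;
        }
        return result;
    }
    Handle(const Handle &other) : block(other.block) {
        if (block != nullptr) block->useCount.fetch_add(1, std::memory_order_relaxed);
    }
    Handle(Handle &&other) noexcept : block(other.block) { other.block = nullptr; }
    // Taking the argument by value covers both copy and move assignment.
    Handle &operator=(Handle other) noexcept { std::swap(block, other.block); return *this; }
    ~Handle() {
        // The last owner destroys the object and returns the memory.
        if (block != nullptr && block->useCount.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            block->~Counted<T>();
            std::free(block); // a recycling allocator could take the block back here
        }
    }
    T &operator*() const { assert(block != nullptr); return block->value; }
    T *operator->() const { assert(block != nullptr); return &block->value; }
    explicit operator bool() const { return block != nullptr; }
    std::size_t useCount() const { return block != nullptr ? block->useCount.load() : 0; }
};

} // namespace handle_sketch
```

Because the handle is a single pointer to a block with a known layout, a C interface could plausibly pass it around as an opaque pointer with separate functions for taking and dropping a reference. That is an assumption about one possible design, not something these notes settle.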