GlobalAlloc, AllocRef implementation and optimized versions of memcpy, etc.

This PR implements global memory allocators, allowing use of the heap (e.g. Box, Vec, etc.). It has separate IRAM, DRAM and External RAM allocators, plus a merged/mixed allocator.
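The PR's own allocators are not reproduced here. As a rough illustration of what one per-region allocator has to provide, here is a minimal bump allocator implementing `GlobalAlloc` over a fixed buffer. `BumpAllocator`, `Arena` and `ARENA_SIZE` are hypothetical names, and the static array stands in for a real DRAM/IRAM/External RAM region whose bounds would come from the linker script.

```rust
use core::alloc::{GlobalAlloc, Layout};
use core::cell::UnsafeCell;
use core::ptr;
use core::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical region size; a real region would be sized by the linker script.
const ARENA_SIZE: usize = 4096;

/// Hypothetical stand-in for one RAM region.
#[repr(C, align(16))]
struct Arena(UnsafeCell<[u8; ARENA_SIZE]>);

pub struct BumpAllocator {
    arena: Arena,
    next: AtomicUsize, // offset of the first free byte in the arena
}

// Safety: the buffer is only reached through pointers handed out once each;
// the shared bookkeeping is the atomic `next` counter.
unsafe impl Sync for BumpAllocator {}

impl BumpAllocator {
    pub const fn new() -> Self {
        BumpAllocator {
            arena: Arena(UnsafeCell::new([0; ARENA_SIZE])),
            next: AtomicUsize::new(0),
        }
    }
}

unsafe impl GlobalAlloc for BumpAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let base_ptr = self.arena.0.get() as *mut u8;
        let base = base_ptr as usize;
        let align = layout.align(); // always a power of two
        let mut current = self.next.load(Ordering::Relaxed);
        loop {
            // Round the absolute address up to the requested alignment.
            let start = (base + current + align - 1) & !(align - 1);
            let offset = start - base;
            let end = match offset.checked_add(layout.size()) {
                Some(end) if end <= ARENA_SIZE => end,
                _ => return ptr::null_mut(), // region exhausted
            };
            // Claim [offset, end) unless another caller got there first.
            match self.next.compare_exchange_weak(
                current, end, Ordering::Relaxed, Ordering::Relaxed,
            ) {
                Ok(_) => return base_ptr.add(offset),
                Err(actual) => current = actual,
            }
        }
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        // A bump allocator never frees individual blocks.
    }
}
```

On target, such an allocator would be registered with `#[global_allocator] static ALLOC: BumpAllocator = BumpAllocator::new();`, or wrapped by a dispatching allocator that forwards to one instance per region. Because it never reuses freed memory, it only shows the shape of the trait; a usable heap needs a free list or similar.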

It builds on #16, so it is best to merge that one first.

Features

- GlobalAlloc and AllocRef support
- Supports DRAM, IRAM and External RAM
- A default allocator which chooses between DRAM, IRAM and External RAM based on allocation size and alignment
- Separate allocators to force a certain type of memory
- Optimized memcpy, memset, memcmp, etc. functions: the original intent was to implement versions that properly adhere to alignment, to allow IRAM usage. In the course of implementation I also optimized them heavily.
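The default allocator's selection rule is not spelled out in this description. The sketch below shows one possible size/alignment policy; `Region`, `choose_region` and `EXTERNAL_THRESHOLD` are hypothetical, and the threshold value is an assumption. The IRAM rule reflects that ESP32 IRAM only tolerates 32-bit word accesses, which is also why the mem functions need to respect alignment; note that a word-aligned layout alone does not guarantee user code never touches single bytes.

```rust
use core::alloc::Layout;

/// Memory regions the mixed allocator can pick from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Region {
    Dram,
    Iram,
    External,
}

/// Hypothetical threshold: allocations at least this large go to external RAM.
const EXTERNAL_THRESHOLD: usize = 16 * 1024;

/// Hypothetical policy; not necessarily the rule used by the PR.
pub fn choose_region(layout: Layout) -> Region {
    if layout.size() >= EXTERNAL_THRESHOLD {
        // Large buffers: spare the scarce internal RAM.
        Region::External
    } else if layout.align() >= 4 && layout.size() % 4 == 0 {
        // Word-aligned, word-multiple layouts are the best candidates for
        // IRAM, which needs 32-bit loads and stores.
        Region::Iram
    } else {
        // Everything else (byte-addressed data) stays in DRAM.
        Region::Dram
    }
}
```

A real policy would also fall back to another region when the preferred one is exhausted.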

memcpy, etc. implementations

These have been heavily optimized for the ESP32. For large aligned buffers, memset and memcpy reach ~88% of the maximum memory bandwidth; memcpy_reverse reaches ~60%.
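The optimized routines themselves are Xtensa-specific and are not shown here. To illustrate what "adhering to alignment" means for IRAM, here is a portable sketch of a memset that only ever issues whole 32-bit loads and stores: fully covered words get a single store, and the partial words at the head and tail use read-modify-write. Memory is modeled as a `&mut [u32]` (a hypothetical stand-in for an IRAM region), `memset_words` and `byte_mask` are hypothetical names, and bytes are numbered little-endian within each word, as on the ESP32.

```rust
/// Set `len` bytes starting at byte offset `start` of `mem` to `value`,
/// touching memory only through whole 32-bit words.
pub fn memset_words(mem: &mut [u32], start: usize, len: usize, value: u8) {
    if len == 0 {
        return;
    }
    let end = start + len; // one past the last byte to write
    assert!(end <= mem.len() * 4, "range out of bounds");

    let fill = u32::from_le_bytes([value; 4]); // value replicated into every byte
    let first_word = start / 4;
    let last_word = (end - 1) / 4;

    for w in first_word..=last_word {
        // Byte range [lo, hi) of this word that lies inside [start, end).
        let lo = if w == first_word { start % 4 } else { 0 };
        let hi = if w == last_word { (end - 1) % 4 + 1 } else { 4 };
        if lo == 0 && hi == 4 {
            // Fully covered word: one aligned store.
            mem[w] = fill;
        } else {
            // Partially covered word: read, merge the new bytes, write back.
            let mask = byte_mask(lo, hi);
            mem[w] = (mem[w] & !mask) | (fill & mask);
        }
    }
}

/// Mask selecting bytes lo..hi (0 <= lo < hi <= 4) of a little-endian word.
fn byte_mask(lo: usize, hi: usize) -> u32 {
    let bits = 8 * (hi - lo) as u32;
    let ones = if bits == 32 { u32::MAX } else { (1u32 << bits) - 1 };
    ones << (8 * lo as u32)
}
```

For large aligned buffers almost all iterations take the single-store path, which is where an optimized version gains most of its bandwidth (e.g. by unrolling that loop).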

Discussion

I chose to implement the memcpy etc. functions in esp32-hal, because the details of the optimizations are ESP32-specific. This is option 3 below, but placed in esp32-hal instead. In principle they could be moved to the xtensa-lx-rt crate. Open for discussion.

Options considered:

1. ~~don't use IRAM for heap at all~~
2. ~~create improved versions of memcpy, etc. and get them integrated into the compiler-builtins crate~~
3. create improved versions of memcpy etc. and add them to the xtensa-lx6-rt crate using the `[package.metadata.cargo-xbuild]` `memcpy = true` setting in Cargo.toml
4. ~~add optimized memcpy, memfill support to the xtensa llvm branch, avoiding the Rust implementations (as seems to be done on x86 etc.)~~
