Stage 1:
Allow the frontend to register a custom allocation function for tapes.
Support on CUDA devices needs some more thought.
The address space might differ from that of normal malloc; the goal is to turn tapes into full Julia objects.
The frontend should also be able to provide a custom free function, or indicate that no free is needed.
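A minimal model of the Stage 1 contract, written in Python only to illustrate it. Every name here (`TapeAllocator`, `TapeRuntime`, `register_tape_allocator`, `allocate_tape`, `release_tape`) is hypothetical and is not an existing Enzyme API. The sketch assumes the frontend registers an allocation callback taking the tape size in bytes, plus an optional free callback where `None` means "no free is needed". It deliberately ignores the CUDA and address-space questions above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Hypothetical callback signatures a frontend (e.g. Enzyme.jl) would register.
AllocFn = Callable[[int], Any]  # tape size in bytes -> tape handle
FreeFn = Callable[[Any], None]  # tape handle -> nothing


@dataclass
class TapeAllocator:
    alloc: AllocFn
    # None means the frontend indicates that no free is needed,
    # e.g. because the tape is a GC-managed object.
    free: Optional[FreeFn] = None


class TapeRuntime:
    """Models the points where Enzyme would call the hooks instead of malloc/free."""

    def __init__(self) -> None:
        self.allocator: Optional[TapeAllocator] = None

    def register_tape_allocator(self, allocator: TapeAllocator) -> None:
        self.allocator = allocator

    def allocate_tape(self, size: int) -> Any:
        # Augmented forward pass: obtain storage for the tape.
        if self.allocator is None:
            return bytearray(size)  # stand-in for the default malloc path
        return self.allocator.alloc(size)

    def release_tape(self, tape: Any) -> None:
        # Reverse pass: call free only if the frontend supplied one.
        if self.allocator is None:
            return  # default path: real code would call free; Python reclaims the bytearray
        if self.allocator.free is not None:
            self.allocator.free(tape)
```

A GC-backed frontend would register `TapeAllocator(alloc=...)` and leave `free` unset. Making `free` optional, rather than requiring a no-op callback, is one way to let the compiler skip emitting the free call altogether.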
Stage 2:
Besides the tape size, also provide a runtime layout descriptor. This is needed for GC support so that Enzyme.jl can find sub-tapes and Julia objects stored on the tape.
We might need support for emitting write barriers. For example, when we store a Julia object to the tape, we will have to insert a call to an intrinsic.
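The Stage 2 requirements can be illustrated with another small Python model; like the one above, every name in it (`SlotKind`, `TapeLayout`, `Tape`, `gc_children`) is hypothetical. The assumed layout descriptor records the byte offset and kind of each tape slot, so a GC can enumerate the slots holding sub-tapes or Julia objects without knowing anything else about the tape. `Tape.store` calls a frontend-supplied barrier hook after storing a managed reference, standing in for the intrinsic call Enzyme would insert. The model assumes sub-tapes are themselves GC-managed (the Stage 1 goal) and does not simulate real memory.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict, List, Tuple


class SlotKind(Enum):
    BITS = 0          # plain data, e.g. a cached double; the GC can ignore it
    SUBTAPE = 1       # reference to a nested tape (assumed GC-managed)
    JULIA_OBJECT = 2  # GC-tracked Julia object


@dataclass(frozen=True)
class TapeLayout:
    size: int                                # tape size in bytes, as provided today
    slots: Tuple[Tuple[int, SlotKind], ...]  # (byte offset, kind) per slot

    def __post_init__(self) -> None:
        for offset, _ in self.slots:
            if not 0 <= offset < self.size:
                raise ValueError(f"slot offset {offset} outside tape of size {self.size}")

    def pointer_offsets(self) -> List[int]:
        """Offsets the GC has to scan for references."""
        return [offset for offset, kind in self.slots if kind is not SlotKind.BITS]


WriteBarrier = Callable[[Any, Any], None]  # (parent tape, stored object) -> None


class Tape:
    def __init__(self, layout: TapeLayout, write_barrier: WriteBarrier) -> None:
        self.layout = layout
        self.values: Dict[int, Any] = {}
        self._kinds = dict(layout.slots)
        self._write_barrier = write_barrier

    def store(self, offset: int, value: Any) -> None:
        kind = self._kinds[offset]  # KeyError for offsets not in the layout
        self.values[offset] = value
        if kind is not SlotKind.BITS:
            # Stand-in for the intrinsic call emitted after storing a managed reference.
            self._write_barrier(self, value)


def gc_children(tape: Tape) -> List[Any]:
    """References a GC mark phase would follow, found only via the layout descriptor."""
    return [tape.values[o] for o in tape.layout.pointer_offsets() if o in tape.values]
```

Storing plain data never triggers the barrier, so only stores of sub-tapes and Julia objects would need the extra intrinsic call.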
This issue captures an offline discussion with @wsmoses.