Optimum choice of memory allocations on the GPU depends on several different factors, including:
GPU/CPU memory architecture: integrated GPUs usually have a shared memory model, so most memory is accessible to both the CPU and GPU at the same time, while discrete GPUs typically have their own dedicated memory with much faster device access. A software renderer has only CPU memory.
frequency of update from the host: is this a still image or vertex data that will be loaded from disk, decoded, and then never altered? Or, at the other extreme, is this audio data that will be different every frame?
frequency of access by the device: many times per render, as with an image sampler? Once per draw call, as with a vertex uniform?
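The factors above could be folded into a selection heuristic. Here's a minimal sketch in C++, where the flag constants mirror Vulkan's `VkMemoryPropertyFlagBits` values but `chooseMemoryFlags` and its inputs are hypothetical, not an existing API in the codebase:

```cpp
#include <cstdint>

// Flag bits mirroring Vulkan's VkMemoryPropertyFlagBits values.
constexpr uint32_t kDeviceLocal = 0x1;   // fastest for GPU access
constexpr uint32_t kHostVisible = 0x2;   // CPU can map and write it
constexpr uint32_t kHostCoherent = 0x4;  // no explicit flush needed

enum class UpdateRate { Never, PerFrame };

// Hypothetical heuristic: static data wants device-local memory
// (staged through a host-visible buffer on discrete GPUs), while
// data rewritten every frame wants memory the CPU can write directly.
uint32_t chooseMemoryFlags(UpdateRate rate, bool integratedGpu) {
    if (integratedGpu) {
        // Shared memory model: allocations can be both host-visible
        // and device-local at once.
        return kDeviceLocal | kHostVisible | kHostCoherent;
    }
    if (rate == UpdateRate::Never) {
        return kDeviceLocal;  // copy once via staging, then fast GPU reads
    }
    return kHostVisible | kHostCoherent;  // updated from CPU every frame
}
```

A real allocator would also have to fall back when the requested combination isn't offered by the device, so this is only the first step of a memory-type query.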
There are potentially big performance differences between architectures and access patterns. Also, any GPU-only memory will need to be staged, meaning a temporary CPU-accessible buffer is created and then graphics commands are issued to copy the data into the GPU-only buffer. This staging needs to finish before the buffer can be used, and to avoid race conditions it seems best if staging completes before creation callbacks are processed. For example, the vertex and index buffers would be staged before a ScinthDef completion callback or OSC message is sent.
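The ordering constraint can be sketched as follows. This is an illustration only: the GPU-side copy is simulated with a memcpy, where a real implementation would record a vkCmdCopyBuffer from the staging buffer and wait on a fence before firing the callback. `stageThenNotify` and its parameters are hypothetical names.

```cpp
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical sketch of the ordering described above: the staging copy
// must complete before the completion callback (e.g. a ScinthDef done
// OSC message) is allowed to fire.
void stageThenNotify(const std::vector<float>& hostData,
                     std::vector<float>& deviceBuffer,
                     const std::function<void()>& onComplete) {
    deviceBuffer.resize(hostData.size());
    // 1) Upload: stand-in for filling the host-visible staging buffer
    //    and issuing the copy into the device-local buffer.
    std::memcpy(deviceBuffer.data(), hostData.data(),
                hostData.size() * sizeof(float));
    // 2) Only after the copy is known to be complete is it safe to
    //    process the creation callback.
    onComplete();
}
```

The point of the sketch is just that the callback is invoked strictly after the copy, so anything the callback triggers can assume the buffer contents are valid.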
There might also be an opportunity to re-use vertex and index buffers here, which are not currently shared across ScinthDefs.
Lastly, this might help with #52, since one of the differences between integrated and discrete GPUs is the memory model.