Support Training with Metal

I've been enjoying ggml indirectly through llama.cpp, whisper.cpp, and clip.cpp on my M1 Mac. I wanted to try training some models directly with ggml and the Metal backend, but ran into two problems.

First, I tried running the MNIST training example with the Metal backend (code link). It aborts with an "unsupported op" error for CROSS_ENTROPY_LOSS:

mnist_graph_eval: trying to load a ggml graph from mnist-fc-f32.gguf
ggml_graph_import: invalid magic number, got 46554747
mnist_graph_eval: could not load a ggml graph from mnist-fc-f32.gguf
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1
ggml_metal_init: picking default device: Apple M1
ggml_metal_init: using embedded metal library
ggml_metal_init: GPU name:   Apple M1
ggml_metal_init: GPU family: MTLGPUFamilyApple7  (1007)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_init: simdgroup reduction support   = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 11453.25 MB
mnist_model_init_from_file: loading model weights from 'mnist-fc-f32.gguf'
mnist_model_init_from_file: model arch is mnist-fc
mnist_model_init_from_file: successfully loaded weights from mnist-fc-f32.gguf
main: loaded model in 2639.31 ms
ggml_metal_encode_node: error: unsupported op 'CROSS_ENTROPY_LOSS'
ggml/src/ggml-metal.m:899: unsupported op

Second, I tried computing gradients for the simpler graph from the first example in tests/test1.c, a*x^2. That fails with the following assertion when calling ggml_graph_reset:

ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1
ggml_metal_init: picking default device: Apple M1
ggml_metal_init: using embedded metal library
ggml_metal_init: GPU name:   Apple M1
ggml_metal_init: GPU family: MTLGPUFamilyApple7  (1007)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_init: simdgroup reduction support   = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory
ggml/src/ggml-backend.cpp:277: GGML_ASSERT(buf->iface.memset_tensor != NULL && "memset not supported by backend buffer") failed

Thanks!

ggerganov / ggml

Support Training with Metal #990