emer / axon

Axon is a spiking, biologically-based neural model driven by predictive error-driven learning, for systems-level models of the brain
BSD 3-Clause "New" or "Revised" License

float16 support for synaptic variables on the GPU (and CPU) #233

Open rcoreilly opened 1 year ago

rcoreilly commented 1 year ago

It would probably be useful to use float16 for the synaptic variables. There is a nice existing Go package: https://github.com/x448/float16
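
To make the storage trade-off concrete, here is a minimal stdlib-only sketch of packing a float32 into 16 bits and unpacking it again. The names F32ToF16 and F16ToF32 are hypothetical, chosen for illustration, and are not the API of the package linked above; in practice that package would presumably be used instead of hand-rolled code like this.

```go
package main

import "math"

// F32ToF16 converts a float32 to IEEE 754 binary16 bits, rounding to
// nearest-even. Hypothetical helper for illustration only.
func F32ToF16(f float32) uint16 {
	b := math.Float32bits(f)
	sign := uint16((b >> 16) & 0x8000)
	e8 := (b >> 23) & 0xff
	exp := int32(e8) - 127 + 15 // re-bias exponent from float32 to float16
	mant := b & 0x7fffff
	switch {
	case e8 == 0xff: // Inf or NaN
		if mant != 0 {
			return sign | 0x7e00 // quiet NaN
		}
		return sign | 0x7c00
	case exp >= 0x1f: // too large for float16: overflow to Inf
		return sign | 0x7c00
	case exp <= 0: // float16 subnormal or zero
		if exp < -10 {
			return sign // too small: flush to signed zero
		}
		mant |= 0x800000 // restore the implicit leading 1
		shift := uint32(14 - exp)
		half := mant >> shift
		rem := mant & (1<<shift - 1)
		mid := uint32(1) << (shift - 1)
		if rem > mid || (rem == mid && half&1 == 1) {
			half++
		}
		return sign | uint16(half)
	default: // normal range
		half := uint32(exp)<<10 | mant>>13
		rem := mant & 0x1fff
		if rem > 0x1000 || (rem == 0x1000 && half&1 == 1) {
			half++ // a carry into the exponent is the correct result
		}
		return sign | uint16(half)
	}
}

// F16ToF32 expands IEEE 754 binary16 bits back to a float32 (always exact).
func F16ToF32(h uint16) float32 {
	sign := uint32(h&0x8000) << 16
	exp := uint32(h>>10) & 0x1f
	mant := uint32(h & 0x3ff)
	switch {
	case exp == 0x1f: // Inf or NaN
		return math.Float32frombits(sign | 0x7f800000 | mant<<13)
	case exp == 0: // zero or subnormal: value is mant * 2^-24
		f := float32(mant) / 16777216.0
		if sign != 0 {
			f = -f
		}
		return f
	default:
		return math.Float32frombits(sign | (exp+112)<<23 | mant<<13)
	}
}
```

A synaptic variable stored this way would take a uint16 slot instead of a float32 slot, halving the buffer size, at the cost of about 3 decimal digits of precision and a maximum finite value of 65504.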

Vulkan 1.2 now includes it as a fully supported option: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_KHR_shader_float16_int8.html

In HLSL it is float16_t, fully supported with shader model 6.2, and MoltenVK appears to support it as of 2018: https://github.com/KhronosGroup/MoltenVK/issues/368

Some older GPU hardware does not have native 16-bit support, so it will run much slower there, but the advantages on current hardware likely outweigh that.

rcoreilly commented 1 year ago

Actual Lvis model tests show that the A100 does not allow addressing more than 31 bits of memory (2 GB), even if the memory is broken up into blocks using a SynMemBlock:

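// each block holds 64 floats, so the block index needs fewer bits than the flat index
struct SynMemBlock {
    float vals[64];
};

float SynV(in Context ctx, uint syni, SynapseVars svar) {
    uint64 ix = ctx.SynapseVars.Idx(syni, svar); // flat 64-bit index of this variable
    // split into block number (ix / 64) and offset within the block (ix % 64)
    return Synapses[uint(ix / 64)].vals[uint(ix % 64)];
}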

For the bench_lvis net, with ndata=2 we're under the limit, but ndata=3 puts us over, with SynCa = 2.5 GB:

  BenchLvisNet:  Neurons: 47204  NeurMem: 30.6 MB    Syns: 32448512      SynIdxs: 371.3 MB   SynWts: 618.9 MB    SynCa: 1.7 GB

So the next strategy is to use pages of memory instead of blocks for SynCa, which is the main point of failure.
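
A rough CPU-side sketch of the paging idea, assuming "pages" means several separately allocated buffers that each stay under the addressing limit, with a flat 64-bit index split into a page number and an offset within that page. PagedFloats and its methods are hypothetical names for illustration, not the axon implementation:

```go
package main

// PagedFloats stores one logical float32 array across several pages, so
// that no single allocation exceeds a chosen size. Hypothetical type.
type PagedFloats struct {
	pages   [][]float32
	pageLen uint64 // number of float32 values per full page
}

// NewPagedFloats allocates enough pages of pageLen values to hold n values.
// The last page is only as long as needed.
func NewPagedFloats(n, pageLen uint64) *PagedFloats {
	np := (n + pageLen - 1) / pageLen // ceiling division
	p := &PagedFloats{pages: make([][]float32, np), pageLen: pageLen}
	for i := range p.pages {
		sz := pageLen
		if rem := n - uint64(i)*pageLen; rem < pageLen {
			sz = rem
		}
		p.pages[i] = make([]float32, sz)
	}
	return p
}

// Locate splits a flat 64-bit index into a page number and an offset,
// both of which fit in 32 bits when pageLen is at most 2^32.
func (p *PagedFloats) Locate(ix uint64) (page, off uint32) {
	return uint32(ix / p.pageLen), uint32(ix % p.pageLen)
}

// Get returns the value at flat index ix.
func (p *PagedFloats) Get(ix uint64) float32 {
	pg, off := p.Locate(ix)
	return p.pages[pg][off]
}

// Set stores v at flat index ix.
func (p *PagedFloats) Set(ix uint64, v float32) {
	pg, off := p.Locate(ix)
	p.pages[pg][off] = v
}
```

On the GPU side each page would presumably be bound as its own buffer, so the per-buffer offset stays within 31 bits regardless of the total size.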

And float16 will help relieve pressure considerably!
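
A back-of-the-envelope check of that claim, using the sizes reported above. It assumes SynCa scales linearly with ndata (consistent with 1.7 GB at ndata=2 and about 2.5 GB at ndata=3), that GB means 10^9 bytes, and that the limit is 2^31 bytes; the function names are hypothetical:

```go
package main

const (
	limitBytes  = 1 << 31 // 31 bits of addressable memory
	synCaNData2 = 1.7e9   // reported SynCa size at ndata=2, float32 storage
)

// SynCaBytes estimates the SynCa size for a given ndata and bytes per
// value (4 for float32, 2 for float16), assuming linear scaling in both.
func SynCaBytes(ndata, bytesPerVal int) float64 {
	perData := synCaNData2 / 2 // float32 bytes per unit of ndata
	return perData * float64(ndata) * float64(bytesPerVal) / 4
}

// Fits reports whether a buffer of the given size is under the limit.
func Fits(bytes float64) bool {
	return bytes < limitBytes
}
```

Under these assumptions, ndata=3 comes to roughly 2.55 GB as float32, which is over the limit, but about 1.28 GB as float16, which fits.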

rcoreilly commented 1 year ago

this looks like a great resource: https://therealmjp.github.io/posts/shader-fp16/