Closed by Oblynx 4 years ago
Substituting `DenseSynapses` for `SparseSynapses`, without changing the synapse permanence update calculation, the performance break-even point is at only 1.5% sparsity!
```
jl> @benchmark HTMt.step!(HTMt.sp,z)
BenchmarkTools.Trial:
  memory estimate:  91.89 KiB
  allocs estimate:  93
  --------------
  minimum time:     1.180 ms (0.00% GC)
  median time:      1.570 ms (0.00% GC)
  mean time:        1.631 ms (0.69% GC)
  maximum time:     5.337 ms (0.00% GC)
  --------------
  samples:          3019
  evals/sample:     1

jl> nnz(HTMt.sp.proximalSynapses.synapses)/length(HTMt.sp.proximalSynapses.synapses)
0.0151123046875
```
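The break-even can be probed outside the SP with a toy matrix-vector comparison. This is a sketch, not code from the package; the matrix shape and the `matvec_times` helper are made up for illustration, with the density set near the observed 1.5%:

```julia
# Hypothetical sketch of the dense/sparse break-even point (not HTM.jl code).
# Shape and density are illustrative only.
using SparseArrays, LinearAlgebra

function matvec_times(m, n, density)
    S = sprand(m, n, density)
    D = Matrix(S)          # same matrix, dense storage
    x = rand(n)
    S * x; D * x           # warm up (compile) before timing
    ts = @elapsed S * x
    td = @elapsed D * x
    (sparse = ts, dense = td)
end

t = matvec_times(2048, 1024, 0.015)
```

For serious measurements `@elapsed` on a single call is too noisy; `BenchmarkTools.@benchmark`, as used above, is the better tool.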
With `SparseSynapses` for the SP and a low synapse sparsity of 7%:
```
jl> @benchmark HTMt.step!(HTMt.sp,z)
BenchmarkTools.Trial:
  memory estimate:  244.30 KiB
  allocs estimate:  4370
  --------------
  minimum time:     727.791 μs (0.00% GC)
  median time:      830.291 μs (0.00% GC)
  mean time:        892.799 μs (5.07% GC)
  maximum time:     7.528 ms (84.14% GC)
  --------------
  samples:          5446
  evals/sample:     1
```
The TM is complicated and its stepping performance depends heavily on the circumstances, such as how much new segment/synapse growth was stimulated. These operations have complexity linear in the number of synapses (insertion into a `SparseMatrixCSC`) and dominate performance when the synapse count is large.
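For context on that linear cost, here is a minimal illustration assuming standard `SparseArrays` behaviour (it is not package code): `setindex!` on a structural zero of a `SparseMatrixCSC` must shift the tail of the `rowval`/`nzval` buffers, so each single-entry insertion is O(nnz), whereas collecting (i, j, v) triplets and calling `sparse` once amortizes the cost:

```julia
using SparseArrays

# Single-entry insertion: each new stored entry shifts the buffer tail, O(nnz).
A = spzeros(Float64, 24576, 1994)
A[100, 7] = 0.5

# Bulk growth alternative: build the triplets, then construct once.
rows, cols, vals = [100, 200], [7, 8], [0.5, 0.6]
B = sparse(rows, cols, vals, 24576, 1994)
```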
Here's a practical example, but one where very little new synapse/segment growth was triggered:
```
jl> tm.distalSynapses.synapses|>size
(24576, 1994)
jl> tm.distalSynapses.synapses|>nnz
111563

jl> @benchmark step!(tm,a)
BenchmarkTools.Trial:
  memory estimate:  2.17 MiB
  allocs estimate:  9089
  --------------
  minimum time:     786.562 μs (0.00% GC)
  median time:      910.525 μs (0.00% GC)
  mean time:        1.782 ms (47.45% GC)
  maximum time:     15.194 ms (91.08% GC)
  --------------
  samples:          2765
  evals/sample:     1

jl> tm.distalSynapses.synapses|>size
(24576, 1995)
jl> tm.distalSynapses.synapses|>nnz
111761
```
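Reading off the before/after numbers, that step grew the distal synapse matrix by one segment (a column) and 198 synapses (stored entries):

```julia
# Deltas taken directly from the size/nnz readouts above.
cols_before, cols_after = 1994, 1995
nnz_before, nnz_after = 111563, 111761
new_segments = cols_after - cols_before   # new matrix columns (segments)
new_synapses = nnz_after - nnz_before     # new stored entries (synapses)
```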
```
> timeit(@() tm.compute(spOut', true, timestep))
4.4 ms
Profiler:
- Allocated memory: 2108.05 kb
- Peak memory: 165.69 kb
```
This time is very close to the SP's, suggesting shortcomings in Matlab's timing scheme.
not very relevant anymore
With the same settings for the hot gym test, comparing benchmark data from the Julia implementation with the old Matlab implementation:

Julia: *(benchmark plot not preserved)*
Matlab: *(benchmark plot not preserved)*