Oblynx / HierarchicalTemporalMemory.jl

A simple, high-level Julia implementation of Numenta HTM algorithms
https://oblynx.github.io/HierarchicalTemporalMemory.jl
MIT License

Compare SP/TM step performance to htm.matlab #7

Closed Oblynx closed 4 years ago

Oblynx commented 5 years ago

Using the same settings for the hot gym test, here is a comparison of benchmark data from the Julia implementation against the old Matlab implementation.

Julia:

> @benchmark step!(sp,z)
BenchmarkTools.Trial: 
  memory estimate:  99.34 KiB
  allocs estimate:  72
  --------------
  minimum time:     1.552 ms (0.00% GC)
  median time:      1.675 ms (0.00% GC)
  mean time:        1.729 ms (0.64% GC)
  maximum time:     4.921 ms (64.02% GC)
  --------------
  samples:          2848
  evals/sample:     1

Matlab:

> timeit(@() sp.compute(encOut,true,1))
4.5 ms

Profiler:
- Allocated memory: 2314.67 KB
- Peak memory: 1180.81 KB
Oblynx commented 5 years ago

When DenseSynapses is substituted for SparseSynapses, without changing the synapse permanence update calculation, the performance break-even point is at only 1.5% sparsity!

jl> @benchmark HTMt.step!(HTMt.sp,z)
BenchmarkTools.Trial: 
  memory estimate:  91.89 KiB
  allocs estimate:  93
  --------------
  minimum time:     1.180 ms (0.00% GC)
  median time:      1.570 ms (0.00% GC)
  mean time:        1.631 ms (0.69% GC)
  maximum time:     5.337 ms (0.00% GC)
  --------------
  samples:          3019
  evals/sample:     1
jl> nnz(HTMt.sp.proximalSynapses.synapses)/length(HTMt.sp.proximalSynapses.synapses)
0.0151123046875
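
The break-even result above can be illustrated with a small sketch. The SP's overlap calculation is essentially a matrix-vector product over the synapse permanence matrix; the function and variable names below are illustrative, not the package's API, and the dimensions are made up for the example.

```julia
using SparseArrays, LinearAlgebra

# Hypothetical sketch of the SP overlap computation: the same
# matrix-vector product, once with a dense and once with a sparse
# permanence matrix. At very low sparsity (~1.5% per the measurement
# above), the dense product's regular memory access can match or beat
# the sparse product's indirection overhead.
overlap_dense(W::Matrix, z::Vector)          = W' * z
overlap_sparse(W::SparseMatrixCSC, z::Vector) = W' * z

n, m = 1024, 2048
density = 0.015                 # ~1.5%: the observed break-even point
Ws = sprand(n, m, density)      # sparse synapse permanences
Wd = Matrix(Ws)                 # dense copy with the same values
z  = rand(n)                    # input activation

# Both representations compute the same overlaps.
@assert overlap_dense(Wd, z) ≈ overlap_sparse(Ws, z)
```

Benchmarking both calls with `@benchmark` at varying `density` values is one way to reproduce the break-even measurement.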
Oblynx commented 5 years ago

With SparseSynapses for the SP and a low synapse sparsity of 7%:

jl> @benchmark HTMt.step!(HTMt.sp,z)
BenchmarkTools.Trial: 
  memory estimate:  244.30 KiB
  allocs estimate:  4370
  --------------
  minimum time:     727.791 μs (0.00% GC)
  median time:      830.291 μs (0.00% GC)
  mean time:        892.799 μs (5.07% GC)
  maximum time:     7.528 ms (84.14% GC)
  --------------
  samples:          5446
  evals/sample:     1
Oblynx commented 5 years ago

The TM is complicated and its stepping performance depends heavily on the circumstances, such as how much new segment/synapse growth was stimulated. Growth operations have complexity linear in the number of synapses (each insertion into a SparseMatrixCSC shifts its stored-entry buffers) and dominate the runtime when the synapse count is large.
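
The insertion cost can be demonstrated in isolation. This is a generic SparseArrays sketch, not code from the package; the dimensions are chosen to resemble the TM example below.

```julia
using SparseArrays

# Setting an index that introduces a NEW stored entry in a
# SparseMatrixCSC must shift the internal rowval/nzval buffers,
# so each structural insertion costs O(nnz). This is why synapse
# growth dominates TM stepping time at large synapse counts.
S = spzeros(Float64, 24576, 2000)   # dimensions similar to the TM's distal synapses
S[100, 1] = 0.21                    # structural insertion: O(nnz) buffer shift
S[200, 1] = 0.21                    # another insertion, again O(nnz)
S[100, 1] = 0.50                    # overwrite of an existing entry: cheap, no shift
@assert nnz(S) == 2
```

Batching new synapses and constructing the matrix once (e.g. via `sparse(I, J, V, m, n)`) avoids the repeated buffer shifts that per-element `setindex!` incurs.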

Here's a practical example, but one where very little new synapse/segment growth was triggered:

Julia TM performance:

jl> tm.distalSynapses.synapses|>size
(24576, 1994)
jl> tm.distalSynapses.synapses|>nnz
111563
jl> @benchmark step!(tm,a)
BenchmarkTools.Trial: 
  memory estimate:  2.17 MiB
  allocs estimate:  9089
  --------------
  minimum time:     786.562 μs (0.00% GC)
  median time:      910.525 μs (0.00% GC)
  mean time:        1.782 ms (47.45% GC)
  maximum time:     15.194 ms (91.08% GC)
  --------------
  samples:          2765
  evals/sample:     1
jl> tm.distalSynapses.synapses|>size
(24576, 1995)
jl> tm.distalSynapses.synapses|>nnz
111761

Matlab TM performance (test_tsprediction.m)

> timeit(@() tm.compute(spOut',true,timestep))
4.4 ms

Profiler:
- Allocated memory: 2108.05 KB
- Peak memory: 165.69 KB

This time is very close to the SP's, even though the TM does much more work per step, which suggests shortcomings in Matlab's timing scheme.

Oblynx commented 4 years ago

Not very relevant anymore.