djanloo / quilt

A multiscale neural simulator
MIT License

Multithreading #9

Open djanloo opened 7 months ago

djanloo commented 7 months ago
djanloo commented 6 months ago
djanloo commented 1 month ago

Thread creation and redundant binding (#21) turned out to matter for performance: evolving a neuron costs nearly as much as handling incoming spikes. This is odd and may be due to these two factors.

Approach:

Benchmark, and remember to check whether the neuron becomes too heavy in memory.
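
A minimal sketch of that kind of check, not quilt's actual code: `ToyNeuron`, `evolve_all`, `ns_per_step_per_unit` and `population_bytes` are hypothetical names, and the leaky-integrator update is only a stand-in for the real neuron model. It times the evolution loop with `std::chrono::steady_clock` and reports the storage used by the population, so both numbers can be watched while the neuron class changes.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical neuron state, NOT quilt's actual class.
struct ToyNeuron {
    double V = -65.0;          // membrane potential
    double I = 0.0;            // input current
    double last_spike = 0.0;   // time of the last emitted spike
};

// Advance every neuron by one Euler step of a leaky integrator
// (placeholder dynamics: rest potential -65, time constant 10).
inline void evolve_all(std::vector<ToyNeuron>& pop, double dt) {
    for (auto& n : pop) {
        n.V += dt * (-(n.V + 65.0) / 10.0 + n.I);
    }
}

// Mean wall-clock cost of the evolution loop, in ns per step per unit.
inline double ns_per_step_per_unit(std::vector<ToyNeuron>& pop, int steps, double dt) {
    assert(steps > 0 && !pop.empty());
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int s = 0; s < steps; ++s) {
        evolve_all(pop, dt);
    }
    const auto stop = clock::now();
    const double ns = std::chrono::duration<double, std::nano>(stop - start).count();
    return ns / (static_cast<double>(steps) * static_cast<double>(pop.size()));
}

// Bytes occupied by the neuron storage of one population.
inline std::size_t population_bytes(const std::vector<ToyNeuron>& pop) {
    return pop.size() * sizeof(ToyNeuron);
}
```

Calling `ns_per_step_per_unit(pop, 6000, 0.1)` before and after a change, together with `sizeof(ToyNeuron)`, gives a rough idea of whether a faster neuron was paid for with a larger one.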

djanloo commented 1 month ago

After https://github.com/djanloo/quilt/commit/aeececbd8348a55c54bbd5ff45a41b5acc5058f0 :

Running network consisting of 14622 neurons for 6000 timesteps
--------------------------------------------------
**************************************************
[2024-07-12 12:10:37] - PID 138794852168832 - INFO: Ouput for PerformanceRegistrar (8 managers )
[2024-07-12 12:10:37] - PID 138794852168832 - INFO: Output for PerformanceManager <spiking network>
_________inject _____1.0 s | 238.0 us/step for 6000 steps
_____monitorize ___20.0 us | __3.0 ns/step for 6000 steps
_____simulation ____11.0 s

[2024-07-12 12:10:37] - PID 138794852168832 - INFO: Output for PerformanceManager <Population 0>
______evolution _____1.0 s | 330.0 us/step for 6000 steps | _55.0 ns/step/unit for 6000 steps and 6000 units
_spike_emission __297.0 ms | _49.0 us/step for 6000 steps | __8.0 ns/step/unit for 6000 steps and 6000 units
_spike_handling _____2.0 s | 335.0 us/step for 6000 steps | _55.0 ns/step/unit for 6000 steps and 6000 units

[2024-07-12 12:10:37] - PID 138794852168832 - INFO: Output for PerformanceManager <Population 1>
______evolution _____2.0 s | 336.0 us/step for 6000 steps | _56.0 ns/step/unit for 6000 steps and 6000 units
_spike_emission __240.0 ms | _40.0 us/step for 6000 steps | __6.0 ns/step/unit for 6000 steps and 6000 units
_spike_handling _____2.0 s | 334.0 us/step for 6000 steps | _55.0 ns/step/unit for 6000 steps and 6000 units

[2024-07-12 12:10:37] - PID 138794852168832 - INFO: Output for PerformanceManager <Population 2>
______evolution __139.0 ms | _23.0 us/step for 6000 steps | _55.0 ns/step/unit for 6000 steps and 420 units
_spike_emission ___11.0 ms | __1.0 us/step for 6000 steps | __4.0 ns/step/unit for 6000 steps and 420 units
_spike_handling __133.0 ms | _22.0 us/step for 6000 steps | _52.0 ns/step/unit for 6000 steps and 420 units

[2024-07-12 12:10:37] - PID 138794852168832 - INFO: Output for PerformanceManager <Population 3>
______evolution __261.0 ms | _43.0 us/step for 6000 steps | _55.0 ns/step/unit for 6000 steps and 780 units
_spike_emission ___20.0 ms | __3.0 us/step for 6000 steps | __4.0 ns/step/unit for 6000 steps and 780 units
_spike_handling __258.0 ms | _43.0 us/step for 6000 steps | _55.0 ns/step/unit for 6000 steps and 780 units

[2024-07-12 12:10:37] - PID 138794852168832 - INFO: Output for PerformanceManager <Population 4>
______evolution ___84.0 ms | _14.0 us/step for 6000 steps | _54.0 ns/step/unit for 6000 steps and 260 units
_spike_emission ____4.0 ms | 813.0 ns/step for 6000 steps | __3.0 ns/step/unit for 6000 steps and 260 units
_spike_handling ___81.0 ms | _13.0 us/step for 6000 steps | _51.0 ns/step/unit for 6000 steps and 260 units

[2024-07-12 12:10:37] - PID 138794852168832 - INFO: Output for PerformanceManager <Population 5>
______evolution __137.0 ms | _22.0 us/step for 6000 steps | _56.0 ns/step/unit for 6000 steps and 408 units
_spike_emission ____8.0 ms | __1.0 us/step for 6000 steps | __3.0 ns/step/unit for 6000 steps and 408 units
_spike_handling __134.0 ms | _22.0 us/step for 6000 steps | _54.0 ns/step/unit for 6000 steps and 408 units

[2024-07-12 12:10:37] - PID 138794852168832 - INFO: Output for PerformanceManager <Population 6>
______evolution __261.0 ms | _43.0 us/step for 6000 steps | _57.0 ns/step/unit for 6000 steps and 754 units
_spike_emission ___16.0 ms | __2.0 us/step for 6000 steps | __3.0 ns/step/unit for 6000 steps and 754 units
_spike_handling __259.0 ms | _43.0 us/step for 6000 steps | _57.0 ns/step/unit for 6000 steps and 754 units

That is odd: the individual times are smaller, but the overall simulation takes longer.

EDIT: spikes were not being processed.
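
For reading the tables: the last column looks like the per-step time divided by the number of units (Population 0: 330.0 us/step over 6000 units gives 55 ns/step/unit). The totals also seem to be truncated to whole units, since 330 us/step over 6000 steps is about 1.98 s but is printed as 1.0 s, while 336 us/step is printed as 2.0 s. A small sketch of that arithmetic, with hypothetical helper names that are not part of quilt:

```cpp
#include <cassert>

// Per-step time (in us) divided by the number of units, expressed in ns.
inline double per_unit_ns(double us_per_step, int units) {
    assert(units > 0);
    return us_per_step * 1000.0 / units;
}

// Total time in seconds implied by a per-step time (in us) and a step count.
inline double total_seconds(double us_per_step, int steps) {
    assert(steps > 0);
    return us_per_step * steps / 1e6;
}
```

So the "1.0 s" and "2.0 s" evolution totals of Population 0 and Population 1 are most likely almost the same number on either side of 2 s, not a factor of two apart.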

djanloo commented 1 month ago

Performances for https://github.com/djanloo/quilt/commit/e71904f2ba63053a0c84013412f831e3fe4e73e5 :

Running network consisting of 14622 neurons for 6000 timesteps
--------------------------------------------------
**************************************************
[2024-07-12 14:10:57] - PID 139656515130496 - INFO: Ouput for PerformanceRegistrar (8 managers )
[2024-07-12 14:10:57] - PID 139656515130496 - INFO: Output for PerformanceManager <spiking network>
_________inject __746.0 ms | 124.0 us/step for 6000 steps
_____monitorize ____2.0 ms | 378.0 ns/step for 6000 steps
_____simulation ____16.0 s

[2024-07-12 14:10:57] - PID 139656515130496 - INFO: Output for PerformanceManager <Population 0>
______evolution _____5.0 s | 847.0 us/step for 6000 steps | 141.0 ns/step/unit for 6000 steps and 6000 units
_spike_emission __124.0 ms | _20.0 us/step for 6000 steps | __3.0 ns/step/unit for 6000 steps and 6000 units
_spike_handling __677.0 ms | 112.0 us/step for 6000 steps | _18.0 ns/step/unit for 6000 steps and 6000 units

[2024-07-12 14:10:57] - PID 139656515130496 - INFO: Output for PerformanceManager <Population 1>
______evolution _____4.0 s | 826.0 us/step for 6000 steps | 137.0 ns/step/unit for 6000 steps and 6000 units
_spike_emission __123.0 ms | _20.0 us/step for 6000 steps | __3.0 ns/step/unit for 6000 steps and 6000 units
_spike_handling __612.0 ms | 102.0 us/step for 6000 steps | _17.0 ns/step/unit for 6000 steps and 6000 units

[2024-07-12 14:10:57] - PID 139656515130496 - INFO: Output for PerformanceManager <Population 2>
______evolution __507.0 ms | _84.0 us/step for 6000 steps | 201.0 ns/step/unit for 6000 steps and 420 units
_spike_emission ___35.0 ms | __5.0 us/step for 6000 steps | _13.0 ns/step/unit for 6000 steps and 420 units
_spike_handling ___48.0 ms | __8.0 us/step for 6000 steps | _19.0 ns/step/unit for 6000 steps and 420 units

[2024-07-12 14:10:57] - PID 139656515130496 - INFO: Output for PerformanceManager <Population 3>
______evolution __913.0 ms | 152.0 us/step for 6000 steps | 195.0 ns/step/unit for 6000 steps and 780 units
_spike_emission ___77.0 ms | _12.0 us/step for 6000 steps | _16.0 ns/step/unit for 6000 steps and 780 units
_spike_handling __121.0 ms | _20.0 us/step for 6000 steps | _25.0 ns/step/unit for 6000 steps and 780 units

[2024-07-12 14:10:57] - PID 139656515130496 - INFO: Output for PerformanceManager <Population 4>
______evolution __303.0 ms | _50.0 us/step for 6000 steps | 194.0 ns/step/unit for 6000 steps and 260 units
_spike_emission ____6.0 ms | __1.0 us/step for 6000 steps | __3.0 ns/step/unit for 6000 steps and 260 units
_spike_handling ___26.0 ms | __4.0 us/step for 6000 steps | _16.0 ns/step/unit for 6000 steps and 260 units

[2024-07-12 14:10:57] - PID 139656515130496 - INFO: Output for PerformanceManager <Population 5>
______evolution __479.0 ms | _79.0 us/step for 6000 steps | 196.0 ns/step/unit for 6000 steps and 408 units
_spike_emission ____8.0 ms | __1.0 us/step for 6000 steps | __3.0 ns/step/unit for 6000 steps and 408 units
_spike_handling ___43.0 ms | __7.0 us/step for 6000 steps | _17.0 ns/step/unit for 6000 steps and 408 units

[2024-07-12 14:10:57] - PID 139656515130496 - INFO: Output for PerformanceManager <Population 6>
______evolution __886.0 ms | 147.0 us/step for 6000 steps | 195.0 ns/step/unit for 6000 steps and 754 units
_spike_emission __339.0 us | _56.0 ns/step for 6000 steps | __0.0 ns/step/unit for 6000 steps and 754 units
_spike_handling __185.0 ms | _30.0 us/step for 6000 steps | _41.0 ns/step/unit for 6000 steps and 754 units

This awful performance is explained here:

In GCC 9, there was a hard dependency on TBB when using the different execution policies: if TBB was not present, the build would fail. That changed in GCC 10 (and is still the case in GCC 11), where, if the library is not present, the for_each defaults to a sequential loop. This can be seen at https://github.com/gcc-mirror/gcc/blob/releases/gcc-10.1.0/libstdc++-v3/include/bits/c++config#L679.
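
A quick way to see which case a given build falls into is to ask the preprocessor whether the TBB headers are visible. This is only a sketch under the assumption that the header check is what decides the backend, as the explanation above suggests; `TBB_HEADERS_VISIBLE`, `kTbbHeadersVisible` and `describe_parallel_backend` are hypothetical names. It deliberately does not include `<execution>`, so it builds whether or not TBB is installed.

```cpp
#include <string>

// __has_include is standard since C++17. ASSUMPTION: libstdc++ in GCC >= 10
// makes an equivalent check on <tbb/tbb.h> to pick its parallel backend.
#if defined(__has_include)
#  if __has_include(<tbb/tbb.h>)
#    define TBB_HEADERS_VISIBLE 1
#  endif
#endif
#ifndef TBB_HEADERS_VISIBLE
#  define TBB_HEADERS_VISIBLE 0
#endif

constexpr bool kTbbHeadersVisible = (TBB_HEADERS_VISIBLE == 1);

// Human-readable summary of what the parallel policies will probably do.
inline std::string describe_parallel_backend() {
    return kTbbHeadersVisible
        ? "TBB headers found: std::execution::par can use the TBB backend"
        : "TBB headers not found: std::execution::par likely runs sequentially";
}
```

When the headers are found, code that actually uses the parallel policies with GCC also has to be linked against TBB (`-ltbb`).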

djanloo commented 1 month ago

Performances for https://github.com/djanloo/quilt/commit/b1b97f36ef4049bfcdab51be337d34bc80647930 :

Running network consisting of 14622 neurons for 6000 timesteps
--------------------------------------------------
**************************************************
[2024-07-12 16:42:35] - PID 126648389575808 - INFO: Ouput for PerformanceRegistrar (8 managers )
[2024-07-12 16:42:35] - PID 126648389575808 - INFO: Output for PerformanceManager <spiking network>
_________inject __773.0 ms | 128.0 us/step for 6000 steps
_____monitorize __424.0 us | _70.0 ns/step for 6000 steps
_____simulation _____6.0 s

[2024-07-12 16:42:35] - PID 126648389575808 - INFO: Output for PerformanceManager <Population 0>
______evolution _____1.0 s | 249.0 us/step for 6000 steps | _41.0 ns/step/unit for 6000 steps and 6000 units
_spike_emission __160.0 ms | _26.0 us/step for 6000 steps | __4.0 ns/step/unit for 6000 steps and 6000 units
_spike_handling __239.0 ms | _39.0 us/step for 6000 steps | __6.0 ns/step/unit for 6000 steps and 6000 units

[2024-07-12 16:42:35] - PID 126648389575808 - INFO: Output for PerformanceManager <Population 1>
______evolution _____1.0 s | 249.0 us/step for 6000 steps | _41.0 ns/step/unit for 6000 steps and 6000 units
_spike_emission __191.0 ms | _31.0 us/step for 6000 steps | __5.0 ns/step/unit for 6000 steps and 6000 units
_spike_handling __232.0 ms | _38.0 us/step for 6000 steps | __6.0 ns/step/unit for 6000 steps and 6000 units

[2024-07-12 16:42:35] - PID 126648389575808 - INFO: Output for PerformanceManager <Population 2>
______evolution __213.0 ms | _35.0 us/step for 6000 steps | _84.0 ns/step/unit for 6000 steps and 420 units
_spike_emission ___49.0 ms | __8.0 us/step for 6000 steps | _19.0 ns/step/unit for 6000 steps and 420 units
_spike_handling ___91.0 ms | _15.0 us/step for 6000 steps | _36.0 ns/step/unit for 6000 steps and 420 units

[2024-07-12 16:42:35] - PID 126648389575808 - INFO: Output for PerformanceManager <Population 3>
______evolution __309.0 ms | _51.0 us/step for 6000 steps | _66.0 ns/step/unit for 6000 steps and 780 units
_spike_emission ___93.0 ms | _15.0 us/step for 6000 steps | _19.0 ns/step/unit for 6000 steps and 780 units
_spike_handling ___95.0 ms | _15.0 us/step for 6000 steps | _20.0 ns/step/unit for 6000 steps and 780 units

[2024-07-12 16:42:35] - PID 126648389575808 - INFO: Output for PerformanceManager <Population 4>
______evolution __143.0 ms | _23.0 us/step for 6000 steps | _92.0 ns/step/unit for 6000 steps and 260 units
_spike_emission ___10.0 ms | __1.0 us/step for 6000 steps | __6.0 ns/step/unit for 6000 steps and 260 units
_spike_handling ___77.0 ms | _12.0 us/step for 6000 steps | _49.0 ns/step/unit for 6000 steps and 260 units

[2024-07-12 16:42:35] - PID 126648389575808 - INFO: Output for PerformanceManager <Population 5>
______evolution __189.0 ms | _31.0 us/step for 6000 steps | _77.0 ns/step/unit for 6000 steps and 408 units
_spike_emission ___17.0 ms | __2.0 us/step for 6000 steps | __7.0 ns/step/unit for 6000 steps and 408 units
_spike_handling ___78.0 ms | _13.0 us/step for 6000 steps | _31.0 ns/step/unit for 6000 steps and 408 units

[2024-07-12 16:42:35] - PID 126648389575808 - INFO: Output for PerformanceManager <Population 6>
______evolution __302.0 ms | _50.0 us/step for 6000 steps | _66.0 ns/step/unit for 6000 steps and 754 units
_spike_emission ___14.0 ms | __2.0 us/step for 6000 steps | __3.0 ns/step/unit for 6000 steps and 754 units
_spike_handling __110.0 ms | _18.0 us/step for 6000 steps | _24.0 ns/step/unit for 6000 steps and 754 units

This seems good. Oddly, performance depends on which terminal I use.