calumroy / HTM

np_temporal profiling speed #27

Open calumroy opened 7 years ago

calumroy commented 7 years ago

The numpy temporal calculator is slow. Here is the profiling output for the top HTM layer performing temporal pooling, with the following configuration parameters for the HTM network:

testParameters = {
                  'HTM': {
                        'numLevels': 1,
                        'columnArrayWidth': 11,
                        'columnArrayHeight': 31,
                        'cellsPerColumn': 3,

                        'HTMRegions': [{
                            'numLayers': 3,
                            'enableHigherLevFb': 0,
                            'enableCommandFeedback': 0,

                            'HTMLayers': [{
                                'desiredLocalActivity': 1,
                                'minOverlap': 3,
                                'wrapInput':0,
                                'inhibitionWidth': 4,
                                'inhibitionHeight': 2,
                                'centerPotSynapses': 1,
                                'potentialWidth': 5,
                                'potentialHeight': 5,
                                'spatialPermanenceInc': 0.1,
                                'spatialPermanenceDec': 0.02,
                                'activeColPermanenceDec': 0.02,
                                'tempDelayLength': 3,
                                'permanenceInc': 0.1,
                                'permanenceDec': 0.02,
                                'tempSpatialPermanenceInc': 0,
                                'tempSeqPermanenceInc': 0,
                                'connectPermanence': 0.3,
                                'minThreshold': 5,
                                'minScoreThreshold': 5,
                                'newSynapseCount': 10,
                                'maxNumSegments': 10,
                                'activationThreshold': 6,
                                'colSynPermanence': 0.1,
                                'cellSynPermanence': 0.4
                                },
                                {
                                'desiredLocalActivity': 1,
                                'minOverlap': 2,
                                'wrapInput':0,
                                'inhibitionWidth': 8,
                                'inhibitionHeight': 4,
                                'centerPotSynapses': 1,
                                'potentialWidth': 7,
                                'potentialHeight': 7,
                                'spatialPermanenceInc': 0.2,
                                'spatialPermanenceDec': 0.02,
                                'activeColPermanenceDec': 0.02,
                                'tempDelayLength': 3,
                                'permanenceInc': 0.1,
                                'permanenceDec': 0.02,
                                'tempSpatialPermanenceInc': 0.2,
                                'tempSeqPermanenceInc': 0.1,
                                'connectPermanence': 0.3,
                                'minThreshold': 5,
                                'minScoreThreshold': 3,
                                'newSynapseCount': 10,
                                'maxNumSegments': 10,
                                'activationThreshold': 6,
                                'colSynPermanence': 0.1,
                                'cellSynPermanence': 0.4
                                },
                                {
                                'desiredLocalActivity': 1,
                                'minOverlap': 2,
                                'wrapInput':1,
                                'inhibitionWidth': 30,
                                'inhibitionHeight': 2,
                                'centerPotSynapses': 1,
                                'connectPermanence': 0.3,
                                'potentialWidth': 34,
                                'potentialHeight': 31,
                                'spatialPermanenceInc': 0.1,
                                'spatialPermanenceDec': 0.01,
                                'activeColPermanenceDec': 0.0,
                                'tempDelayLength': 10,
                                'permanenceInc': 0.15,
                                'permanenceDec': 0.05,
                                'tempSpatialPermanenceInc': 0.04,
                                'tempSeqPermanenceInc': 0.1,
                                'minThreshold': 5,
                                'minScoreThreshold': 3,
                                'newSynapseCount': 10,
                                'maxNumSegments': 10,
                                'activationThreshold': 6,
                                'colSynPermanence': 0.1,
                                'cellSynPermanence': 0.4
                                }]
                            }]
                        }
                    }

The profiling of one step of this layer is shown below.

 142438 function calls in 0.169 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.169    0.169 HTM_network.py:1002(spatialTemporal)
        1    0.000    0.000    0.000    0.000 HTM_network.py:1105(updateHTMInput)
        1    0.000    0.000    0.169    0.169 HTM_network.py:1173(spatialTemporal)
        3    0.000    0.000    0.000    0.000 HTM_network.py:599(getPotentialOverlaps)
        3    0.000    0.000    0.000    0.000 HTM_network.py:662(updateInput)
        3    0.000    0.000    0.000    0.000 HTM_network.py:673(updateOutput)
        3    0.000    0.000    0.015    0.005 HTM_network.py:716(Overlap)
        3    0.000    0.000    0.018    0.006 HTM_network.py:738(inhibition)
        3    0.000    0.000    0.010    0.003 HTM_network.py:754(spatialLearning)
        3    0.000    0.000    0.031    0.010 HTM_network.py:765(sequencePooler)
        3    0.000    0.000    0.011    0.004 HTM_network.py:777(calcActiveCells)
        3    0.000    0.000    0.012    0.004 HTM_network.py:797(calcPredictCells)
        3    0.000    0.000    0.008    0.003 HTM_network.py:809(sequenceLearning)
        3    0.000    0.000    0.095    0.032 HTM_network.py:823(temporalPooler)
        1    0.000    0.000    0.000    0.000 HTM_network.py:951(updateRegionInput)
       20    0.000    0.000    0.000    0.000 arraypad.py:101(<genexpr>)
       20    0.000    0.000    0.000    0.000 arraypad.py:1069(<genexpr>)
        2    0.000    0.000    0.000    0.000 arraypad.py:1072(_validate_lengths)
        8    0.000    0.000    0.000    0.000 arraypad.py:111(_append_const)
        2    0.000    0.000    0.000    0.000 arraypad.py:1117(pad)
       20    0.000    0.000    0.000    0.000 arraypad.py:135(<genexpr>)
        8    0.000    0.000    0.000    0.000 arraypad.py:77(_prepend_const)
        4    0.000    0.000    0.000    0.000 arraypad.py:989(_normalize_shape)
        3    0.000    0.000    0.000    0.000 basic.py:4352(perform)
       15    0.000    0.000    0.002    0.000 cc.py:1525(__call__)
        2    0.000    0.000    0.000    0.000 fromnumeric.py:2767(round_)
        8    0.000    0.000    0.000    0.000 fromnumeric.py:43(_wrapit)
        8    0.000    0.000    0.000    0.000 fromnumeric.py:823(argsort)
       36    0.024    0.001    0.036    0.001 function_module.py:482(__call__)
        3    0.000    0.000    0.000    0.000 function_module.py:691(free)
        3    0.000    0.000    0.000    0.000 link.py:324(__get__)
        3    0.000    0.000    0.000    0.000 np_activeCells.py:213(getCurrentLearnCellsList)
        3    0.000    0.000    0.000    0.000 np_activeCells.py:221(getActiveCellsList)
        3    0.000    0.000    0.000    0.000 np_activeCells.py:225(getSegUpdates)
       57    0.003    0.000    0.003    0.000 np_activeCells.py:230(findNumSegs)
       19    0.000    0.000    0.000    0.000 np_activeCells.py:245(getSegmentActiveSynapses)
       19    0.000    0.000    0.005    0.000 np_activeCells.py:266(getBestMatchingCell)
       19    0.000    0.000    0.001    0.000 np_activeCells.py:334(newRandomPrevActiveSynapses)
       85    0.000    0.000    0.000    0.000 np_activeCells.py:359(findLeastUsedSeg)
       90    0.000    0.000    0.000    0.000 np_activeCells.py:377(checkColPrevActive)
       10    0.000    0.000    0.000    0.000 np_activeCells.py:385(checkColBursting)
        6    0.000    0.000    0.000    0.000 np_activeCells.py:401(findActiveCell)
        4    0.000    0.000    0.000    0.000 np_activeCells.py:412(findLearnCell)
       84    0.000    0.000    0.000    0.000 np_activeCells.py:421(setActiveCell)
       38    0.000    0.000    0.000    0.000 np_activeCells.py:433(setLearnCell)
      310    0.000    0.000    0.000    0.000 np_activeCells.py:445(checkCellActive)
        6    0.000    0.000    0.000    0.000 np_activeCells.py:458(checkCellLearn)
       84    0.000    0.000    0.000    0.000 np_activeCells.py:468(checkCellPredicting)
        9    0.000    0.000    0.000    0.000 np_activeCells.py:478(segmentHighestScore)
     1410    0.004    0.000    0.004    0.000 np_activeCells.py:495(segmentNumSynapsesActive)
      141    0.001    0.000    0.005    0.000 np_activeCells.py:521(getBestMatchingSegment)
        3    0.000    0.000    0.004    0.001 np_activeCells.py:552(updateActiveCellScores)
        3    0.000    0.000    0.011    0.004 np_activeCells.py:582(updateActiveCells)
      431    0.017    0.000    0.017    0.000 np_inhibition.py:270(calcualteInhibition)
        3    0.001    0.000    0.018    0.006 np_inhibition.py:333(calculateWinningCols)
       56    0.001    0.000    0.001    0.000 np_sequenceLearning.py:101(updateCurrentSegSyn)
       28    0.000    0.000    0.001    0.000 np_sequenceLearning.py:137(adaptSegments)
     6194    0.005    0.000    0.005    0.000 np_sequenceLearning.py:168(checkCellTime)
        3    0.001    0.000    0.008    0.003 np_sequenceLearning.py:182(sequenceLearning)
       28    0.000    0.000    0.000    0.000 np_sequenceLearning.py:78(addNewSegSyn)
       19    0.000    0.000    0.000    0.000 np_temporal.py:116(setLearnCell)
     6138    0.006    0.000    0.006    0.000 np_temporal.py:126(checkCellPredict)
     4092    0.002    0.000    0.011    0.000 np_temporal.py:139(checkCellActivePredict)
    20901    0.021    0.000    0.063    0.000 np_temporal.py:149(checkColBursting)
        1    0.000    0.000    0.000    0.000 np_temporal.py:163(updateAvgPesist)
        2    0.000    0.000    0.001    0.000 np_temporal.py:280(getPrev2NewLearnCells)
        3    0.011    0.004    0.074    0.025 np_temporal.py:365(updateProximalTempPool)
        3    0.005    0.002    0.021    0.007 np_temporal.py:428(updateDistalTempPool)
       24    0.000    0.000    0.000    0.000 np_temporal.py:84(checkCellLearn)
    50046    0.046    0.000    0.046    0.000 np_temporal.py:94(checkCellActive)
        2    0.000    0.000    0.000    0.000 numeric.py:141(ones)
       77    0.000    0.000    0.002    0.000 numeric.py:406(asarray)
        6    0.000    0.000    0.000    0.000 numeric.py:476(asanyarray)
        9    0.000    0.000    0.000    0.000 numeric.py:79(zeros_like)
       15    0.000    0.000    0.002    0.000 op.py:742(rval)
        6    0.000    0.000    0.005    0.001 op.py:767(rval)
      190    0.000    0.000    0.000    0.000 random.py:293(sample)
       63    0.000    0.000    0.002    0.000 safe_asarray.py:12(_asarray)
        3    0.000    0.000    0.002    0.001 scan_op.py:638(<lambda>)
        3    0.000    0.000    0.002    0.001 scan_op.py:670(rval)
        1    0.000    0.000    0.000    0.000 sdrFunctions.py:29(joinInputArrays)
        6    0.000    0.000    0.000    0.000 shape_base.py:113(atleast_3d)
        2    0.000    0.000    0.000    0.000 shape_base.py:319(dstack)
        3    0.005    0.002    0.005    0.002 subtensor.py:2084(perform)
        3    0.000    0.000    0.010    0.003 theano_learning.py:133(updatePermanenceValues)
        3    0.000    0.000    0.000    0.000 theano_overlap.py:304(checkNewInputParams)
        2    0.000    0.000    0.000    0.000 theano_overlap.py:314(addPaddingToInput)
        3    0.000    0.000    0.001    0.000 theano_overlap.py:458(addVectTieBreaker)
        3    0.000    0.000    0.005    0.002 theano_overlap.py:463(maskTieBreaker)
        3    0.000    0.000    0.001    0.000 theano_overlap.py:476(getColInputs)
        3    0.000    0.000    0.000    0.000 theano_overlap.py:522(getPotentialOverlaps)
        3    0.000    0.000    0.014    0.005 theano_overlap.py:528(calculateOverlap)
        3    0.000    0.000    0.001    0.000 theano_overlap.py:564(removeSmallOverlaps)
        3    0.000    0.000    0.000    0.000 theano_predictCells.py:250(getActiveSegTimes)
        3    0.000    0.000    0.000    0.000 theano_predictCells.py:259(getSegUpdates)
       14    0.000    0.000    0.000    0.000 theano_predictCells.py:264(getSegmentActiveSynapses)
       14    0.000    0.000    0.000    0.000 theano_predictCells.py:287(checkCellPredicting)
       14    0.000    0.000    0.000    0.000 theano_predictCells.py:297(setPredictCell)
      140    0.000    0.000    0.000    0.000 theano_predictCells.py:314(checkCellActive)
        3    0.000    0.000    0.012    0.004 theano_predictCells.py:347(updatePredictiveState)
       93    0.000    0.000    0.000    0.000 type.py:385(<lambda>)
        3    0.000    0.000    0.000    0.000 type.py:579(value_zeros)
       93    0.000    0.000    0.003    0.000 type.py:67(filter)
        3    0.000    0.000    0.002    0.001 vm.py:204(__call__)
       15    0.002    0.000    0.002    0.000 {cutils_ext.cutils_ext.run_cthunk}
       83    0.000    0.000    0.000    0.000 {getattr}
       36    0.000    0.000    0.000    0.000 {hasattr}
      119    0.000    0.000    0.000    0.000 {isinstance}
    24339    0.001    0.000    0.001    0.000 {len}
        8    0.000    0.000    0.000    0.000 {math.ceil}
       92    0.000    0.000    0.000    0.000 {math.floor}
      280    0.000    0.000    0.000    0.000 {max}
      202    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        8    0.000    0.000    0.000    0.000 {method 'argsort' of 'numpy.ndarray' objects}
        2    0.000    0.000    0.000    0.000 {method 'astype' of 'numpy.ndarray' objects}
        2    0.000    0.000    0.000    0.000 {method 'copy' of 'numpy.ndarray' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        6    0.000    0.000    0.000    0.000 {method 'flatten' of 'numpy.ndarray' objects}
        9    0.000    0.000    0.000    0.000 {method 'item' of 'numpy.ndarray' objects}
        3    0.000    0.000    0.000    0.000 {method 'keys' of 'dict' objects}
      190    0.000    0.000    0.000    0.000 {method 'random' of '_random.Random' objects}
        2    0.000    0.000    0.000    0.000 {method 'ravel' of 'numpy.ndarray' objects}
        9    0.000    0.000    0.000    0.000 {method 'reshape' of 'numpy.ndarray' objects}
        2    0.000    0.000    0.000    0.000 {method 'round' of 'numpy.ndarray' objects}
        2    0.000    0.000    0.000    0.000 {method 'setdefault' of 'dict' objects}
       10    0.000    0.000    0.000    0.000 {method 'tolist' of 'numpy.ndarray' objects}
      136    0.000    0.000    0.000    0.000 {min}
        3    0.000    0.000    0.000    0.000 {numpy.core.multiarray.arange}
       90    0.002    0.000    0.002    0.000 {numpy.core.multiarray.array}
       10    0.000    0.000    0.000    0.000 {numpy.core.multiarray.concatenate}
       11    0.000    0.000    0.000    0.000 {numpy.core.multiarray.copyto}
        9    0.000    0.000    0.000    0.000 {numpy.core.multiarray.empty_like}
        2    0.000    0.000    0.000    0.000 {numpy.core.multiarray.empty}
        2    0.000    0.000    0.000    0.000 {numpy.core.multiarray.unravel_index}
       56    0.000    0.000    0.000    0.000 {numpy.core.multiarray.zeros}
    25139    0.003    0.000    0.003    0.000 {range}
        3    0.002    0.001    0.002    0.001 {theano.scan_module.scan_perform.perform}
      144    0.000    0.000    0.000    0.000 {time.time}
       41    0.000    0.000    0.000    0.000 {zip}

Note the temporal pooling time of ~0.095 seconds, which is over half of the total calculation time of 0.169 seconds (0.095 0.032 HTM_network.py:823(temporalPooler)).
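For anyone wanting to reproduce a listing like the one above, here is a minimal sketch using the standard library `cProfile` and `pstats` modules. The `dummy_step` function is a hypothetical stand-in for one network step, not code from this repo. Sorting by cumulative time instead of "standard name" puts the expensive calls at the top, which may make the report easier to read.

```python
import cProfile
import io
import pstats


def profile_step(step_fn, top_n=10):
    """Profile one call of step_fn and return the report text,
    sorted by cumulative time and limited to top_n entries."""
    profiler = cProfile.Profile()
    profiler.enable()
    step_fn()
    profiler.disable()

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top_n)
    return stream.getvalue()


def dummy_step():
    # Hypothetical workload standing in for one HTM network step.
    return sum(i * i for i in range(10000))


report = profile_step(dummy_step)
print(report)
```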

calumroy commented 7 years ago

Looking at just the functions in the np_temporal pooler for the same HTM network:

         93155 function calls in 0.060 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    18414    0.018    0.000    0.052    0.000 np_temporal.py:149(checkColBursting)
        1    0.008    0.008    0.060    0.060 np_temporal.py:364(updateProximalTempPool)
    37851    0.032    0.000    0.032    0.000 np_temporal.py:94(checkCellActive)
    18429    0.001    0.000    0.001    0.000 {len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
       30    0.000    0.000    0.000    0.000 {min}
    18429    0.002    0.000    0.002    0.000 {range}

         8705 function calls in 0.010 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 fromnumeric.py:823(argsort)
       10    0.000    0.000    0.000    0.000 np_temporal.py:116(setLearnCell)
     3069    0.003    0.000    0.003    0.000 np_temporal.py:126(checkCellPredict)
     2046    0.001    0.000    0.005    0.000 np_temporal.py:139(checkCellActivePredict)
       10    0.000    0.000    0.000    0.000 np_temporal.py:168(segmentNumSynapsesActive)
        1    0.000    0.000    0.000    0.000 np_temporal.py:193(getBestMatchingSegment)
        1    0.000    0.000    0.000    0.000 np_temporal.py:262(findLeastUsedSeg)
        1    0.000    0.000    0.000    0.000 np_temporal.py:280(getPrev2NewLearnCells)
        1    0.000    0.000    0.000    0.000 np_temporal.py:336(newRandomPrevActiveSynapses)
        1    0.002    0.002    0.010    0.010 np_temporal.py:428(updateDistalTempPool)
       14    0.000    0.000    0.000    0.000 np_temporal.py:84(checkCellLearn)
     3078    0.003    0.000    0.003    0.000 np_temporal.py:94(checkCellActive)
        3    0.000    0.000    0.000    0.000 numeric.py:476(asanyarray)
       10    0.000    0.000    0.000    0.000 random.py:293(sample)
        3    0.000    0.000    0.000    0.000 shape_base.py:113(atleast_3d)
        1    0.000    0.000    0.000    0.000 shape_base.py:319(dstack)
       68    0.000    0.000    0.000    0.000 {len}
       13    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'argsort' of 'numpy.ndarray' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
       10    0.000    0.000    0.000    0.000 {method 'random' of '_random.Random' objects}
        1    0.000    0.000    0.000    0.000 {method 'ravel' of 'numpy.ndarray' objects}
        3    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}
        1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.concatenate}
        1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.unravel_index}
      356    0.000    0.000    0.000    0.000 {range}

The function checkColBursting, called from updateProximalTempPool, takes significantly longer than the rest of the class: ~0.052 seconds cumulative out of the 0.060 seconds total.

These calls could be removed if we passed the bursting columns to updateProximalTempPool as an input. Other calculators, such as np_activeCells, already set the columns into the bursting state. That calculator could also store which columns are bursting and pass this to the np_learning calculator.
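A rough sketch of that idea follows. Everything in it is an assumption for illustration: the array layout (each cell stores its most recent active time steps), the definition of bursting (every cell in the column active at the current time step), and the function names, which are not the ones in np_temporal.py. The point is only that the bursting flags are computed once and then looked up.

```python
import numpy as np


def find_bursting_columns(active_cells_time, time_step):
    """Return one boolean per column, True where the column is bursting.

    Assumed layout: active_cells_time[col, cell] holds the most recent
    time steps at which that cell was active.
    Assumed definition: a column bursts when all of its cells were
    active at time_step.
    """
    cell_active = (active_cells_time == time_step).any(axis=2)
    return cell_active.all(axis=1)


def update_proximal_temp_pool(col_ids, bursting_cols):
    """Hypothetical consumer: reads the precomputed flags instead of
    re-checking every cell of every column. Returns non-bursting columns."""
    return [c for c in col_ids if not bursting_cols[c]]


# 3 columns x 2 cells; each cell remembers its last 2 active time steps.
active_cells_time = np.array([
    [[4, 5], [3, 5]],   # both cells active at t=5, so the column bursts
    [[4, 5], [2, 3]],   # only one cell active at t=5
    [[1, 2], [2, 3]],   # no cell active at t=5
])
bursting = find_bursting_columns(active_cells_time, time_step=5)
non_bursting_cols = update_proximal_temp_pool(range(3), bursting)
```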

Alternatively, we could implement a theano_temporal calculator, but this will take some time and may not give as large a speed improvement as desired.

calumroy commented 7 years ago

An input was added to the function updateProximalTempPool so that the column bursting times don't have to be recalculated. This has roughly doubled the speed of this function (0.060 seconds down to 0.032 seconds).

        17832 function calls in 0.032 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    17391    0.003    0.000    0.003    0.000 np_temporal.py:149(checkColBursting)
        1    0.028    0.028    0.032    0.032 np_temporal.py:357(updateProximalTempPool)
       17    0.000    0.000    0.000    0.000 {len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
      405    0.000    0.000    0.000    0.000 {min}
       17    0.000    0.000    0.000    0.000 {range}

         8662 function calls in 0.010 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 fromnumeric.py:823(argsort)
       15    0.000    0.000    0.000    0.000 np_temporal.py:116(setLearnCell)
     3069    0.003    0.000    0.003    0.000 np_temporal.py:126(checkCellPredict)
     2046    0.001    0.000    0.005    0.000 np_temporal.py:139(checkCellActivePredict)
        1    0.000    0.000    0.000    0.000 np_temporal.py:273(getPrev2NewLearnCells)
        1    0.003    0.003    0.010    0.010 np_temporal.py:424(updateDistalTempPool)
       16    0.000    0.000    0.000    0.000 np_temporal.py:84(checkCellLearn)
     3098    0.004    0.000    0.004    0.000 np_temporal.py:94(checkCellActive)
        3    0.000    0.000    0.000    0.000 numeric.py:476(asanyarray)
        3    0.000    0.000    0.000    0.000 shape_base.py:113(atleast_3d)
        1    0.000    0.000    0.000    0.000 shape_base.py:319(dstack)
       40    0.000    0.000    0.000    0.000 {len}
       17    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'argsort' of 'numpy.ndarray' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {method 'ravel' of 'numpy.ndarray' objects}
        3    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}
        1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.concatenate}
        1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.unravel_index}
      343    0.000    0.000    0.000    0.000 {range}

The majority of the remaining time is spent checking whether cells are active in the function "checkCellActive". The next best method to increase the speed of this calculator would be to implement a pure Theano GPU version. This will take a significant amount of time. Further work should be done investigating whether the current numpy temporal pooling algorithm is performing as expected before embarking on the Theano version.
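Before a Theano port, a cheaper experiment might be to replace the per-cell Python calls with a single NumPy comparison over the whole array. The sketch below is hypothetical: it assumes each cell stores its two most recent active time steps in a (columns, cells, 2) array, which may not match the real data structure, and the function names are made up. It checks that the vectorised mask agrees with a per-cell loop.

```python
import numpy as np


def check_cell_active(active_cells_time, col, cell, time_step):
    """Per-cell check: one Python call per (column, cell) pair."""
    return any(t == time_step for t in active_cells_time[col][cell])


def active_cells_mask(active_cells_time, time_step):
    """Vectorised equivalent: one boolean per (column, cell),
    computed in a single NumPy pass."""
    return (active_cells_time == time_step).any(axis=-1)


rng = np.random.default_rng(0)
# 11 * 31 = 341 columns with 3 cells each, as in the configuration above.
# Storing 2 time steps per cell is an assumption.
active_cells_time = rng.integers(0, 10, size=(341, 3, 2))

mask = active_cells_mask(active_cells_time, time_step=7)
looped = np.array([
    [check_cell_active(active_cells_time, c, i, 7) for i in range(3)]
    for c in range(341)
])
```

If the two results match on real data, the mask could be computed once per time step and indexed wherever checkCellActive is currently called.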