BINDS-LAB-UMASS / bindsnet_experiments

Collection of experimental Python and accompanying Bash scripts for training and evaluation of BindsNET networks.

Backpropagating the output layer reward #1

Closed PaulMuadDib closed 5 years ago

PaulMuadDib commented 5 years ago

Hi BindsNET (far from synapses with conduction delays), I am trying to backpropagate a reward from the output layer to the previous ones, using a modified version of the MSTDPET learning rule (after having added a "reward" attribute to the layers), according to:

```python
# Parse keyword arguments.
reward = kwargs['reward']
a_plus = kwargs.get('a_plus', 1)
a_minus = kwargs.get('a_minus', -1)

# Crude test to detect the connection into the output layer (by its wmax value),
# so that the externally supplied reward is assigned to the output layer.
# if self.target is network.layers['output_layer']:
if self.connection.wmax == 21.0:
    # print('setting output layer reward of length {}'.format(len(reward)))
    self.target.reward = reward
```

and:

```python
# Compute weight update.
PostPre = self.nu[0] * self.target.reward * self.e_trace
self.connection.w += PostPre

# Find presynaptic neurons such that update is max for rewarded neurons.
values, indices = PostPre.sort(dim=0, descending=True)
'''
not_to_reward = torch.nonzero(self.target.reward < 0).view(-1)
self.source.reward[indices[-1, :].view(-1)] = self.target.reward[not_to_reward]
to_reward = torch.nonzero(self.target.reward > 0).view(-1)
values, indices = PostPre[:, to_reward].sort(dim=0, descending=True)
self.source.reward[indices[0, :].view(-1)] = self.target.reward[to_reward]
'''
for neur in range(len(self.target.reward)):
    if self.target.reward[neur] < 0:
        self.source.reward[indices[-1, neur]] = self.target.reward[neur]
    elif self.target.reward[neur] > 0:
        self.source.reward[indices[0, neur]] = self.target.reward[neur]
```

I also call the synapse updates in reverse order (since the rewards are computed from the output layer back to the previous ones) in the `run` definition of the network.
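Roughly, the change in the run loop looks like this (just a sketch; the exact update call in the network's `run` definition may differ, and it assumes the connections were added in input-to-output order):

```python
# Sketch: apply the connection updates in reverse order, so the connection into
# the output layer is updated first and can set its source layer's reward before
# the earlier connections are updated. Assumes connections were added in
# input-to-output order; the actual update call in run() may take other kwargs.
for c in reversed(list(self.connections)):
    self.connections[c].update(**kwargs)
```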

But this little addition makes the simulation much slower (from 1.5 s to 36 s on the GPU): would you have an idea why?

djsaunde commented 5 years ago

I'm guessing it's because of this for loop:

```python
for neur in range(len(self.target.reward)):
    ...
```

If that loops through all neurons in the layer (which happens every timestep), then it will probably slow things down quite a bit.
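As a toy illustration (nothing to do with the learning rule itself), compare looping over a tensor element-by-element in Python with a single vectorized operation:

```python
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
x = torch.zeros(1000, device=device)
r = torch.randn(1000, device=device)

# Slow: Python-level loop, one tiny indexing op (and kernel launch on GPU) per element.
for i in range(len(r)):
    if r[i] > 0:
        x[i] = r[i]

# Fast: a single vectorized operation over the whole tensor.
x = torch.where(r > 0, r, x)
```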

You can use Python's built-in cProfile to check what is causing the slowdown.
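For example, something like this (with `run_experiment` standing in for whatever function drives your simulation):

```python
import cProfile

# 'run_experiment' is a placeholder for the function that runs your network.
cProfile.run('run_experiment()', sort='cumtime')
```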

PaulMuadDib commented 5 years ago

Thank you. Indeed, the following is much faster (the vectorized indexing replaces the per-neuron Python loop) and enables me to test this way of backpropagating the reward in a deep SNN:

```python
# Compute weight update.
PostPre = self.nu[0] * self.target.reward * self.e_trace
self.connection.w += PostPre

# Find presynaptic neurons with the largest STDP update.
values, indices = PostPre.sort(dim=0, descending=True)

# Punish the presynaptic neurons inducing the min (negative) update.
to_punish = torch.nonzero(self.target.reward < 0).view(-1)
self.source.reward[indices[-1, to_punish].view(-1)] = self.target.reward[to_punish]

# Reward the presynaptic neurons inducing the max (positive) update.
to_reward = torch.nonzero(self.target.reward > 0).view(-1)
self.source.reward[indices[0, to_reward].view(-1)] = self.target.reward[to_reward]
```
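For reference, a minimal standalone check of this indexing pattern, with made-up shapes and values (`PostPre` shaped (n_source, n_target), one reward per target neuron):

```python
import torch

PostPre = torch.randn(4, 3)                     # stand-in for the STDP update
target_reward = torch.tensor([1.0, -1.0, 0.0])  # per-target-neuron reward
source_reward = torch.zeros(4)

values, indices = PostPre.sort(dim=0, descending=True)

# Punish the source neuron with the most negative update for each punished target.
to_punish = torch.nonzero(target_reward < 0).view(-1)
source_reward[indices[-1, to_punish].view(-1)] = target_reward[to_punish]

# Reward the source neuron with the largest update for each rewarded target.
to_reward = torch.nonzero(target_reward > 0).view(-1)
source_reward[indices[0, to_reward].view(-1)] = target_reward[to_reward]

print(source_reward)  # one entry set to -1.0 and one to 1.0 (unless they coincide)
```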