calumroy / HTM


High level commands #8

Open calumroy opened 10 years ago

calumroy commented 10 years ago

The HTM hierarchy has been implemented in the balancer project. Feedback commands have not been tested against any results yet. An issue with the current design (commit 51b2037db9faa75eb0501dc670202b5491f0bc88) is that there is no way to direct the commands coming from the highest level.

A possible solution is to add some sort of SDR recognizer. It would recognize SDRs that are "desirable" and then attempt to issue only commands that are known to produce the desired SDR. This could be something the thalamus does in the real neocortex, by gating the output of SDRs from different levels. It could be thought of as the thalamus remembering a desirable past experience and attempting to change the output of the neocortex to reproduce the same experience.
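A minimal sketch of what such a recognizer might look like, assuming SDRs are flat binary numpy arrays and that each desirable SDR is remembered together with the feedback command that was active when it occurred. The class and method names here are illustrative only and are not part of the current code base.

```python
import numpy as np

class DesirableSDRRecognizer:
    """Hypothetical sketch of the 'desirable SDR' recognizer idea.

    SDRs are assumed to be flat binary numpy arrays. The recognizer remembers
    which feedback command was active when a desirable SDR appeared, and later
    gates the top-level output so that only those commands are reissued.
    """

    def __init__(self, overlap_threshold=0.8):
        self.overlap_threshold = overlap_threshold
        self.desirable_sdrs = []   # SDRs that were marked as desirable
        self.commands = []         # command active when each SDR occurred

    def mark_desirable(self, sdr, command):
        """Remember a desirable SDR and the command associated with it."""
        self.desirable_sdrs.append(sdr.astype(bool))
        self.commands.append(command)

    def gate_command(self, target_sdr):
        """Return the command whose remembered SDR best overlaps the SDR we
        want to reproduce, or None if nothing is close enough."""
        target = target_sdr.astype(bool)
        best_overlap, best_command = 0.0, None
        for sdr, command in zip(self.desirable_sdrs, self.commands):
            overlap = np.sum(sdr & target) / max(np.sum(sdr), 1)
            if overlap > best_overlap:
                best_overlap, best_command = overlap, command
        return best_command if best_overlap >= self.overlap_threshold else None
```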

calumroy commented 9 years ago

After some research I have decided to try implementing Q-learning within the thalamus class. The idea is that the top level's output is sent to the thalamus, which assigns a Q value to each of the input cell grid squares. Then normal Q-learning is performed, and an output is selected by the thalamus and sent back to the HTM as a top-level feedback command.
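A minimal sketch of this idea, assuming the top level's output arrives as a boolean cell grid and that a single chosen cell forms the feedback command. The class name, shapes and parameters below are assumptions for illustration, not the actual thalamus class.

```python
import numpy as np

class ThalamusQLearner:
    """Sketch: one Q value per cell in the top level's output grid.

    The active cells of the current output form the state; the chosen feedback
    command is the active cell with the highest Q value (epsilon-greedy).
    """

    def __init__(self, grid_shape, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = np.zeros(grid_shape)   # one Q value per input cell grid square
        self.alpha = alpha              # learning rate
        self.gamma = gamma              # discount factor
        self.epsilon = epsilon          # exploration rate

    def select_command(self, active_cells):
        """Pick a cell to reinforce, given the boolean grid of active cells."""
        if np.random.rand() < self.epsilon or not active_cells.any():
            idx = np.unravel_index(np.random.randint(self.q.size), self.q.shape)
        else:
            masked = np.where(active_cells, self.q, -np.inf)
            idx = np.unravel_index(np.argmax(masked), self.q.shape)
        command = np.zeros_like(self.q, dtype=bool)
        command[idx] = True             # feedback command sent back to the HTM
        return command, idx

    def update(self, idx, reward, next_active_cells):
        """Standard one-step Q update for the cell that was chosen."""
        next_best = self.q[next_active_cells].max() if next_active_cells.any() else 0.0
        self.q[idx] += self.alpha * (reward + self.gamma * next_best - self.q[idx])
```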

Here is a post on one way of combining the two: https://cireneikual.wordpress.com/2015/01/08/continuous-htm-multiple-layers-and-reinforcement-learning/

I think a better solution is not to use a feedforward neural network and instead just use the output of the HTM. Here is a post from the email discussion about Q-learning and the HTM.

Hi Eric

Gideon (also on the list) and I have been working on this for a while. We are very keen on assigning Q values to each HTM cell. This seems to work really well. However, in practice we have faced the following difficulties with making the idea work properly as a complete agent:

  1. A deep hierarchy is needed to create long-term, abstract concepts to which we can assign meaningful Q values. This means temporal pooling and hierarchical learning must be working really well. At the moment it seems hierarchically-scalable temporal pooling is a Work In Progress for HTM-like algorithms. If we can't create a deep hierarchy, we can't link causes that occur a long time before Rewards, except by discounting (where the signal rapidly becomes weak in a "flat" hierarchy, due to the large number of intermediate states).
  2. If you have hierarchical Q-values, you will want hierarchical action selection. If you have hierarchical action selection, you need to be able to execute actions hierarchically. This poses a number of problems, such as maintaining the agency of actions represented at higher levels of the hierarchy. (see http://a-mpf.blogspot.com.au/2014/12/agency-and-hierarchical-action-selection.html )
  3. "Closing the loop" and allowing the agent's actions to determine future inputs, changes the dynamics of the system, and can lead to runaway feedback effects. For example, say the agent discovers a mildly adaptive action. Does it endlessly repeat that strategy, or keep exploring the space to discover better actions? This exploration-exploitation balance is a well known and unsolved problem ( http://en.wikipedia.org/wiki/Multi-armed_bandit ). Of course, the dilemma applies to organisations and society as well ( http://vserver1.cscs.lsa.umich.edu/~pjlamber/Complexity%20Course_files/exploration_exploitation.pdf ). By definition there is no perfect solution to this problem. Humans are pretty good at it most of the time.

regards
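For reference, the exploration-exploitation balance mentioned in point 3 can be illustrated with a toy multi-armed bandit and an epsilon-greedy policy. This is only an illustration and is unrelated to the HTM code.

```python
import numpy as np

def epsilon_greedy_bandit(true_means, steps=1000, epsilon=0.1, seed=0):
    """Toy multi-armed bandit run with an epsilon-greedy policy.

    With epsilon = 0 the agent locks onto the first mildly good arm it finds;
    with a small epsilon it keeps exploring and usually finds the best arm.
    """
    rng = np.random.default_rng(seed)
    n_arms = len(true_means)
    estimates = np.zeros(n_arms)   # running estimate of each arm's reward
    counts = np.zeros(n_arms)      # how often each arm has been pulled
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.integers(n_arms)         # explore: random arm
        else:
            arm = int(np.argmax(estimates))    # exploit: best arm so far
        reward = rng.normal(true_means[arm], 1.0)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return total_reward, estimates

# e.g. epsilon_greedy_bandit([0.2, 0.5, 0.8], epsilon=0.1)
```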