calumroy opened this issue 10 years ago
After some research I have decided to try to implement Q-learning within the thalamus class. The idea is that the top level's output is sent to the thalamus, which assigns a Q value to each of the input cell grid squares. Normal Q-learning is then performed, and the thalamus selects an output that is sent back to the HTM as a top-level feedback command.
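As a concrete illustration, here is a minimal sketch of that scheme, assuming the top level's output arrives as a 2D binary grid of active cells and the feedback command is an index into a small fixed command set. The class and method names (ThalamusQ, select_command, learn) and all parameter values are hypothetical, not the balancer project's API.

```python
import numpy as np

class ThalamusQ:
    """Hypothetical sketch: tabular Q-learning over the top level's output grid."""

    def __init__(self, grid_shape, num_commands, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        # One Q value per (grid square, command) pair; grid_shape is a tuple, e.g. (32, 32).
        self.q = np.zeros(grid_shape + (num_commands,))
        self.num_commands = num_commands

    def command_values(self, active_grid):
        # Value of each command: mean Q over the currently active grid squares.
        active = active_grid.astype(bool)
        if not active.any():
            return np.zeros(self.num_commands)
        return self.q[active].mean(axis=0)

    def select_command(self, active_grid):
        # Epsilon-greedy choice of the feedback command sent back to the HTM.
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.num_commands)
        return int(np.argmax(self.command_values(active_grid)))

    def learn(self, prev_grid, command, reward, next_grid):
        # Standard Q-learning TD update, applied to every square that was active.
        prev = prev_grid.astype(bool)
        if not prev.any():
            return
        td_target = reward + self.gamma * self.command_values(next_grid).max()
        td_error = td_target - self.q[prev, command].mean()
        self.q[prev, command] += self.alpha * td_error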
Here is a post on one way of combining the two: https://cireneikual.wordpress.com/2015/01/08/continuous-htm-multiple-layers-and-reinforcement-learning/
I think a better solution is not to use a feedforward neural network and instead just use the output of the HTM. Here is a post from the email discussion about Q-learning and the HTM:
Hi Eric
Gideon (also on the list) and I have been working on this for a while. We are very keen on assigning Q values to each HTM cell. This seems to work really well. However, in practice we have faced the following difficulties with making the idea work properly as a complete agent:
Regards
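Read as linear value function approximation over the SDR of active cells, the per-cell idea from that email might look something like the SARSA(lambda) sketch below. This is my own interpretation, not code from that discussion; CellQ, its parameters, and the flat cell indexing are all hypothetical.

```python
import numpy as np

class CellQ:
    """Hypothetical sketch: one Q value per (HTM cell, action), trained with SARSA(lambda)."""

    def __init__(self, num_cells, num_actions, alpha=0.01, gamma=0.95, lam=0.8):
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.q = np.zeros((num_cells, num_actions))       # one Q row per HTM cell
        self.trace = np.zeros((num_cells, num_actions))   # eligibility traces

    def value(self, active_cells, action):
        # Q(s, a) approximated as the mean Q of the cells active in state s.
        # Assumes a non-empty set of flat cell indices.
        return self.q[list(active_cells), action].mean()

    def update(self, active_cells, action, reward, next_cells, next_action):
        # SARSA(lambda) update, spread over the cells that were active.
        td_error = (reward
                    + self.gamma * self.value(next_cells, next_action)
                    - self.value(active_cells, action))
        self.trace *= self.gamma * self.lam
        self.trace[list(active_cells), action] += 1.0
        self.q += self.alpha * td_error * self.trace
```

Because similar states activate overlapping sets of cells, Q values learned for one state would generalize to similar states, which may be why the per-cell scheme "seems to work really well" as described.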
The HTM hierarchy has been implemented in the balancer project, but feedback commands have not produced any tested results yet. An issue with the current design (commit 51b2037db9faa75eb0501dc670202b5491f0bc88) is that there is no way to direct the commands coming from the highest level.
A possible solution is to add some sort of SDR recognizer. This would recognize SDRs that are "desirable" and then attempt to issue only commands that are known to have produced the desired SDR. This could be a function the thalamus performs in the real neocortex by gating the output of SDRs from different levels. It can be thought of as the thalamus remembering a desirable past experience and attempting to change the output of the neocortex to reproduce that experience.
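One possible shape for such a recognizer is sketched below: it remembers "desirable" SDRs and the SDRs that followed each command, then gates toward the command whose past results best overlap a desirable SDR. Everything here (SDRRecognizer, mark_desirable, record, gate_command) is a hypothetical illustration, not an existing part of the balancer project.

```python
import numpy as np

class SDRRecognizer:
    """Hypothetical sketch: remember desirable SDRs and gate commands toward them."""

    def __init__(self, num_commands):
        self.desirable = []                               # remembered desirable SDRs
        self.history = [[] for _ in range(num_commands)]  # SDRs seen after each command

    def mark_desirable(self, sdr):
        self.desirable.append(np.asarray(sdr, dtype=bool))

    def record(self, command, resulting_sdr):
        # Remember which SDR followed this command; all SDRs share one fixed size.
        self.history[command].append(np.asarray(resulting_sdr, dtype=bool))

    @staticmethod
    def overlap(a, b):
        # Number of bits active in both SDRs.
        return int(np.logical_and(a, b).sum())

    def gate_command(self):
        # Issue the command whose past results best overlap a desirable SDR.
        best, best_score = 0, -1
        for cmd, outcomes in enumerate(self.history):
            score = max((self.overlap(o, d)
                         for o in outcomes for d in self.desirable), default=0)
            if score > best_score:
                best, best_score = cmd, score
        return best
```

Overlap is used here as the similarity measure because it is the natural metric between SDRs; a thresholded overlap could likewise decide when a current SDR counts as a remembered desirable experience.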