brohrer / robot-brain-project

a general purpose learning agent
205 stars 46 forks

Evolve the catch world #14

Closed brohrer closed 9 years ago

brohrer commented 9 years ago

Make it more complex, more intuitive, and more aesthetically appealing. Make the world bigger. Incorporate aspects that require deep learning.

brohrer commented 9 years ago

A prioritized (and highly plastic) list of subtasks.

  1. Visualize all sensors
  2. Separate range and heading
  3. Remove reward scaffolding for range and flow
  4. Reward changes in sensor readings
  5. Add bumpers
  6. Add proximity
  7. Punish hard contact
  8. Reward soft contact
  9. Fuzzy and/or overlapping sensors
  10. Sense flow (changes in range, changes in heading, changes in proximity)
  11. Reward unpredicted sensor sequences
brohrer commented 9 years ago

An updated list of subtasks.

  1. Reward changes in sensor readings
  2. Add bumpers
  3. Add proximity
  4. Punish hard contact
  5. Reward soft contact
  6. Fuzzy and/or overlapping sensors
  7. Sense flow (changes in range, changes in heading, changes in proximity)
  8. Reward unpredicted sensor sequences
brohrer commented 9 years ago

An updated list of subtasks.

  1. Slow down exploration
  2. Add more potential features
  3. Speed up learning
  4. Render features
  5. Sense velocity and acceleration
  6. Add bumpers
  7. Add proximity
  8. Punish hard contact
  9. Reward soft contact
  10. Fuzzy and/or overlapping sensors
  11. Sense flow (changes in range, changes in heading, changes in proximity)
  12. Reward unpredicted sensor sequences
brohrer commented 9 years ago

An updated list of subtasks.

  1. Render prox and bump features in the animation
  2. Render features for print_features
  3. Sense velocity and acceleration
  4. Reward soft contact
  5. Fuzzy and/or overlapping sensors
  6. Sense flow (changes in range, changes in heading, changes in proximity)
  7. Reward unpredicted sensor sequences
brohrer commented 9 years ago

An updated list of subtasks.

  1. Render features for print_features
  2. Sense velocity and acceleration
  3. Reward soft contact
  4. Fuzzy and/or overlapping sensors
  5. Sense flow (changes in range, changes in heading, changes in proximity)
  6. Reward unpredicted sensor sequences
brohrer commented 9 years ago

An updated list of subtasks.

  1. Reward soft contact
  2. Fuzzy and/or overlapping sensors
  3. Sense flow (changes in range, changes in heading, changes in proximity)
  4. Reward unpredicted sensor sequences
brohrer commented 9 years ago

This task is being put on hold. Debugging it exposed that BECCA needs some fundamental reworking in order to be successful in this world. When dealing with a dynamic environment, it is more effective to separate the modeling of the world dynamics (state-action->state) from the assignment of value (state->reward). Traditional reinforcement learning cuts this corner by creating value functions of the form (state-action->reward). In the approach I'm pursuing, this doesn't appear to scale, even with automatic feature creation. For all its simplicity, the chase world has far more possible states than a chess board.
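To make the distinction concrete, here is a minimal sketch (not BECCA's actual code; all names are hypothetical) contrasting the merged (state-action->reward) value function with the separated model-plus-value structure. Learning a transition model and a state-value map separately means the model can be reused to score actions by the states they are predicted to lead to:

```python
# Traditional RL shortcut: one table maps (state, action) directly to value.
q_values = {}

def q_update(state, action, reward, learning_rate=0.1):
    """Nudge Q(s, a) toward the observed reward."""
    key = (state, action)
    old = q_values.get(key, 0.0)
    q_values[key] = old + learning_rate * (reward - old)

# Separated approach: a transition model plus a state-value map.
transition_model = {}  # (state, action) -> predicted next state
state_values = {}      # state -> learned reward estimate

def model_update(state, action, next_state):
    """Record the observed world dynamics: (state, action) -> next state."""
    transition_model[(state, action)] = next_state

def value_update(state, reward, learning_rate=0.1):
    """Nudge the value of a state (independent of any action) toward reward."""
    old = state_values.get(state, 0.0)
    state_values[state] = old + learning_rate * (reward - old)

def choose_action(state, actions):
    """Score each action by the value of the state the model predicts."""
    def score(action):
        predicted = transition_model.get((state, action))
        return state_values.get(predicted, 0.0)
    return max(actions, key=score)
```

In the merged scheme, every (state, action) pair must be visited to learn its value; in the separated scheme, learning that a single state is valuable immediately informs every action predicted to reach it.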

Another benefit of separating modeling from reward assignment is that it allows a planner to intervene and specify intermediate goals by assigning temporary artificial reward values to arbitrary features. This sounds like something a human brain might do.
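A hypothetical sketch of that planner hook (again, illustrative names only, not BECCA's API): the planner overlays temporary reward on arbitrary features, the overlay takes precedence over learned values while it is in place, and it is cleared once the subgoal is met:

```python
class ValueMap:
    """State-feature values with a planner-controlled temporary overlay."""

    def __init__(self):
        self.learned = {}  # feature -> reward learned from experience
        self.overlay = {}  # feature -> temporary reward set by a planner

    def set_subgoal(self, feature, artificial_reward):
        """Planner intervention: mark a feature as temporarily rewarding."""
        self.overlay[feature] = artificial_reward

    def clear_subgoal(self, feature):
        """Remove the artificial reward once the subgoal is achieved."""
        self.overlay.pop(feature, None)

    def value(self, feature):
        """A planner's temporary assignment shadows the learned value."""
        return self.overlay.get(feature, self.learned.get(feature, 0.0))
```

Because the overlay never touches the learned values, the agent's long-term reward estimates survive any number of planner interventions.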

When I get BECCA back up and running, I'll revisit the chase world development and complexification.