brohrer / robot-brain-project

a general purpose learning agent
205 stars 46 forks

Evolve the catch world #14

Closed brohrer closed 9 years ago

brohrer commented 9 years ago

Make it more complex, more intuitive, and more aesthetically appealing. Make the world bigger. Incorporate aspects that require deep learning.

brohrer commented 9 years ago

A prioritized (and highly plastic) list of subtasks.

  1. Visualize all sensors
  2. Separate range and heading
  3. Remove reward scaffolding for range and flow
  4. Reward changes in sensor readings
  5. Add bumpers
  6. Add proximity
  7. Punish hard contact
  8. Reward soft contact
  9. Fuzzy and/or overlapping sensors
  10. Sense flow (changes in range, changes in heading, changes in proximity)
  11. Reward unpredicted sensor sequences
brohrer commented 9 years ago

An updated list of subtasks.

  1. Reward changes in sensor readings
  2. Add bumpers
  3. Add proximity
  4. Punish hard contact
  5. Reward soft contact
  6. Fuzzy and/or overlapping sensors
  7. Sense flow (changes in range, changes in heading, changes in proximity)
  8. Reward unpredicted sensor sequences
brohrer commented 9 years ago

An updated list of subtasks.

  1. Slow down exploration
  2. Add more potential features
  3. Speed up learning
  4. Render features
  5. Sense velocity and acceleration
  6. Add bumpers
  7. Add proximity
  8. Punish hard contact
  9. Reward soft contact
  10. Fuzzy and/or overlapping sensors
  11. Sense flow (changes in range, changes in heading, changes in proximity)
  12. Reward unpredicted sensor sequences
brohrer commented 9 years ago

An updated list of subtasks.

  1. Render prox and bump features in the animation
  2. Render features for print_features
  3. Sense velocity and acceleration
  4. Reward soft contact
  5. Fuzzy and/or overlapping sensors
  6. Sense flow (changes in range, changes in heading, changes in proximity)
  7. Reward unpredicted sensor sequences
brohrer commented 9 years ago

An updated list of subtasks.

  1. Render features for print_features
  2. Sense velocity and acceleration
  3. Reward soft contact
  4. Fuzzy and/or overlapping sensors
  5. Sense flow (changes in range, changes in heading, changes in proximity)
  6. Reward unpredicted sensor sequences
brohrer commented 9 years ago

An updated list of subtasks.

  1. Reward soft contact
  2. Fuzzy and/or overlapping sensors
  3. Sense flow (changes in range, changes in heading, changes in proximity)
  4. Reward unpredicted sensor sequences
brohrer commented 9 years ago

This task is being put on hold. Debugging it exposed that BECCA needs some fundamental reworking in order to be successful in this world. When dealing with a dynamic environment, it is more effective to separate the modeling of the world dynamics (state-action->state) from the assignment of value (state->reward). Traditional reinforcement learning cuts this corner by creating value functions of the form (state-action->reward). In the approach I'm pursuing, this doesn't appear to scale, even with automatic feature creation. For all its simplicity, the chase world has far more possible states than a chess board.
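To make the distinction concrete, here is a minimal sketch (not BECCA's actual code; all names are hypothetical) contrasting the merged (state-action->reward) value function with the separated model-plus-value structure. Learning a transition model and a state-value map separately means the model can be reused to score actions by the states they are predicted to lead to:

```python
# Traditional RL shortcut: one table maps (state, action) directly to value.
q_values = {}

def q_update(state, action, reward, learning_rate=0.1):
    """Nudge Q(s, a) toward the observed reward."""
    key = (state, action)
    old = q_values.get(key, 0.0)
    q_values[key] = old + learning_rate * (reward - old)

# Separated approach: a transition model plus a state-value map.
transition_model = {}  # (state, action) -> predicted next state
state_values = {}      # state -> learned reward estimate

def model_update(state, action, next_state):
    """Record the observed world dynamics: (state, action) -> next state."""
    transition_model[(state, action)] = next_state

def value_update(state, reward, learning_rate=0.1):
    """Nudge the value of a state (independent of any action) toward reward."""
    old = state_values.get(state, 0.0)
    state_values[state] = old + learning_rate * (reward - old)

def choose_action(state, actions):
    """Score each action by the value of the state the model predicts."""
    def score(action):
        predicted = transition_model.get((state, action))
        return state_values.get(predicted, 0.0)
    return max(actions, key=score)
```

In the merged scheme, every (state, action) pair must be visited to learn its value; in the separated scheme, learning that a single state is valuable immediately informs every action predicted to reach it.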

Another benefit of separating modeling from reward assignment is that it allows a planner to intervene and specify intermediate goals by assigning temporary artificial reward values to arbitrary features. This sounds like something a human brain might do.
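A hypothetical sketch of that planner hook (again, illustrative names only, not BECCA's API): the planner overlays temporary reward on arbitrary features, the overlay takes precedence over learned values while it is in place, and it is cleared once the subgoal is met:

```python
class ValueMap:
    """State-feature values with a planner-controlled temporary overlay."""

    def __init__(self):
        self.learned = {}  # feature -> reward learned from experience
        self.overlay = {}  # feature -> temporary reward set by a planner

    def set_subgoal(self, feature, artificial_reward):
        """Planner intervention: mark a feature as temporarily rewarding."""
        self.overlay[feature] = artificial_reward

    def clear_subgoal(self, feature):
        """Remove the artificial reward once the subgoal is achieved."""
        self.overlay.pop(feature, None)

    def value(self, feature):
        """A planner's temporary assignment shadows the learned value."""
        return self.overlay.get(feature, self.learned.get(feature, 0.0))
```

Because the overlay never touches the learned values, the agent's long-term reward estimates survive any number of planner interventions.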

When I get BECCA back up and running, I'll revisit the chase world development and complexification.