alanyuchenhou / elephant

MIT License

game playing with CNN #10

Closed by alanyuchenhou 8 years ago

alanyuchenhou commented 8 years ago

Paper: Better Computer Go Player with Neural Network and Long-term Prediction

Scenario: predict the following moves in Go

Given the game history, predict the next k moves
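
To make the prediction task concrete, here is a minimal sketch of how one recorded game could be cut into (history, next k moves) training examples. The `Move` encoding and the `make_training_pairs` helper are hypothetical names chosen for illustration; the paper's actual data pipeline is not described in these notes.

```python
from typing import List, Tuple

# Hypothetical move encoding: (row, col) on the 19x19 board.
Move = Tuple[int, int]


def make_training_pairs(
    moves: List[Move], k: int
) -> List[Tuple[List[Move], List[Move]]]:
    """Pair each history prefix with the k moves that follow it."""
    pairs = []
    # Stop early enough that every example has a full set of k targets.
    for t in range(len(moves) - k + 1):
        history = moves[:t]        # moves played so far (empty at t = 0)
        targets = moves[t:t + k]   # the next k moves to predict
        pairs.append((history, targets))
    return pairs


game = [(3, 3), (15, 15), (3, 15), (15, 3), (9, 9)]
pairs = make_training_pairs(game, k=3)
```

With a 5-move game and k = 3 this yields three examples; positions near the end of the game are dropped because they lack k future moves.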

Implementation: CNN

The CNN reads a heavily hand-engineered 19x19x25 feature tensor (encoding the current and historical game situation as well as opponent information) and predicts the next k moves at the same time
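
As a rough illustration of what a stack of board-sized feature planes looks like, the sketch below encodes a position into a 19x19x3 tensor. The three planes (own stones, opponent stones, empty points) are an assumption chosen for simplicity; they are not the paper's 25 planes, and `encode_planes` is a hypothetical helper.

```python
from typing import List

BOARD_SIZE = 19
EMPTY, BLACK, WHITE = 0, 1, 2  # hypothetical cell codes


def encode_planes(board: List[List[int]], to_move: int) -> List[List[List[float]]]:
    """Return a BOARD_SIZE x BOARD_SIZE x 3 tensor of binary features.

    Channel 0: stones of the player to move
    Channel 1: stones of the opponent
    Channel 2: empty points
    """
    opponent = WHITE if to_move == BLACK else BLACK
    planes = []
    for r in range(BOARD_SIZE):
        row = []
        for c in range(BOARD_SIZE):
            stone = board[r][c]
            row.append([
                1.0 if stone == to_move else 0.0,
                1.0 if stone == opponent else 0.0,
                1.0 if stone == EMPTY else 0.0,
            ])
        planes.append(row)
    return planes


board = [[EMPTY] * BOARD_SIZE for _ in range(BOARD_SIZE)]
board[3][3] = BLACK
board[15][15] = WHITE
features = encode_planes(board, to_move=BLACK)
```

History and opponent information would be added as further channels in the same way, one board-sized plane per feature.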

Highlights

  1. the k outputs are generated concurrently instead of sequentially, which is useful for training, not for predicting
  2. no pooling layers
  3. no weight constraint (no filter replications?)
  4. the problem is substantially harder than similar image recognition problems (very high sensitivity)
  5. the 1st convolutional layer is not trained, but manually generated
  6. no well-defined input layer
  7. history is handled in an unusual way
  8. a recurrent net didn't perform well
  9. optional search engine
  10. this paper has too many elements
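
Highlight 1 can be sketched as a training loss with one softmax head per future move. This is a minimal sketch assuming each head outputs 361 logits (one per board point) and that the k per-head cross-entropy losses are summed with equal weight; the equal weighting and the function names are assumptions, not details taken from the paper.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over one head's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def multi_step_loss(head_logits: List[List[float]], targets: List[int]) -> float:
    """Sum of cross-entropy losses, one per predicted future move.

    head_logits[j] holds the logits of head j (the move j + 1 steps ahead);
    targets[j] is the index of the move actually played at that step.
    """
    if len(head_logits) != len(targets):
        raise ValueError("need exactly one target per output head")
    total = 0.0
    for logits, target in zip(head_logits, targets):
        total += -log_softmax(logits)[target]
    return total


# Three heads with uniform (all-zero) logits over the 361 board points.
k, points = 3, 19 * 19
uniform_heads = [[0.0] * points for _ in range(k)]
loss = multi_step_loss(uniform_heads, targets=[60, 300, 72])
```

At prediction time only the first head would be needed, which matches the note that the extra outputs help training rather than prediction.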

Questions

  1. does the CNN treat the next k moves as independent?
  2. would it be better to use a recurrent net to handle the game history instead of hand-engineering a feature map?
  3. would it be better to remove all hand-engineered features?

ghost commented 8 years ago

You might also be interested in this deep learning approach to playing arcade games: http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html Google, the parent company of the group behind that paper, also released TensorFlow as a tool for this kind of work. Might be worth a look. See https://www.tensorflow.org/.

alanyuchenhou commented 8 years ago

Paper: Human-level control through deep reinforcement learning

Scenario: predict actions in an arcade game

Given the current game situation, predict the optimal actions

Implementation: CNN

This paper covers too much ground, while the implementation details of the reinforcement learning and prediction are not clearly described.

Highlights

  1. experience replay: store past transitions and sample them in random order, which removes false correlations between consecutive inputs
  2. iterative update: adjust action-values towards target values that are only periodically updated, which reduces false correlations with the target
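
The two highlights above can be sketched as follows. This is a minimal sketch, not the paper's implementation: `ReplayBuffer`, `q_target`, the transition layout, and the default `gamma` are illustrative assumptions, and the Q-value estimates are passed in as plain lists rather than computed by a network.

```python
import random
from collections import deque
from typing import Deque, List, Optional, Sequence, Tuple

# Hypothetical transition layout: (state, action, reward, next_state, done).
Transition = Tuple[object, int, float, object, bool]


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly at random."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        # A bounded deque drops the oldest transition once capacity is reached.
        self._items: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, transition: Transition) -> None:
        self._items.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Random sampling breaks the temporal ordering of the stored inputs.
        return self._rng.sample(list(self._items), batch_size)

    def __len__(self) -> int:
        return len(self._items)


def q_target(reward: float, next_q_values: Sequence[float], done: bool,
             gamma: float = 0.99) -> float:
    """One-step target: r + gamma * max_a' Q_target(s', a').

    next_q_values should come from a separate copy of the value estimator
    that is refreshed only every so often, so the target stays fixed
    between refreshes.
    """
    if done:
        return reward  # no future reward after a terminal state
    return reward + gamma * max(next_q_values)


buffer = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    buffer.add((f"s{step}", 0, 1.0, f"s{step + 1}", step == 4))
batch = buffer.sample(2)
target = q_target(reward=1.0, next_q_values=[0.5, 2.0], done=False, gamma=0.5)
```

In a full training loop, the online estimator would be moved towards `q_target` on each sampled batch, and the frozen copy would be synchronized with it at a fixed interval.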