Farama-Foundation / ViZDoom

Reinforcement Learning environments based on the 1993 game Doom :godmode:
https://vizdoom.farama.org/

Using position info to do Deep Reinforcement Learning #363

Closed acrushdjn closed 5 years ago

acrushdjn commented 5 years ago

Most of the time people use images as input to train a deep neural network to play Doom. Has anyone thought about using real-valued info (positions of the player/medkits/enemies/ammo) as a feature vector to train a deep neural network to play Doom? I used the state below to train a network, but the results are not good:

next_state = (HEALTH, POSITION_X, POSITION_Y, POSITION_Z, ANGLE, PITCH, ROLL, VELOCITY_X, VELOCITY_Y, VELOCITY_Z, CAMERA_POSITION_X, CAMERA_POSITION_Y, CAMERA_POSITION_Z, CAMERA_ANGLE, CAMERA_PITCH, CAMERA_ROLL, enemy_in_view, enemy_position_x, enemy_position_y, enemy_position_z, enemy_angle, enemy_pitch, enemy_roll, enemy_velocity_x, enemy_velocity_y, enemy_velocity_z, medkit_in_view, medkit_position_x, medkit_position_y, medkit_position_z, health_diff, distance_diff, prev_action, obstacle_front, obstacle_back, obstacle_right, obstacle_left)
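For reference, here is a minimal sketch of how such a direct-feature vector can be assembled with ViZDoom's Python API, assuming the game-variable and labels-buffer fields available in recent ViZDoom versions; the config path and the object names ("DoomPlayer", "Medikit", "Stimpack") are illustrative assumptions, not the poster's actual code:

```python
# Hypothetical sketch (not the poster's code): assembling a direct-feature
# vector from ViZDoom game variables plus the labels buffer.
import numpy as np
import vizdoom as vzd

game = vzd.DoomGame()
game.load_config("deathmatch.cfg")       # placeholder scenario config
game.set_labels_buffer_enabled(True)     # needed for enemy/medkit positions

# Player-centric variables; state.game_variables returns them in this order.
for gv in (vzd.GameVariable.HEALTH,
           vzd.GameVariable.POSITION_X, vzd.GameVariable.POSITION_Y,
           vzd.GameVariable.POSITION_Z, vzd.GameVariable.ANGLE,
           vzd.GameVariable.PITCH, vzd.GameVariable.ROLL,
           vzd.GameVariable.VELOCITY_X, vzd.GameVariable.VELOCITY_Y,
           vzd.GameVariable.VELOCITY_Z):
    game.add_available_game_variable(gv)
game.init()

def build_features(state):
    player = np.asarray(state.game_variables, dtype=np.float32)

    enemy = np.zeros(4, dtype=np.float32)    # [in_view, x, y, z]
    medkit = np.zeros(4, dtype=np.float32)
    for lab in state.labels:
        # Bots appear as "DoomPlayer" actors; positions are map coordinates.
        if lab.object_name == "DoomPlayer":
            enemy[:] = (1.0, lab.object_position_x,
                        lab.object_position_y, lab.object_position_z)
        elif lab.object_name in ("Medikit", "Stimpack"):
            medkit[:] = (1.0, lab.object_position_x,
                         lab.object_position_y, lab.object_position_z)
    return np.concatenate([player, enemy, medkit])
```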

Miffyli commented 5 years ago

What do you mean by "result is not good"? What scenarios did you try? What were the results? etc.

I did a bit of experimenting for my master's thesis with such "direct features" as you describe. I fed information such as a scanline (a slice of the depth map) and info on visible objects to a network, and this produced better results in the deathmatch scenario than feeding images (RGB, depth, or labels). However, it did not work in health_gathering_supreme. I used GA3C (an A3C variant).
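A rough sketch of what such a "scanline plus visible objects" observation could look like using ViZDoom's depth and labels buffers; this is an assumption-laden reconstruction, not the thesis code, and the fixed object count and normalisation are arbitrary choices:

```python
# Sketch of a scanline + visible-object observation from ViZDoom buffers.
import numpy as np
import vizdoom as vzd

game = vzd.DoomGame()
game.load_config("deathmatch.cfg")    # placeholder config
game.set_depth_buffer_enabled(True)
game.set_labels_buffer_enabled(True)
game.init()

def scanline_and_objects(state, n_objects=5):
    depth = state.depth_buffer                    # H x W uint8 array
    scanline = depth[depth.shape[0] // 2, :]      # middle row acts like a range sensor

    objs = []
    for lab in state.labels[:n_objects]:
        # Screen-space bounding-box centre and size of each visible object.
        objs.append([lab.x + lab.width / 2.0, lab.y + lab.height / 2.0,
                     lab.width, lab.height])
    while len(objs) < n_objects:                  # pad to a fixed length
        objs.append([0.0, 0.0, 0.0, 0.0])

    return np.concatenate([scanline.astype(np.float32) / 255.0,
                           np.asarray(objs, dtype=np.float32).ravel()])
```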

acrushdjn commented 5 years ago

@Miffyli Thanks for your reply. I used the direct features I described above; the other features are self-explanatory: (obstacle_front, obstacle_back, obstacle_right, obstacle_left) are the distances to walls in the agent's front/back/right/left directions. My scenario is deathmatch, using deathmatch_shotgun.wad from here (https://github.com/glample/Arnold/blob/master/resources/scenarios/deathmatch_shotgun.wad). I trained my agent with PPO, with the same reward shaping as this paper (https://openreview.net/pdf?id=Hk3mPK5gg, TRAINING AGENT FOR FIRST-PERSON SHOOTER GAME WITH ACTOR-CRITIC CURRICULUM LEARNING). I used just one bot (1v1), but the result is bad: the agent can't learn meaningful actions, it just runs into walls and gets stuck, let alone collecting items and shooting the bot. I was wondering whether this is a problem with my state definition, because my PPO run converges very quickly (in about 200 iterations). Is it because there is too little information in my state definition?
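For context, a hedged sketch of per-step reward shaping in the spirit of the cited paper (kill/pickup/damage/movement terms computed from game-variable deltas); the coefficients and helper names here are placeholders, not the paper's or the poster's actual values:

```python
# Hypothetical reward-shaping sketch based on game-variable deltas.
import vizdoom as vzd

def shaped_reward(game, prev):
    """prev is a dict holding the previous step's frags, health, ammo, x, y."""
    frags  = game.get_game_variable(vzd.GameVariable.FRAGCOUNT)
    health = game.get_game_variable(vzd.GameVariable.HEALTH)
    ammo   = game.get_game_variable(vzd.GameVariable.SELECTED_WEAPON_AMMO)
    x = game.get_game_variable(vzd.GameVariable.POSITION_X)
    y = game.get_game_variable(vzd.GameVariable.POSITION_Y)

    r = 0.0
    r += 1.0    * (frags - prev["frags"])                  # scoring a kill
    r += 0.01   * max(0.0, health - prev["health"])        # medkit pickup
    r -= 0.01   * max(0.0, prev["health"] - health)        # taking damage
    r += 0.01   * max(0.0, ammo - prev["ammo"])            # ammo pickup
    r += 0.0001 * ((x - prev["x"]) ** 2 + (y - prev["y"]) ** 2) ** 0.5  # movement
    r -= 0.001                                             # small living penalty

    prev.update(frags=frags, health=health, ammo=ammo, x=x, y=y)
    return r
```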

Miffyli commented 5 years ago

You are likely running into issues with too-sparse rewards: the agent rarely, if ever, scores a kill, and thus never gathers enough samples of that event for learning to happen. This is why the paper you cited used curriculum learning to start with many easy bots, and why Arnold used reward shaping to help the agent aim at enemies. For further reading, see this summary of the ViZDoom competitions for hints.
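One way to realise such a curriculum in ViZDoom is to control the number of built-in bots per episode and grow it as the agent improves; the schedule below is a hypothetical sketch using the standard addbot/removebots console commands, not code from the paper or Arnold:

```python
# Hypothetical bot-count curriculum: start 1v1 and add bots as the agent's
# average frag count per episode improves.
import vizdoom as vzd

def start_episode_with_bots(game, n_bots):
    game.new_episode()
    game.send_game_command("removebots")   # clear bots left from the last episode
    for _ in range(n_bots):
        game.send_game_command("addbot")

def update_curriculum(n_bots, avg_frags, max_bots=7):
    # avg_frags is assumed to be tracked by the training loop.
    if avg_frags > 2.0:
        return min(n_bots + 1, max_bots)
    return n_bots
```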

If you are using speed as a reward, as in the curriculum paper, then the agent should learn to navigate around decently well, but it may still get stuck in corners (tested with DQN variants and A3C). Your state representation may not be enough if the agent just walks forward and gets stuck at the first wall.

For combat, I suggest you try a simpler scenario, like defend_the_center where the player is allowed to move around (or maybe an even smaller map).
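For example, a quick sanity check on defend_the_center with random actions (the config path assumes a checkout of the ViZDoom repo):

```python
# Load the bundled defend_the_center scenario and run one random-action episode.
import random
import vizdoom as vzd

game = vzd.DoomGame()
game.load_config("scenarios/defend_the_center.cfg")  # path relative to the ViZDoom repo
game.init()

actions = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # TURN_LEFT, TURN_RIGHT, ATTACK
game.new_episode()
while not game.is_episode_finished():
    game.make_action(random.choice(actions))
print("Total reward:", game.get_total_reward())
game.close()
```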