Farama-Foundation / ViZDoom

Reinforcement Learning environments based on the 1993 game Doom :godmode:
https://vizdoom.farama.org/

Vizdoom navigation problem #445

Closed pengzhi1998 closed 3 years ago

pengzhi1998 commented 4 years ago

Hi, I'm using ViZDoom to train an agent to navigate, and I have a couple of questions:

  1. In my scenario the maze is discrete, meaning I treat ViZDoom as a 3D grid world. However, I don't know how to randomly initialize my agent in one of those grid cells.
  2. I want the agent to have three actions: move forward by one grid cell (a fixed length), turn left 90 degrees, and turn right 90 degrees. But I can only find buttons that move a very small step or turn a small angle. Could you give me some hints on how to do this? Thanks a lot!

Miffyli commented 4 years ago

1) Take a look at MazeExplorer to generate random levels with ViZDoom.

2) You can use Button.TURN_LEFT_RIGHT_DELTA to turn a specific number of degrees (the value is degrees per step, so setting it to 90 and advancing by one step will turn 90 degrees). You need to manually measure how long you need to move forward/backward to advance by one "grid block". Note that this can be a bit tricky, as there is some acceleration involved.

pengzhi1998 commented 4 years ago

Thank you so much! But what do you mean by acceleration? Also, the "grid block" is invisible; all I need is to move forward by a specific length (perhaps the grid cell width, which would be easier to compute). And if I use Button.TURN_LEFT_RIGHT_DELTA, where do I set the number of degrees?

mwydmuch commented 4 years ago

Hi @pengzhi1998, as @Miffyli said, you can use Button.TURN_LEFT_RIGHT_DELTA and Button.MOVE_FORWARD_BACKWARD_DELTA to simulate discrete movement. The values you pass in an action can be any floating-point number. For normal buttons, any value other than 0 is interpreted as pressed; for delta buttons, the value determines the angle/distance. So for Button.TURN_LEFT_RIGHT_DELTA you can pass -90 or 90 to turn 90 degrees left or right in one step. This example demonstrates the functionality: https://github.com/mwydmuch/ViZDoom/blob/master/examples/python/delta_buttons.py
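
To make the delta-button idea concrete, here is a minimal sketch of a three-action discrete control scheme. The helper names (`build_discrete_actions`, `demo`), the 64-unit step length, and the config path argument are assumptions made for illustration; only the two `*_DELTA` buttons and the sign convention (-90 for left, 90 for right) come from the comment above. The ViZDoom calls are kept inside `demo` so the helper can be tried without the engine installed.

```python
def build_discrete_actions(step_length, turn_degrees=90.0):
    """Build action vectors for the button order
    [MOVE_FORWARD_BACKWARD_DELTA, TURN_LEFT_RIGHT_DELTA].

    For delta buttons the value is the amount to apply in one step,
    so each vector is meant to be sent for a single tic.
    """
    return {
        "forward": [float(step_length), 0.0],
        "turn_left": [0.0, -float(turn_degrees)],
        "turn_right": [0.0, float(turn_degrees)],
    }


def demo(config_path):
    """Hypothetical usage; needs ViZDoom and a scenario config file."""
    import vizdoom as vzd

    game = vzd.DoomGame()
    game.load_config(config_path)
    # Replace whatever buttons the config declares with the two delta buttons.
    game.clear_available_buttons()
    game.add_available_button(vzd.Button.MOVE_FORWARD_BACKWARD_DELTA)
    game.add_available_button(vzd.Button.TURN_LEFT_RIGHT_DELTA)
    game.init()

    actions = build_discrete_actions(step_length=64)  # 64 is an assumed cell size
    game.new_episode()
    game.make_action(actions["turn_right"])  # one tic, 90 degrees to the right
    # Assumption to verify: a forward delta may not move exactly step_length
    # units in one tic, so check the resulting position in your scenario.
    game.make_action(actions["forward"])
    game.close()
```

Keeping the action table separate from the engine setup makes it easy to check that the vector length matches the number of declared buttons.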

pengzhi1998 commented 4 years ago

@mwydmuch Thank you!!

pengzhi1998 commented 4 years ago

Sorry to keep bothering you. I looked through the issues and found that some researchers ran into problems (such as memory leaks) when using ViZDoom to train A3C models with PyTorch. Is this still a significant problem for multiprocessing PyTorch programs with ViZDoom? Also, in our scenario it's a bit hard to define the condition for restarting an episode in the ACS file with the "restart" command. May I instead call game.close() inside the loop in my Python script to do that manually?

Miffyli commented 4 years ago

ViZDoom should not leak memory when used correctly. I have not had issues using it with either PyTorch or TensorFlow. Mind you, I have not used torch.multiprocessing much.

Repeatedly calling game.close() and game.init() is one possible source of memory leakage, and I would advise against it.

mwydmuch commented 4 years ago

@pengzhi1998 personally I've never noticed a significant increase in memory usage by ViZDoom (it uses just a few megabytes), even after many hours of running the same instance playing the same scenario over and over again.

Like @Miffyli, I also discourage repeatedly calling game.init() and game.close(). It's better to keep one or a few instances running as long as possible and restart the scenario with game.new_episode(). This is also much faster, since game.new_episode() takes a few milliseconds while closing and starting the whole engine again takes almost a second.

Also, to correctly handle a restart triggered from an ACS script, you need to check game.is_episode_finished() and, if it returns true, call game.new_episode().
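
The pattern described above can be sketched as a loop that reuses one initialized instance. The function name `run_episodes` and its parameters are hypothetical; the `game` argument is assumed to be a `DoomGame` on which `init()` has already been called, and only the methods named in this thread plus `get_state()`, `make_action()` and `get_total_reward()` are used.

```python
def run_episodes(game, choose_action, episodes, max_steps):
    """Run several episodes on one already-initialized game instance.

    An episode ends either when the engine reports it finished (timeout,
    player death, or an exit triggered from ACS) or after max_steps actions.
    The engine is never closed and re-created between episodes.
    """
    totals = []
    for _ in range(episodes):
        game.new_episode()  # cheap restart inside the same engine
        steps = 0
        while steps < max_steps and not game.is_episode_finished():
            state = game.get_state()
            game.make_action(choose_action(state))
            steps += 1
        totals.append(game.get_total_reward())
    return totals
```

Checking `is_episode_finished()` before every `get_state()` matters because the state is no longer available once the episode has ended.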

mwydmuch commented 4 years ago

Oh, and a good way to finish an episode from an ACS script is to kill the player with https://zdoom.org/wiki/DamageActor 😄

mwydmuch commented 4 years ago

Sorry, I forgot: there is actually a better way. The Exit_Normal(0) function will finish the episode.

pengzhi1998 commented 4 years ago

Thank you so much for your help! It's so nice of you!! I found a possible solution here. The researcher just uses game.new_episode() to start a new episode, without checking game.is_episode_finished() or using restart inside their ACS. They keep a step counter in their Python program and start a new episode based on that counter. The author told me this will not cause any memory leak, because that function does not start a new game environment but only starts a new episode inside the same game environment. But when I test it with the code below (only small adjustments from basic.py):

import vizdoom as vzd
from random import choice
from time import sleep

game = vzd.DoomGame()

# Now it's time for configuration!
# load_config could be used to load configuration instead of doing it here with code.
# If load_config is used in-code configuration will also work - most recent changes will add to previous ones.
# game.load_config("../../scenarios/basic.cfg")

# Sets path to additional resources wad file which is basically your scenario wad.
# If not specified default maps will be used and it's pretty much useless... unless you want to play good old Doom.
# game.set_doom_scenario_path("../../scenarios/basic.wad")
game.set_doom_scenario_path("../../../NavDoom/outputs/11_TRAIN.wad")
# game.set_doom_scenario_path("../../../vizdoomgymmaze/vizdoomgymmaze/envs/scenarios/four/four_1.wad")

# Sets map to start (scenario .wad files can contain many maps).
game.set_doom_map("map01")

# Sets resolution. Default is 320X240
game.set_screen_resolution(vzd.ScreenResolution.RES_640X480)

# Sets the screen buffer format. Not used here but now you can change it. Default is CRCGCB.
game.set_screen_format(vzd.ScreenFormat.RGB24)

# Enables depth buffer.
game.set_depth_buffer_enabled(True)

# Enables labeling of in-game objects.
game.set_labels_buffer_enabled(True)

# Enables buffer with top down map of the current episode/level.
game.set_automap_buffer_enabled(True)
game.set_automap_mode(vzd.AutomapMode.OBJECTS_WITH_SIZE)

game.add_available_game_variable(vzd.GameVariable.POSITION_X)
game.add_available_game_variable(vzd.GameVariable.POSITION_Y)
game.add_available_game_variable(vzd.GameVariable.POSITION_Z)
game.add_available_game_variable(vzd.GameVariable.ANGLE)

# Enables information about all objects present in the current episode/level.
game.set_objects_info_enabled(True)

# Enables information about all sectors (map layout).
game.set_sectors_info_enabled(True)

# Sets other rendering options (all of these options except crosshair are enabled (set to True) by default)
game.set_render_hud(False)
game.set_render_minimal_hud(False)  # If hud is enabled
game.set_render_crosshair(False)
game.set_render_weapon(True)
game.set_render_decals(False)  # Bullet holes and blood on the walls
game.set_render_particles(False)
game.set_render_effects_sprites(False)  # Smoke and blood
game.set_render_messages(False)  # In-game messages
game.set_render_corpses(False)
game.set_render_screen_flashes(True)  # Effect upon taking damage or picking up items

# Adds buttons that will be allowed.
game.add_available_button(vzd.Button.MOVE_FORWARD_BACKWARD_DELTA, 1000)
game.add_available_button(vzd.Button.TURN_LEFT_RIGHT_DELTA)
# game.add_available_button(vzd.Button.ATTACK)

# Adds game variables that will be included in state. And ammo is only a prop for the doomgame
# game.add_available_game_variable(vzd.GameVariable.AMMO0)

# Causes episodes to finish after 200 tics (actions)
game.set_episode_timeout(200)

# Makes episodes start after 10 tics (~after raising the weapon)
game.set_episode_start_time(10)

# Makes the window appear (turned on by default)
game.set_window_visible(True)

# Turns on the sound. (turned off by default)
game.set_sound_enabled(True)

# Sets the living reward (for each move) to -1
game.set_living_reward(-1)

# Sets ViZDoom mode (PLAYER, ASYNC_PLAYER, SPECTATOR, ASYNC_SPECTATOR, PLAYER mode is default)
game.set_mode(vzd.Mode.PLAYER)

# Enables engine output to console.
#game.set_console_enabled(True)

# Initialize the game. Further configuration won't take any effect from now on.
game.init()

# Define some actions. Each list entry corresponds to a declared button, in the order the buttons were added:
# MOVE_FORWARD_BACKWARD_DELTA, TURN_LEFT_RIGHT_DELTA
# game.get_available_buttons_size() can be used to check the number of available buttons.
# For *_DELTA buttons the entry is the distance/angle to apply in one step, not a pressed/released flag.
actions = [[960, 0], [0, 90], [0, -90]]  # move forward, turn right 90 degrees, turn left 90 degrees
# actions = [[True, False], [False, True]]

# Run this many episodes
episodes = 1000000

# Sets time that will pause the engine after each action (in seconds)
# Without this everything would go too fast for you to keep track of what's happening.
sleep_time = 1.0 / vzd.DEFAULT_TICRATE  # = 0.028
# sleep_time = 28
for i in range(episodes):
    print("Episode #" + str(i + 1))

    # Starts a new episode. It is not needed right after init() but it doesn't cost much. At least the loop is nicer.
    game.new_episode()
    step = 0

    # while not game.is_episode_finished():
    while step < 5:
        step += 1
        # Gets the state
        state = game.get_state()
        map = state.automap_buffer
        # if map is not None:
        #     plt.imshow(map, cmap = 'gray')
        #     plt.show()
        #     cv2.imshow('Vizdoom Automap Buffer', map)
        #
        # cv2.waitKey(sleep_time)

        # Which consists of:
        n = state.number
        vars = state.game_variables
        screen_buf = state.screen_buffer
        depth_buf = state.depth_buffer
        labels_buf = state.labels_buffer
        automap_buf = state.automap_buffer
        labels = state.labels
        objects = state.objects
        sectors = state.sectors

        # Games variables can be also accessed via:
        #game.get_game_variable(GameVariable.AMMO2)

        # Makes a random action and remembers the reward.
        r = game.make_action(choice(actions))

        # Makes a "prolonged" action and skip frames:
        # skiprate = 4
        # r = game.make_action(choice(actions), skiprate)

        # The same could be achieved with:
        # game.set_action(choice(actions))
        # game.advance_action(skiprate)
        # r = game.get_last_reward()

        # Prints state's game variables and reward.
        print("State #" + str(n))
        print("Game variables:", vars)
        print("Reward:", r)
        print("=====================")

        if sleep_time > 0:
            sleep(sleep_time)

    # Check how the episode went.
    print("Episode finished.")
    print("Total reward:", game.get_total_reward())
    print("************************")

game.close()

the memory usage keeps increasing. I'm not sure why this happens. Sorry to have caused you so much inconvenience 😢

mwydmuch commented 4 years ago

@pengzhi1998 yes, you can call game.new_episode() at any point to restart. Sorry, I thought that was obvious.

As for the memory leak, I ran the code you posted (with my_way_home.wad) and didn't see any leak; the amount of RAM used by ViZDoom stayed the same the whole time. Can you provide more details about the issue? How fast does the memory usage increase, which OS and Python version are you using, and can you provide the wad file?
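
One way to answer the "how fast" question is to record memory after every episode. The sketch below uses the standard-library `tracemalloc` module; the function name `python_memory_per_episode` and the `run_one_episode` callback are hypothetical. Note the limitation: `tracemalloc` only sees allocations made through the Python interpreter, so memory used by the engine itself would not show up here and would have to be watched with an OS tool such as `top`.

```python
import tracemalloc


def python_memory_per_episode(run_one_episode, episodes):
    """Return Python-side traced memory (in bytes) after each episode.

    run_one_episode is any zero-argument callable, for example a function
    that calls game.new_episode() and steps through one episode.
    A steadily rising series suggests the script is holding on to objects
    (stored states, buffers, logs) rather than the engine leaking.
    """
    usage = []
    tracemalloc.start()
    try:
        for _ in range(episodes):
            run_one_episode()
            current, _peak = tracemalloc.get_traced_memory()
            usage.append(current)
    finally:
        tracemalloc.stop()
    return usage
```

If the series returned here stays flat while the total process memory still grows, the growth is happening outside Python-managed objects.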

pengzhi1998 commented 4 years ago

Thank you! I'm sorry, I thought the new-episode trigger had to be set inside the ACS (for example, the game starts a new episode only when the player kills a monster or reaches a goal point), so I thought I had to set something manually in ACS or use the is_episode_finished() function. Thank you for telling me that! About the memory problem, I found that over 1000 episodes the program's memory usage grows by around 1 gigabyte, which is weird. I tested it on Ubuntu 18.04.2 LTS with Python 3.6.10. The ACS file is shown below:

#include "zcommon.acs"

#define TARGET_ID_START 1000 
#define GOAL_TID 999

global int 0:reward;
global int 1:goal_x;
global int 2:goal_y;
global int 3:goal_z;

int TARGET_ID_END = TARGET_ID_START;
int SPAWN_LOC_ID = 0;
int GOAL_LOC_ID = 0;

function int fdistance (int tid1, int tid2)
{
    int len;
    int y = getactory(tid1) - getactory(tid2);
    int x = getactorx(tid1) - getactorx(tid2);
    int z = getactorz(tid1) - getactorz(tid2);

    int ang = vectorangle(x,y);
    if(((ang+0.125)%0.5) > 0.25) len = fixeddiv(y, sin(ang));
    else len = fixeddiv(x, cos(ang));

    ang = vectorangle(len, z);
    if(((ang+0.125)%0.5) > 0.25) len = fixeddiv(z, sin(ang));
    else len = fixeddiv(len, cos(ang));

    return len;
}

script 1 ENTER
{
    TARGET_ID_END = TARGET_ID_START;
    while(IsTIDUsed(TARGET_ID_END + 1))
    {
        TARGET_ID_END += 1;
    }

    // Spawn actor
    SPAWN_LOC_ID = random(TARGET_ID_START, TARGET_ID_END);
    SetActorPosition(0, GetActorX(SPAWN_LOC_ID), GetActorY(SPAWN_LOC_ID), 0.0, 0);
    SetActorAngle(0, 1.0);
    SetActorVelocity(0, 0, 0, 0, FALSE, FALSE);

    // Spawn goals
    if(!IsTIDUsed(GOAL_TID)) {
        GOAL_LOC_ID = SPAWN_LOC_ID;
        until(GOAL_LOC_ID!=SPAWN_LOC_ID) GOAL_LOC_ID = random(TARGET_ID_START, TARGET_ID_END);
        until(SpawnSpot("TallRedColumn", GOAL_LOC_ID, GOAL_TID));
        goal_x = GetActorX(GOAL_LOC_ID);
        goal_y = GetActorY(GOAL_LOC_ID);
        goal_z = GetActorZ(GOAL_LOC_ID);
    }

    // Wait
    until(fdistance(0, GOAL_TID) < 64.0) 
    {
        Delay(1);
    }
    reward += 10.0;

    // restart;
}

The wad file is here: 11_TRAIN.wad.zip

By the way, even though I pass a large movement value in basic.py (with game.add_available_button(vzd.Button.MOVE_FORWARD_BACKWARD_DELTA)), the agent inside the game always moves very slowly. The button's maximum value has been set to 1000, so I don't think that is the reason. Do you have any idea what causes this? I really appreciate your great help!

Miffyli commented 4 years ago

1) I ran the scenario you provided with a random agent, and did not observe any memory leakage (in fact, memory use was a very fixed number). Ubuntu 18.04, Python 3.6, ViZDoom 1.1.8 (current master).

2) Yes, there are some limits on the speed, and the fact that acceleration is involved makes things more difficult, too. You could measure how many tics it takes to move one block while pressing Button.MOVE_FORWARD, and then pass that number of tics to make_action to move the correct distance. This might also be possible with fancy ACS scripts that teleport the player around, but I do not know how that could be done.
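
The measurement suggested above can be sketched as follows. The helper `tics_to_travel` is hypothetical; it assumes you have already recorded the player position after every tic, for instance by reading `GameVariable.POSITION_X` and `GameVariable.POSITION_Y` with `game.get_game_variable(...)` after each single-tic `make_action` call that holds the forward button.

```python
import math


def tics_to_travel(positions, distance):
    """Find how many tics were needed to cover a given distance.

    positions: list of (x, y) tuples; positions[0] is the starting point and
    positions[k] is the position after k tics of holding the forward button.
    Returns the first tic count at which the straight-line distance from the
    start is at least `distance`, or None if it was never reached.
    """
    start_x, start_y = positions[0]
    for tic, (x, y) in enumerate(positions):
        if math.hypot(x - start_x, y - start_y) >= distance:
            return tic
    return None
```

Because of acceleration the per-tic displacement is not constant, so the count found this way only applies when the player starts from rest, and the distance actually covered will usually overshoot the target slightly.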

mwydmuch commented 4 years ago

It should be possible to build a scenario with discrete movement by using https://zdoom.org/wiki/GetPlayerInput in a while loop to check inputs and moving the player with https://zdoom.org/wiki/SetActorPosition, but some collision checking will probably be necessary. That can be done by checking the player's sector and its lines, or by teleporting a dummy actor to the player's destination and checking https://zdoom.org/wiki/CheckSight. That would be my approach to this.

pengzhi1998 commented 4 years ago

Sorry to bother you again. I have actually solved the earlier problems. Thank you so much for your great help! I have another question, about the position. When running one Python script, the position values are all positive. In another script, however, the position is sometimes negative. It seems as if the coordinate system changes between the two scripts, even though I set the variables in the same way, with game.add_available_game_variable(GameVariable.POSITION_X), and used the same WAD file. Do you know how to deal with this problem?

Miffyli commented 4 years ago

There's nothing stopping the coordinates from being negative. It all depends on how the map was built and where the players spawn. Origin can be in one of the corners of the map, or maybe in the center. You can process this much like positive numbers.
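
For a grid-world setup like the one in this thread, negative coordinates can be handled by mapping world positions to cell indices relative to a chosen origin. The sketch below is an illustration only: the function name `world_to_cell`, the origin arguments, and the 128-unit cell size in the usage note are assumptions about the map, not values taken from the WAD.

```python
import math


def world_to_cell(x, y, origin_x, origin_y, cell_size):
    """Convert world coordinates to integer grid cell indices.

    origin_x / origin_y is the world position of the corner of cell (0, 0).
    math.floor is used instead of int() so that positions on the negative
    side of the origin map to cells -1, -2, ... rather than collapsing to 0.
    """
    col = math.floor((x - origin_x) / cell_size)
    row = math.floor((y - origin_y) / cell_size)
    return col, row
```

For example, with an assumed origin at (-128, 0) and 128-unit cells, the position (-10, 130) falls in cell (0, 1).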

mwydmuch commented 4 years ago

This sounds strange; the position should be consistent. Are you sure the player is not spawned at or teleported to a different location each time you start your scenario? I can look into this if you provide the scripts and the WAD file needed to reproduce the problem.

pengzhi1998 commented 4 years ago

Hi, thank you both so much!! I could not have solved those problems without your help. Thanks again! I'm working on a localization problem with ViZDoom for my undergraduate thesis. I wrote two scripts, example.py and static_goal.acs, which solve the problems mentioned above. They are adapted from your project as well as NavDoom. Hopefully the scripts will help other programmers who face similar problems.