facebookresearch / nle

The NetHack Learning Environment

Is it not possible to eat? #230

Closed PeterAJansen closed 3 years ago

PeterAJansen commented 3 years ago

I have read the other threads on related topics. Just confirming: is it currently not possible to (for example) eat, or to do most other actions that require secondary arguments? (And, if true, would this not ultimately limit how long the agent can survive before it starves to death? Or its ability to e.g. pick up and use better items?)

e.g.

import json
import gym
import nle

env = gym.make("NetHackScore-v0")
env.reset()  # each reset generates a new dungeon

env.print_action_meanings()
print("")

env.step(21)  # eat

env.render()

env.step( ord('g') ) # select which object in the backpack to eat
env.render()

Output:

0 MiscAction.MORE
1 CompassDirection.N
2 CompassDirection.E
3 CompassDirection.S
4 CompassDirection.W
5 CompassDirection.NE
6 CompassDirection.SE
7 CompassDirection.SW
8 CompassDirection.NW
9 CompassDirectionLonger.N
10 CompassDirectionLonger.E
11 CompassDirectionLonger.S
12 CompassDirectionLonger.W
13 CompassDirectionLonger.NE
14 CompassDirectionLonger.SE
15 CompassDirectionLonger.SW
16 CompassDirectionLonger.NW
17 MiscDirection.UP
18 MiscDirection.DOWN
19 MiscDirection.WAIT
20 Command.KICK
21 Command.EAT
22 Command.SEARCH

b'What do you want to eat? [fghi or ?*] '
a an uncursed +2 pair of leather gloves (being worn)
b a blessed +1 robe (being worn)
c a blessed spellbook of healing
d an uncursed scroll of remove curse
e 3 uncursed potions of healing
f 3 uncursed food rations
g 6 uncursed apples
h 6 uncursed oranges
i 3 uncursed fortune cookies

                           -------+--                                          
                           |....)...|                                          
                           |.f......|                                          
                           ...@..{.:.                                          
                           ----------                                          

Traceback (most recent call last):
  File "/home/peter/github/nethack-interface/python/test.py", line 19, in <module>
    env.step( ord('g') )
  File "/home/peter/anaconda3/envs/nle/lib/python3.8/site-packages/nle/env/base.py", line 329, in step
    observation, done = self.env.step(self._actions[action])
IndexError: tuple index out of range

heiner commented 3 years ago

Hey Peter,

Thanks for your interest in NLE, and thanks for including the example code so we can discuss this more easily.

Notice the difference between these two lines:

env.step(21)  # eat
env.step( ord('g') )  # ???

Notice that 21 is not ord('e') (that would be 101). What happens here is that NLE has an action space that maps each enabled action (the set differs between tasks like score, eat, etc.) to an integer from 0 to NUM_ACTIONS - 1.

That's also the reason ord('g') fails: ord('g') is 103, which is far beyond the 23 actions enabled in this case (indices 0 to 22), hence the IndexError.
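
A minimal sketch of that mapping, using a hypothetical three-entry tuple as a stand-in for the environment's tuple of enabled actions (the names enabled_actions and key_for are illustrative, not part of NLE). The integer passed to step is a position in the tuple, not a key code:

```python
# Stand-in for the env's tuple of enabled actions. Each entry is the key code
# that would be sent to NetHack. The three entries are illustrative assumptions.
enabled_actions = (ord("k"), ord("e"), ord("s"))  # move north, eat, search


def key_for(action_index):
    """Translate an action-space index into the key code sent to the game."""
    return enabled_actions[action_index]


sent = key_for(1)  # index 1 selects the "eat" entry, whose key code is ord("e")

try:
    key_for(ord("g"))  # 103 is a key code, not a valid index into the tuple
    failed = False
except IndexError:
    failed = True  # same failure mode as the traceback in the report
```

Here sent is the key code for 'e', while passing ord('g') as an index raises IndexError because the tuple is much shorter than 103 entries.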

So what's the solution? With the action space you have here, you can only eat items whose inventory letter happens to coincide with the key of an enabled action, which here mostly means the move actions. For example, you can eat items labeled j or h (see e.g. this definition of actions: https://github.com/facebookresearch/nle/blob/master/nle/nethack/actions.py#L39).
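
To make the restriction concrete, here is a rough sketch that lists which inventory letters can be typed with the 23 actions printed above. The key codes are assumed to mirror nle/nethack/actions.py (vi-style movement keys, Ctrl-d for kick), and ENABLED_KEYS, reachable_letters and index_for_letter are hypothetical helper names:

```python
import string

# Key codes assumed to mirror nle/nethack/actions.py for the 23 actions
# listed by print_action_meanings(); treat them as illustrative.
MOVE_KEYS = "kljhunby"  # N, E, S, W, NE, SE, SW, NW
ENABLED_KEYS = (
    ["\r"]                     # MiscAction.MORE
    + list(MOVE_KEYS)          # CompassDirection
    + list(MOVE_KEYS.upper())  # CompassDirectionLonger
    + ["<", ">", "."]          # MiscDirection UP, DOWN, WAIT
    + ["\x04", "e", "s"]       # Command KICK (Ctrl-d), EAT, SEARCH
)


def reachable_letters(enabled_keys):
    """Inventory letters that can be typed with the enabled actions."""
    return sorted(k for k in enabled_keys if k in string.ascii_letters)


def index_for_letter(enabled_keys, letter):
    """Action index that presses `letter`; raises ValueError if none does."""
    return enabled_keys.index(letter)


food_letters = "fghi"  # taken from the prompt "[fghi or ?*]"
edible = [c for c in food_letters if c in reachable_letters(ENABLED_KEYS)]
```

Under these assumptions only the item labeled h can be selected from the prompt, via the index of CompassDirection.W. The letters e and s are also reachable, because Command.EAT and Command.SEARCH send those keys.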

To play the full game without this awkward restriction, play with a larger action space.

PeterAJansen commented 3 years ago

Thanks for your quick reply -- I see now, it's not possible to do these kinds of compositional actions currently, but you can add them into the action space in the Tasks definition file here:

https://github.com/facebookresearch/nle/blob/3f15220cbf511fcfbe124a39bd15565ce3033b18/nle/env/tasks.py#L14

And then they just show up. And because of the peculiarity of NetHack, one would have to map the item's inventory letter to the action that presses that key in order to select a given thing to (e.g.) eat.

PeterAJansen commented 3 years ago

More specifically, for anyone getting onboard, here's a workaround:

In actions.py, I added/modified:

class RawKeyPresses(enum.IntEnum):
    KEYPRESS_A = ord("A")
    KEYPRESS_B = ord("B")
    KEYPRESS_C = ord("C")
    KEYPRESS_D = ord("D")
    KEYPRESS_E = ord("E")
    KEYPRESS_F = ord("F")
    KEYPRESS_G = ord("G")
    KEYPRESS_H = ord("H")
    KEYPRESS_I = ord("I")
    KEYPRESS_J = ord("J")
    KEYPRESS_K = ord("K")
    KEYPRESS_L = ord("L")
    KEYPRESS_M = ord("M")
    KEYPRESS_N = ord("N")
    KEYPRESS_O = ord("O")
    KEYPRESS_P = ord("P")
    KEYPRESS_Q = ord("Q")
    KEYPRESS_R = ord("R")
    KEYPRESS_S = ord("S")
    KEYPRESS_T = ord("T")
    KEYPRESS_U = ord("U")
    KEYPRESS_V = ord("V")
    KEYPRESS_W = ord("W")
    KEYPRESS_X = ord("X")
    KEYPRESS_Y = ord("Y")
    KEYPRESS_Z = ord("Z")
    KEYPRESS_a = ord("a")
    KEYPRESS_b = ord("b")
    KEYPRESS_c = ord("c")
    KEYPRESS_d = ord("d")
    KEYPRESS_e = ord("e")
    KEYPRESS_f = ord("f")
    KEYPRESS_g = ord("g")
    KEYPRESS_h = ord("h")
    KEYPRESS_i = ord("i")
    KEYPRESS_j = ord("j")
    KEYPRESS_k = ord("k")
    KEYPRESS_l = ord("l")
    KEYPRESS_m = ord("m")
    KEYPRESS_n = ord("n")
    KEYPRESS_o = ord("o")
    KEYPRESS_p = ord("p")
    KEYPRESS_q = ord("q")
    KEYPRESS_r = ord("r")
    KEYPRESS_s = ord("s")
    KEYPRESS_t = ord("t")
    KEYPRESS_u = ord("u")
    KEYPRESS_v = ord("v")
    KEYPRESS_w = ord("w")
    KEYPRESS_x = ord("x")
    KEYPRESS_y = ord("y")
    KEYPRESS_z = ord("z")
    KEYPRESS_0 = ord("0")
    KEYPRESS_1 = ord("1")
    KEYPRESS_2 = ord("2")
    KEYPRESS_3 = ord("3")
    KEYPRESS_4 = ord("4")
    KEYPRESS_5 = ord("5")
    KEYPRESS_6 = ord("6")
    KEYPRESS_7 = ord("7")
    KEYPRESS_8 = ord("8")
    KEYPRESS_9 = ord("9")

ACTIONS = tuple(
    list(CompassDirection)
    + list(CompassDirectionLonger)
    + list(MiscDirection)
    + list(MiscAction)
    + list(Command)
    + list(TextCharacters)
    + list(RawKeyPresses)
)

In tasks.py, I then added the reference to the key presses:

TASK_ACTIONS = tuple(
    [nethack.MiscAction.MORE]
    + list(nethack.CompassDirection)
    + list(nethack.CompassDirectionLonger)
    + list(nethack.MiscDirection)
    + [nethack.Command.KICK, nethack.Command.EAT, nethack.Command.SEARCH, nethack.Command.LOOK]
    + list(nethack.RawKeyPresses)
)

Then, here's a quick test for eating:

import gym
import nle
from nle import nethack

# Get an action index from its name (e.g. "Command.EAT")
def selectAction(env, actionName):
    actionNames = [str(x) for x in env._actions]
    # list.index raises ValueError for an unknown name. Returning -1 instead
    # would make env.step silently pick the last action in the tuple.
    return actionNames.index(actionName)

# Create the environment
env = gym.make("NetHackScore-v0")
env.reset()  # each reset generates a new dungeon

# Debug: Print action space meanings
env.print_action_meanings()
print("")

# Action: Select 'eat'
obs, reward, isTerminal, extraInfo = env.step( selectAction(env, "Command.EAT") )  # Eat
env.render("full")

# Then, press 'h' (for the orange)
obs, reward, isTerminal, extraInfo = env.step( selectAction(env, "RawKeyPresses.KEYPRESS_h") )  # Press "h"
env.render("full")

Which produces the following (successful) output:

0 MiscAction.MORE
1 CompassDirection.N
2 CompassDirection.E
3 CompassDirection.S
4 CompassDirection.W
5 CompassDirection.NE
6 CompassDirection.SE
7 CompassDirection.SW
8 CompassDirection.NW
9 CompassDirectionLonger.N
10 CompassDirectionLonger.E
11 CompassDirectionLonger.S
12 CompassDirectionLonger.W
13 CompassDirectionLonger.NE
14 CompassDirectionLonger.SE
15 CompassDirectionLonger.SW
16 CompassDirectionLonger.NW
17 MiscDirection.UP
18 MiscDirection.DOWN
19 MiscDirection.WAIT
20 Command.KICK
21 Command.EAT
22 Command.SEARCH
23 Command.LOOK
24 RawKeyPresses.KEYPRESS_A
25 RawKeyPresses.KEYPRESS_B
26 RawKeyPresses.KEYPRESS_C
27 RawKeyPresses.KEYPRESS_D
28 RawKeyPresses.KEYPRESS_E
29 RawKeyPresses.KEYPRESS_F
30 RawKeyPresses.KEYPRESS_G
31 RawKeyPresses.KEYPRESS_H
32 RawKeyPresses.KEYPRESS_I
33 RawKeyPresses.KEYPRESS_J
34 RawKeyPresses.KEYPRESS_K
35 RawKeyPresses.KEYPRESS_L
36 RawKeyPresses.KEYPRESS_M
37 RawKeyPresses.KEYPRESS_N
38 RawKeyPresses.KEYPRESS_O
39 RawKeyPresses.KEYPRESS_P
40 RawKeyPresses.KEYPRESS_Q
41 RawKeyPresses.KEYPRESS_R
42 RawKeyPresses.KEYPRESS_S
43 RawKeyPresses.KEYPRESS_T
44 RawKeyPresses.KEYPRESS_U
45 RawKeyPresses.KEYPRESS_V
46 RawKeyPresses.KEYPRESS_W
47 RawKeyPresses.KEYPRESS_X
48 RawKeyPresses.KEYPRESS_Y
49 RawKeyPresses.KEYPRESS_Z
50 RawKeyPresses.KEYPRESS_a
51 RawKeyPresses.KEYPRESS_b
52 RawKeyPresses.KEYPRESS_c
53 RawKeyPresses.KEYPRESS_d
54 RawKeyPresses.KEYPRESS_e
55 RawKeyPresses.KEYPRESS_f
56 RawKeyPresses.KEYPRESS_g
57 RawKeyPresses.KEYPRESS_h
58 RawKeyPresses.KEYPRESS_i
59 RawKeyPresses.KEYPRESS_j
60 RawKeyPresses.KEYPRESS_k
61 RawKeyPresses.KEYPRESS_l
62 RawKeyPresses.KEYPRESS_m
63 RawKeyPresses.KEYPRESS_n
64 RawKeyPresses.KEYPRESS_o
65 RawKeyPresses.KEYPRESS_p
66 RawKeyPresses.KEYPRESS_q
67 RawKeyPresses.KEYPRESS_r
68 RawKeyPresses.KEYPRESS_s
69 RawKeyPresses.KEYPRESS_t
70 RawKeyPresses.KEYPRESS_u
71 RawKeyPresses.KEYPRESS_v
72 RawKeyPresses.KEYPRESS_w
73 RawKeyPresses.KEYPRESS_x
74 RawKeyPresses.KEYPRESS_y
75 RawKeyPresses.KEYPRESS_z
76 RawKeyPresses.KEYPRESS_0
77 RawKeyPresses.KEYPRESS_1
78 RawKeyPresses.KEYPRESS_2
79 RawKeyPresses.KEYPRESS_3
80 RawKeyPresses.KEYPRESS_4
81 RawKeyPresses.KEYPRESS_5
82 RawKeyPresses.KEYPRESS_6
83 RawKeyPresses.KEYPRESS_7
84 RawKeyPresses.KEYPRESS_8
85 RawKeyPresses.KEYPRESS_9

b'What do you want to eat? [fghi or ?*] '
$ 4 gold pieces
a an uncursed +2 pair of leather gloves (being worn)
b an uncursed +1 robe (being worn)
c a blessed spellbook of protection
d an uncursed scroll of remove curse
e 3 uncursed potions of healing
f 4 uncursed food rations
g 5 uncursed apples
h 7 uncursed oranges
i 3 uncursed fortune cookies
j a magic marker (0:82)

 -------                                                                       
 |{.....                                                                       
 |...d.|                                                                       
 |.d....                                                                       
 |.k@..|                                                                       
 -------                                                                       

b'This orange is delicious!'
$ 4 gold pieces
a an uncursed +2 pair of leather gloves (being worn)
b an uncursed +1 robe (being worn)
c a blessed spellbook of protection
d an uncursed scroll of remove curse
e 3 uncursed potions of healing
f 4 uncursed food rations
g 5 uncursed apples
h 6 uncursed oranges
i 3 uncursed fortune cookies
j a magic marker (0:82)

 -------                                                                       
 |{.....                                                                       
 |...d.|                                                                       
 |.d....                                                                       
 |.k@..|                                                                       
 -------                                                                       

The critical lines in the output being:

b'What do you want to eat? [fghi or ?*] '

after supplying the 'eat' command, and:

b'This orange is delicious!'

after supplying the raw key 'h' (which is the orange in the inventory).
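
Since the prompt itself lists the valid letters, an agent could read them from the message instead of hard-coding 'h'. A small sketch, assuming the message has the "[... or ?*]" shape shown above; letters_offered is a hypothetical helper, and the handling of compact ranges such as "a-d" is an assumption about how longer lists are abbreviated:

```python
import re


def letters_offered(message):
    """Return the inventory letters listed in a '[... or ?*]' prompt."""
    if isinstance(message, bytes):
        message = message.decode("ascii", errors="replace")
    match = re.search(r"\[(.*?) or \?\*\]", message)
    if match is None:
        return []  # not an item-selection prompt of the expected shape
    spec, letters, i = match.group(1), [], 0
    while i < len(spec):
        if i + 2 < len(spec) and spec[i + 1] == "-":
            # Assumed compact range such as "a-d": expand it letter by letter
            letters.extend(chr(c) for c in range(ord(spec[i]), ord(spec[i + 2]) + 1))
            i += 3
        else:
            letters.append(spec[i])
            i += 1
    return letters


choices = letters_offered(b"What do you want to eat? [fghi or ?*] ")
```

For the prompt in the output above, choices holds the four food letters, and a message without such a prompt yields an empty list.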

Hope that's helpful to someone else just getting started.

heiner commented 3 years ago

Hey Peter,

Thanks for your response. I'm happy to learn you unblocked yourself here.

Let me point out that you hopefully don't need to edit the actions.py file in this way. As an example, the Challenge task uses a large action space with just about every available action: https://github.com/facebookresearch/nle/blob/master/nle/env/tasks.py#L328

This might be somewhat confusing as the names of these variables are not always "correct" in this setting (e.g., to eat item e one does Command.EAT twice in a row). Still the ACTIONS tuple does contain (almost) all actions (https://github.com/facebookresearch/nle/blob/master/nle/nethack/actions.py#L179).

You can use the action_id_to_type function to map from inputs like ord('e') to names like EAT. You'll find that almost all ASCII inputs have a corresponding name in ACTIONS.
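
The idea of mapping an input such as ord('e') back to a name can be sketched without NLE. The enums below are small hypothetical stand-ins rather than the library's definitions, and name_for_key is an illustrative helper, not action_id_to_type itself:

```python
import enum


class CompassDirection(enum.IntEnum):  # stand-in subset, values assumed
    W = ord("h")
    S = ord("j")


class Command(enum.IntEnum):  # stand-in subset, values assumed
    EAT = ord("e")
    SEARCH = ord("s")


# Reverse lookup: key code -> enum member that sends this key
KEY_TO_ACTION = {int(a): a for cls in (CompassDirection, Command) for a in cls}


def name_for_key(ch):
    """Name of the action whose key code equals the character `ch`."""
    action = KEY_TO_ACTION[ord(ch)]
    return f"{type(action).__name__}.{action.name}"


eat_name = name_for_key("e")
item_h_name = name_for_key("h")
```

This also shows why the names can look wrong in context: selecting inventory item h means taking the action named CompassDirection.W, and selecting item e means taking Command.EAT a second time.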

dmadeka commented 3 years ago

An easier way might be to just blow up the action space. The way we've been thinking about it is that, essentially, you can have an action like "Eat+J" and then use the gym step to "press" two keys whenever that action is taken.

This saves you from having to include key presses that are strictly dominated by others for the RL agent (e.g. I think CompassDirectionLonger is strictly dominated by CompassDirection).
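
A rough sketch of such composite actions, with a recording stand-in instead of the real environment. RecordingEnv, COMPOSITE_ACTIONS and step_composite are hypothetical names, and the env here accepts raw key characters purely for illustration:

```python
from itertools import product

VERBS = {"Eat": "e"}   # verb name -> command key (assumed)
ITEM_LETTERS = "fghi"  # inventory letters to pair with each verb

# Each composite action expands to the key sequence it should type.
COMPOSITE_ACTIONS = {
    f"{verb}+{letter}": (key, letter)
    for (verb, key), letter in product(VERBS.items(), ITEM_LETTERS)
}


class RecordingEnv:
    """Hypothetical stand-in for an env whose step() takes one key press."""

    def __init__(self):
        self.pressed = []

    def step(self, key):
        self.pressed.append(key)


def step_composite(env, name):
    """Take one composite action by pressing each of its keys in order."""
    for key in COMPOSITE_ACTIONS[name]:
        env.step(key)


env = RecordingEnv()
step_composite(env, "Eat+h")
```

The action space grows with the number of verb and letter pairs, which is the trade-off against keeping single key presses as actions.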

heiner commented 3 years ago

Hey Dhruv,

While this is an interesting idea, notice that the game does not guarantee that you can eat/fire/etc in all situations. E.g., when polymorphed into a non-eating monster (cf. https://github.com/facebookresearch/nle/blob/master/src/eat.c#L2480). In other situations, the game might ask you additional questions (really eat the food in this tin, do you want to continue, etc). Coming up with an encapsulation of all this logic in general is going to be tricky and probably not worth it.
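
To illustrate the difficulty, here is a sketch of a composite "eat" that checks the game's reply before pressing the second key. ScriptedEnv and try_eat are hypothetical, the env returns canned strings rather than real observations, and the refusal text is a placeholder, not an actual game message:

```python
class ScriptedEnv:
    """Hypothetical stand-in: replies to each key press with a canned message."""

    def __init__(self, replies):
        self.replies = list(replies)
        self.pressed = []

    def step(self, key):
        self.pressed.append(key)
        return self.replies.pop(0)


def try_eat(env, letter):
    """Press 'e', then the item letter only if the expected prompt appeared."""
    message = env.step("e")
    if not message.startswith("What do you want to eat?"):
        return False, message  # the game refused or asked something else
    return True, env.step(letter)


normal = ScriptedEnv(
    ["What do you want to eat? [fghi or ?*] ", "This orange is delicious!"]
)
ok, reply = try_eat(normal, "h")

# Placeholder refusal text, not a real NetHack message
blocked = ScriptedEnv(["(placeholder) you cannot eat right now"])
refused_ok, refused_reply = try_eat(blocked, "h")
```

Even this guard covers only the first prompt. Follow-up questions such as confirming whether to eat the food in a tin would each need their own branch, which is where a general encapsulation gets tricky.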