JoeyAndres / rl

Reinforcement Learning library

Null Action for terminal state. #21

Closed JoeyAndres closed 7 years ago

JoeyAndres commented 7 years ago

During a backup, for instance in SARSA:

V(S, A) <- V(S, A) + stepSize * [ R + discountRate * V(S', A') - V(S, A) ]

What if S' is a terminal state? Then what is A'? In this case, there should be something like the following:

ReinforcementLearning<S, A>::terminalStateAction = // generic empty class, but comparable to StateAction<S, A>

That means the internal state representation becomes a std::variant. Since std::variant is C++17 and will not be widely supported for a few more years, use boost::variant instead.

Edit:

This is related to the following warning:

State-Pair given is not yet added. /home/tester/rl/build/rl_bits/agent/StateActionPairContainer.h:185

On inspection, those are terminal states that have no entry in the StateActionPairContainer. The error is thrown during simulation (DynaQ) when the algorithm tries to simulate terminal states.

Edit:

boost::variant is too hacky. We want polymorphism, but it's not possible here. The only solution is to store terminal states separately and implement a special iterator that iterates through both the map and the set. We still need a "null action", which can be nullptr at this point.

JoeyAndres commented 7 years ago

Stack trace:

State-Pair given is not yet added. 
Breakpoint 1, rl::agent::StateActionPairContainer<long, long>::getStateActionValue (this=0x82a6e0, stateAction=...) at /home/jandres/Codes/rl/build/rl_bits/agent/StateActionPairContainer.h:195
195               << __FILE__ ":" << __LINE__
(gdb) bt
#0  rl::agent::StateActionPairContainer<long, long>::getStateActionValue (this=0x82a6e0, stateAction=...) at /home/jandres/Codes/rl/build/rl_bits/agent/StateActionPairContainer.h:195
#1  0x000000000056d87e in rl::agent::StateActionPairContainer<long, long>::operator[] (this=0x82a6e0, stateAction=...) at /home/jandres/Codes/rl/build/rl_bits/agent/StateActionPairContainer.h:207
#2  0x000000000056c996 in rl::algorithm::ReinforcementLearning<long, long>::argMax (this=0x82a6a0, state=std::shared_ptr (count 7, weak 0) 0x826540, actionSet=std::set with 2 elements)
    at /home/jandres/Codes/rl/build/rl_bits/agent/../algorithm/ReinforcementLearning.h:190
#3  0x000000000056a9b4 in rl::algorithm::DynaQRLMP<long, long>::argMax (this=0x82a6a0, state=std::shared_ptr (count 7, weak 0) 0x826540, actionSet=std::set with 2 elements)
    at /home/jandres/Codes/rl/build/rl_bits/algorithm/DynaQRLMP.h:113
#4  0x00000000005a7a2a in rl::algorithm::DynaQPrioritizedSweeping<long, long>::update (this=0x82a6a0, currentStateAction=..., nextState=std::shared_ptr (count 7, weak 0) 0x826540, reward=-1, 
    actionSet=std::set with 2 elements) at /home/jandres/Codes/rl/build/rl_bits/algorithm/DynaQPrioritizedSweeping.h:181
#5  0x000000000056bc89 in rl::agent::Agent<long, long>::train (this=0x7fffffffcfc0, state=std::shared_ptr (count 6, weak 0) 0x825310, action=std::shared_ptr (count 7, weak 0) 0x825410, reward=-1, 
    nextState=std::shared_ptr (count 7, weak 0) 0x826540) at /home/jandres/Codes/rl/build/rl_bits/agent/Agent.h:225
#6  0x000000000056beea in rl::agent::Agent<long, long>::execute (this=0x7fffffffcfc0) at /home/jandres/Codes/rl/build/rl_bits/agent/Agent.h:252
#7  0x000000000056bffe in rl::agent::Agent<long, long>::executeEpisode (this=0x7fffffffcfc0, maxIter=100000) at /home/jandres/Codes/rl/build/rl_bits/agent/Agent.h:268
#8  0x00000000005a5f8b in ____C_A_T_C_H____T_E_S_T____35 () at /home/jandres/Codes/rl/test/src/algorithm/DynaQPrioritizedSweeping_test.cpp:58
#9  0x0000000000524aa6 in Catch::FreeFunctionTestCase::invoke (this=0x825570) at /home/jandres/Codes/rl/test/../lib/catch.hpp:6589
#10 0x00000000005107fb in Catch::TestCase::invoke (this=0x829c60) at /home/jandres/Codes/rl/test/../lib/catch.hpp:7526
#11 0x0000000000523b1b in Catch::RunContext::invokeActiveTestCase (this=0x7fffffffdb80) at /home/jandres/Codes/rl/test/../lib/catch.hpp:6165
#12 0x00000000005237c6 in Catch::RunContext::runCurrentTest (this=0x7fffffffdb80, redirectedCout="", redirectedCerr="") at /home/jandres/Codes/rl/test/../lib/catch.hpp:6136
#13 0x00000000005220d9 in Catch::RunContext::runTest (this=0x7fffffffdb80, testCase=...) at /home/jandres/Codes/rl/test/../lib/catch.hpp:5956
#14 0x000000000050de0c in Catch::runTests (config=...) at /home/jandres/Codes/rl/test/../lib/catch.hpp:6304
#15 0x00000000005243da in Catch::Session::run (this=0x7fffffffddc0) at /home/jandres/Codes/rl/test/../lib/catch.hpp:6412
#16 0x00000000005242cc in Catch::Session::run (this=0x7fffffffddc0, argc=1, argv=0x7fffffffdfb8) at /home/jandres/Codes/rl/test/../lib/catch.hpp:6391
#17 0x00000000005162fe in main (argc=1, argv=0x7fffffffdfb8) at /home/jandres/Codes/rl/test/../lib/catch.hpp:10351
JoeyAndres commented 7 years ago

Decided to ignore the warning.