Svalorzen / AI-Toolbox

A C++ framework for MDPs and POMDPs with Python bindings
GNU General Public License v3.0

tiger_antelope.py: ValueError: Input transition matrix does not contain valid probabilities. #45

Closed · troyrock closed this issue 3 years ago

troyrock commented 3 years ago

After compiling, I get the following error when trying to run the tiger_antelope.py in the examples/MDP directory:

18:30:32 - Constructing MDP...
Traceback (most recent call last):
  File "tiger_antelope.py", line 341, in <module>
    solve_mdp(horizon=args.horizon, tolerance=args.tolerance)
  File "tiger_antelope.py", line 289, in solve_mdp
    model.setTransitionFunction(T)
ValueError: Input transition matrix does not contain valid probabilities.

Svalorzen commented 3 years ago

Strange, it works here. Also, line 341 in the original file is

args = parser.parse_args()

And not the solve_mdp line. Are you sure you didn't change anything in the file?

Svalorzen commented 3 years ago

As a sanity check, you could try to add at line 264 (in the loop that constructs the probabilities) the following:

    for state in range(len(S)):
        coord = decodeState(state)
        T.append([[getTransitionProbability(coord, action,
                                            decodeState(next_state))
                   for next_state in range(len(S))] for action in A])
        print([sum(x) for x in T[-1]]) # <-------------------- ADD THIS LINE

The last line will perform the sums for you; all printed numbers should be 1.0, otherwise something is going wrong.

troyrock commented 3 years ago

I'm using Python 3.7.5, so I had to change the print statements in the rendering section to use parentheses, and when the T.append command didn't work I collapsed it into a single line in hopes of getting it to run. I have now added the print statement (and changed the T.append line back to the way it was); that section of the code looks as follows:

S = list(itertools.product(range(SQUARE_SIZE), repeat=4))

# A = tiger actions
A = ['stand', 'up', 'down', 'left', 'right']

# T gives the transition probability for every s, a, s' triple.
T = []
for state in range(len(S)):
    coord = decodeState(state)
    T.append([[getTransitionProbability(coord, action,
                                        decodeState(next_state))
               for next_state in range(len(S))] for action in A])
    print([sum(x) for x in T[-1]])

with the following result:

08:03:37 - Constructing MDP...
[1.0, 1.0, 1.0, 1.0, 1.0]
[0.2, 0.2, 0.2, 0.0, 0.0]
[0.2, 0.0, 0.2, 0.0, 0.0]
[0.2, 0.2, 0.0, 0.0, 0.0]
[0.2, 0.2, 0.2, 0.0, 0.0]
[0.2, 0.4, 0.4, 0.0, 0.0]
[0.2, 0.0, 0.2, 0.0, 0.0]
[0.2, 0.2, 0.0, 0.0, 0.0]
[0.2, 0.2, 0.2, 0.0, 0.0]
[0.2, 0.0, 0.2, 0.0, 0.0]
[0.2, 0.4, 0.4, 0.0, 0.0]
[0.2, 0.2, 0.2, 0.0, 0.0]
[0.2, 0.2, 0.2, 0.0, 0.0]
[0.2, 0.2, 0.2, 0.0, 0.0]
[0.2, 0.2, 0.2, 0.0, 0.0]
[0.2, 0.4, 0.4, 0.0, 0.0]
[0.2, 0.2, 0.4, 0.0, 0.0]

this continues for quite some time and ends with:

[0.4, 0.4, 0.2, 0.0, 0.0]
[0.4, 0.4, 0.2, 0.0, 0.0]
[0.4, 0.4, 0.2, 0.0, 0.0]
[0.2, 0.4, 0.2, 0.0, 0.0]
[0.4, 0.4, 0.4, 0.0, 0.0]
[0.4, 0.2, 0.2, 0.0, 0.0]
[0.2, 0.2, 0.2, 0.0, 0.0]
[0.2, 0.2, 0.2, 0.0, 0.0]
[0.4, 0.2, 0.2, 0.0, 0.0]
Traceback (most recent call last):
  File "tiger_antelope.py", line 340, in <module>
    solve_mdp(horizon=args.horizon, tolerance=args.tolerance)
  File "tiger_antelope.py", line 288, in solve_mdp
    model.setTransitionFunction(T)
ValueError: Input transition matrix does not contain valid probabilities.

I'm interested in using AI-Toolbox to replicate some work done in the paper "Selection of first-line therapy in multiple sclerosis using risk-benefit decision analysis", but I don't see how to add states that don't have actions associated with them. In this case, for instance, when a patient contracts PML there is no decision to be made; the process simply moves forward to one of two next states with 24% and 76% probability.

(The paper is available here: https://www.researchgate.net/publication/312354698_Selection_of_first-line_therapy_in_multiple_sclerosis_using_risk-benefit_decision_analysis if you have an interest.)

Thank you for making the tool available.


Svalorzen commented 3 years ago

Ah, I see. Something is going wrong in the creation of the transition function. What the Python code in the example is doing is creating a 3-dimensional matrix, with dimensions SxAxS, where each entry T[s][a][s'] corresponds to the probability of transitioning from state s to state s', given action a.

What this means is that, to be correct probability distributions, all these numbers must be valid probabilities (between 0 and 1), and that the sum for all possible transitions from T[s][a] must be 1.0. The error is related to the fact that these assumptions are not true, so the program aborts. The prints are to check that all these sums are indeed 1.0, and in your case they are not, which explains it.
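To make those two conditions concrete, here is a minimal sketch of a checker (a hypothetical helper, not part of AI-Toolbox) that you could run on T before calling model.setTransitionFunction(T):

def check_transition_matrix(T, tol=1e-6):
    # Hypothetical helper: verifies the two conditions above on a
    # nested-list S x A x S transition matrix.
    for s, per_action in enumerate(T):
        for a, row in enumerate(per_action):
            if any(p < 0.0 or p > 1.0 for p in row):
                raise ValueError(f"T[{s}][{a}] has an entry outside [0, 1]")
            if abs(sum(row) - 1.0) > tol:
                raise ValueError(f"T[{s}][{a}] sums to {sum(row)}, not 1.0")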

That said, I think I found the problem. It's a subtle difference in how integer division works between Python 2 and Python 3. I'm going to push a patch soon that makes the example cross-compatible with both versions.

As a quick fix, you need to replace the decodeState(state) function in your example file with:

def decodeState(state):
    """
    Convert from state_index to coordinate.

    Parameters
    ----------
    state: int
        Index of the state.

    Returns
    -------
    coord: tuple of int
        Four element tuple containing the position of the tiger and antelope.
    """
    coord = []
    for _ in range(4):
        c = state % SQUARE_SIZE
        state = state // SQUARE_SIZE   # This is the changed line that forces an integer division in Python 3.
        coord.append(c)
    return tuple(coord)
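For context, the underlying difference: in Python 2, / between two integers performs floor division, while in Python 3 it returns a float; // floors in both versions. A quick illustration in Python 3:

state = 7
print(state / 3)    # 2.3333333333333335 -- a float in Python 3 (in Python 2 this was 2)
print(state // 3)   # 2 -- floor division, identical in both versions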

Regarding your problem, there's no direct way to do this "cleanly" without modifying the internals of the library (depending on which algorithm you plan to use).

The best solution that will maintain compatibility with everything else is to simply use the same transitions for all actions in your particular states. So, for all actions, the transition probabilities will be the same. This makes it so that picking the action does not affect the environment, which is what you want. This will allow planning algorithms to work correctly (like for example value iteration).
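For instance, for the PML state you mention, a sketch with hypothetical state indices (reusing the S, A, T names from the example above) could look like this:

# Hypothetical state indices, purely for illustration.
PML_STATE = 10            # the state where no decision exists
OUTCOME_A, OUTCOME_B = 11, 12

row = [0.0] * len(S)
row[OUTCOME_A] = 0.24     # 24% chance of the first outcome
row[OUTCOME_B] = 0.76     # 76% chance of the second outcome

# Give every action an identical copy of the same distribution, so the
# chosen action cannot affect the environment in this state.
T[PML_STATE] = [list(row) for _ in A]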

If you plan to use reinforcement learning (for example Q-learning), then you might also want to force your agent to pick a specific action (say, 0) in those states, as it will prevent unnecessary exploration in states where picking an action does not do anything.

Let me know if what I wrote makes sense to you :)

troyrock commented 3 years ago

Thank you, that fix does get it past that point. There are several print statements that need to be modified for Python 3: print("X", end='') is a replacement for print "X", and xrange needs to be replaced by range (or imported via from six.moves import range), as per https://portingguide.readthedocs.io/en/latest/iterators.html
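For anyone else porting the example, the changes look roughly like this (Python 2 originals kept as comments):

# Python 2:  print "X",
print("X", end='')       # Python 3 form used in the example

# Python 2:  for i in xrange(n): ...
for i in range(3):       # range replaces xrange in Python 3
    pass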

Thanks for the help with no-action states; I'll see if I can get that to work. I really appreciate your help. If you are interested, I'll send you my code at the end (you mentioned you were looking for other example code). It's for a school project.


Svalorzen commented 3 years ago

Sure, that'd be cool! I love to see what people are doing with the library :)

If the example works now, feel free to close the issue; if you then have more trouble with your setting, just open another one, no problem. Good luck for now!