Vectorizing

To be able to use more standard reinforcement learning code, we want to be able to vectorize our algorithm. To do this we will need to make two matrices:

P - the probability transition function - an |A|x|S|x|S| array where P[a, s, s'] is the probability that we end up in state s' given that we start in state s and take action a.

R - the reward function - an |A|x|S| matrix where R[a, s] is the reward we receive for taking action a in state s.
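As a rough illustration of why these two arrays are convenient, here is a minimal sketch of value iteration written against P and R with NumPy. The toy sizes, the transition and reward values, the discount factor `gamma`, and the choice of value iteration itself are all assumptions made for this example; none of them come from the project's actual code.

```python
import numpy as np

# Hypothetical toy problem: 2 actions, 3 states (not the real bike model).
num_actions, num_states = 2, 3
gamma = 0.9  # assumed discount factor

# P[a, s, s2] = probability of ending in state s2 after taking action a in state s.
P = np.zeros((num_actions, num_states, num_states))
P[0] = np.eye(num_states)                       # action 0: stay in the same state
P[1] = np.roll(np.eye(num_states), 1, axis=1)   # action 1: move to (s + 1) % num_states

# R[a, s] = reward for taking action a in state s.
R = np.zeros((num_actions, num_states))
R[1, 2] = 1.0  # only taking action 1 in state 2 is rewarded

# Each row P[a, s, :] must be a probability distribution.
assert np.allclose(P.sum(axis=2), 1.0)

# Value iteration: one Bellman backup per loop, with no Python loops over states or actions.
V = np.zeros(num_states)
for _ in range(100):
    Q = R + gamma * (P @ V)   # shape (num_actions, num_states)
    V = Q.max(axis=0)         # best achievable value in each state

policy = Q.argmax(axis=0)     # greedy action index for each state
```

The expression `P @ V` contracts the last axis of P with V, so the whole backup over every action and state is a single array operation, which is the point of putting the model into this layout.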

In addition, because we want P and R to be 3- and 2-dimensional arrays respectively, we need a conversion between a state and a single index. The same needs to be done to represent an action with a single index. Given the array of indices that locates the current state or action in its multidimensional matrix, start with the first index; then, for each following index, multiply the running total by the size of that index's dimension and add the index.

Example: The state array is a 10x20x30x40 matrix and we have indices [5, 6, 7, 8] that we want to convert to a single index. We do ((5 * 20 + 6) * 30 + 7) * 40 + 8 = 127488
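The conversion described above could be sketched in plain Python as follows. The function names `flatten_index` and `unflatten_index` are hypothetical, chosen only for this example, and the inverse conversion is an addition that the issue does not ask for but that is usually needed to map a result back to a state.

```python
def flatten_index(indices, shape):
    """Convert multidimensional indices into a single row-major index."""
    if len(indices) != len(shape):
        raise ValueError("indices and shape must have the same length")
    flat = 0
    for index, size in zip(indices, shape):
        if not 0 <= index < size:
            raise IndexError(f"index {index} out of range for dimension of size {size}")
        # Multiply the running total by this dimension's size, then add the index.
        flat = flat * size + index
    return flat


def unflatten_index(flat, shape):
    """Inverse of flatten_index: recover the multidimensional indices."""
    indices = []
    for size in reversed(shape):
        flat, index = divmod(flat, size)
        indices.append(index)
    return indices[::-1]


shape = [10, 20, 30, 40]
single = flatten_index([5, 6, 7, 8], shape)   # 127488, matching the example above
restored = unflatten_index(single, shape)     # [5, 6, 7, 8]
```

If NumPy is already a dependency, `np.ravel_multi_index` and `np.unravel_index` perform the same two conversions with the default row-major ordering.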

da-luce / cornell-autobike

Vectorizing #43