da-luce / cornell-autobike

Codebase of the Cornell Autonomous Bicycle Team
https://www.cuautobike.org/

Vectorizing #43

Open AriMirsky opened 9 months ago

AriMirsky commented 9 months ago

To be able to use more standard reinforcement learning code, we want to be able to vectorize our algorithm. To do this we will need to make two matrices:

P - the probability transition function - an |A|x|S|x|S| matrix where the probability we end up in state s' given we start in state s and take action a is P[a, s, s'].

R - the reward function - an |A|x|S| matrix where the reward we receive for taking action a in state s is R[a, s].
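As a rough illustration of why these shapes are convenient, here is a minimal sketch of value iteration over P and R using NumPy. The function name `value_iteration`, the discount `gamma`, and the stopping parameters are assumptions made for this example; they are not taken from the existing codebase, and the team may well use a different algorithm.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Hypothetical helper: value iteration on a tabular MDP.

    P has shape (|A|, |S|, |S|) with P[a, s, s2] = Pr(s2 | s, a).
    R has shape (|A|, |S|) with R[a, s] = reward for action a in state s.
    """
    num_states = P.shape[1]
    V = np.zeros(num_states)
    for _ in range(max_iters):
        # P @ V broadcasts over actions: (|A|, |S|, |S|) @ (|S|,) -> (|A|, |S|)
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=0)  # best value over actions for each state
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final value estimate
    Q = R + gamma * (P @ V)
    return V, Q.argmax(axis=0)


# Toy MDP (made up for illustration): action 0 stays put, action 1 switches state.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
])
R = np.array([
    [0.0, 1.0],  # staying in state 1 pays 1
    [0.0, 0.0],  # switching pays nothing
])
V, policy = value_iteration(P, R)
```

The whole Bellman backup is one matrix product plus a max over the action axis, with no explicit loop over states or actions.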

In addition, because we want these to be 3- and 2-dimensional matrices respectively, we will need a conversion between a state and a single index. The same will need to be done to represent actions with a single index. Given the array of indices that locates the current state or action in its multidimensional matrix, this can be done as follows: start with the first index, then for each following index, multiply the running total by the size of that index's dimension and add the index.

Example: The state array is a 10x20x30x40 matrix and we have indices [5, 6, 7, 8] that we want to convert to a single index. We do ((5 * 20 + 6) * 30 + 7) * 40 + 8 = 127488.
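The conversion described above could be sketched in Python as below. `flatten_index` is a hypothetical name chosen for this example. NumPy's `np.ravel_multi_index` performs the same row-major conversion, and `np.unravel_index` goes the other way, so a hand-written helper may not be needed at all.

```python
import numpy as np


def flatten_index(indices, shape):
    """Hypothetical helper: convert multidimensional indices to one index.

    Starts from the first index; for each index, multiplies the running
    total by that dimension's size and then adds the index.
    """
    flat = 0
    for index, size in zip(indices, shape):
        if not 0 <= index < size:
            raise IndexError(f"index {index} out of range for dimension of size {size}")
        flat = flat * size + index
    return flat


shape = (10, 20, 30, 40)
indices = (5, 6, 7, 8)

flat = flatten_index(indices, shape)

# NumPy's built-in equivalent (row-major / C order by default)
flat_numpy = int(np.ravel_multi_index(indices, shape))

# Inverse conversion: single index back to multidimensional indices
restored = tuple(int(i) for i in np.unravel_index(flat, shape))
```

Multiplying by the first dimension's size on the first step is harmless because the running total starts at zero, so the result matches the worked example.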