In your pseudocode for calculating q, if π is deterministic (as stated in the initialization and in the pseudocode given for v), then you don't need to loop over all a∈A in step 2, and you don't need to weight over all a' in the Q(s,a) calculation: the sum over a' reduces to the single term Q(s', π(s')).
Likewise, in step 3 you shouldn't loop over a, because old-action is given directly by the deterministic policy: old-action ← π(s).
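To make the suggestion concrete, here is a minimal Python sketch of how I read it. Everything in it is hypothetical and only for illustration: the toy two-state MDP `P`, the constants `GAMMA` and `THETA`, and the helper names `backup`, `evaluate`, `improve` and `policy_iteration` are not from your pseudocode. One assumption of mine: since step 2 only updates Q(s, π(s)), step 3 fills in the other actions with a one-step lookahead before taking the greedy action.

```python
GAMMA = 0.9    # discount factor (assumed value)
THETA = 1e-10  # convergence threshold for evaluation (assumed value)

# Hypothetical toy MDP: P[s][a] is a list of (probability, next_state, reward).
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}


def backup(Q, pi, s, a):
    # With a deterministic policy, the sum over a' weighted by pi(a'|s')
    # collapses to the single term Q[s'][pi[s']].
    return sum(p * (r + GAMMA * Q[s2][pi[s2]]) for p, s2, r in P[s][a])


def evaluate(Q, pi):
    # Step 2: sweep over states only; the action is fixed by the policy.
    while True:
        delta = 0.0
        for s in P:
            a = pi[s]
            new_value = backup(Q, pi, s, a)
            delta = max(delta, abs(new_value - Q[s][a]))
            Q[s][a] = new_value
        if delta < THETA:
            return


def improve(Q, pi):
    # Step 3: old_action is read from the policy, with no loop over actions.
    new_pi = {}
    for s in P:
        old_action = pi[s]
        # Assumption: fill in the non-policy actions by one-step lookahead
        # so that the greedy choice below compares well-defined values.
        for a in P[s]:
            if a != old_action:
                Q[s][a] = backup(Q, pi, s, a)
        # Greedy action; exact ties are broken in favor of old_action.
        new_pi[s] = max(P[s], key=lambda a: (Q[s][a], a == old_action))
    stable = new_pi == pi
    pi.update(new_pi)
    return stable


def policy_iteration():
    Q = {s: {a: 0.0 for a in P[s]} for s in P}
    pi = {s: 0 for s in P}  # deterministic policy: one action per state
    while True:
        evaluate(Q, pi)
        if improve(Q, pi):
            return pi, Q
```

On this toy MDP, `policy_iteration()` settles on action 1 in both states, with Q[0][1] close to 19 and Q[1][1] close to 20.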
Thanks for considering this fix ;) Have a nice day !