`ppo_pfrl` runs now. I believe the registry mapping convention was changed after `ppo_pfrl.py` was pushed, which is why it wasn't working. Updating the registry mappings to the convention that is followed now fixed the problem.
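To illustrate why a stale registry mapping breaks agent lookup, here is a minimal, self-contained sketch of a decorator-based registry; the class, mapping key, and agent names are hypothetical and not the repository's actual API:

```python
# Hypothetical sketch of decorator-based registration, not the repo's real Registry.
class Registry:
    mapping = {"model_mapping": {}}

    @classmethod
    def register_model(cls, name):
        def wrap(model_cls):
            cls.mapping["model_mapping"][name] = model_cls
            return model_cls
        return wrap


@Registry.register_model("ppo_pfrl")
class PPOPFRLAgent:
    """Placeholder agent; the registration above is the point of the example."""
    pass


# Lookup uses the key recorded at registration time; if the mapping convention
# changes upstream, agents registered the old way are no longer found.
agent_cls = Registry.mapping["model_mapping"]["ppo_pfrl"]
```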
`maddpg_v2` runs now. It had a similar registry issue to `ppo_pfrl`, but also an additional problem: the replay buffer was storing the actions taken rather than the action probabilities. Changing it to store action probabilities made the code work.
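As a rough illustration of the buffer change (all names here are hypothetical; the real buffer lives in the `maddpg_v2` code), the fix amounts to pushing the policy's probability vector into the buffer instead of the sampled action:

```python
import random
from collections import deque

# Hypothetical replay buffer sketch: transitions record the full action
# probability vector rather than the chosen action index.
class ReplayBuffer:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action_probs, reward, next_state, done):
        # Store the probability vector so the critic receives the same kind
        # of action input the actor produces at training time.
        self.buffer.append((state, action_probs, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        return map(list, zip(*batch))
```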
`madg` needed no modification. Importing it in `agent/__init__.py` made it work.
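For reference, the change is just importing the module from the package initializer so its registration code runs; this is a sketch rather than the exact line in the repo:

```python
# agent/__init__.py (sketch): importing the module triggers its registry
# registration so the agent can be looked up by name.
from . import madg  # noqa: F401
```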
`maddpg` was not fixed. The code appears to be incomplete. Since a version 2 of this algorithm is already implemented, I left it as it is.
Note that I tested that the code works with both SUMO and CityFlow. I did not test it to convergence, since that takes a long time for the policy-based algorithms, but it should work as I made no major modifications to the code itself.
This PR partially fixes #15.