Closed 6Lackiu closed 2 years ago
@pengzhenghao Still looking forward to your reply! Thanks!
I am very sorry for the late reply!!
For questions 1 and 2, this is what we actually do:
1. Convert the checkpoint-625 file to a .npz file.
2. Load the .npz file to get a policy_function.
3. Use policy_function as a convenient interface to get actions from the neural network.
So for your question 1, it seems that the layer names are misaligned between policy_function and your model. You can take a look at the definition of the policy function and modify it according to your model.
For your question 2, I compress the RLLib checkpoint into an npz file with a custom function. A temporary solution is given at: https://github.com/metadriverse/metadrive/issues/118#issuecomment-992637939 Maybe I should turn this into a more accessible script so that people can run it easily.
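To make the npz idea concrete, here is a minimal sketch of the round trip, not the actual conversion function linked above. It assumes you already have the policy weights as a dict that maps layer names to NumPy arrays; the helper names save_weights_npz and load_weights_npz and the example layer names are hypothetical.

```python
import os
import tempfile

import numpy as np


def save_weights_npz(weights, path):
    """Write a {layer_name: array} dict to a compressed .npz file."""
    arrays = {name: np.asarray(w) for name, w in weights.items()}
    np.savez_compressed(path, **arrays)


def load_weights_npz(path):
    """Read the .npz file back into a plain {layer_name: array} dict."""
    with np.load(path) as data:
        return {name: data[name] for name in data.files}


# Hypothetical weights standing in for what a trained policy would hold.
example_weights = {
    "default_policy/fc_1/kernel": np.zeros((92, 256), dtype=np.float32),
    "default_policy/fc_1/bias": np.zeros(256, dtype=np.float32),
}

npz_path = os.path.join(tempfile.mkdtemp(), "example_policy.npz")
save_weights_npz(example_weights, npz_path)
restored = load_weights_npz(npz_path)

# Listing the stored names and shapes helps spot a layer-name mismatch
# between the file and the names the policy function looks up.
for name, array in sorted(restored.items()):
    print(name, array.shape)
```

Printing the keys of your own .npz file this way is a quick check for the layer-name misalignment from question 1.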
For your question 3, well, I don't really remember the details, and there might be a bug in vis_from_checkpoint.py. I can confirm that vis.py works because I ran it a few weeks ago. Could you please give vis.py a try?
For your question 4, as you might know, the LCF should be the last dimension of the observation. However, the environment in vis.py only returns the raw observation. Therefore, we need to manually append the LCF to the observation to create an obs that fits CoPO's policy network. You can hack it like new_obs = np.concatenate([old_obs, [0.5]]), which tells the CoPO agent that we are using LCF=0.
Note that the LCF should be scaled to [0, 1.0] instead of [-1, 1] in the observation, which is why LCF=0 is written as 0.5. Please refer to the code of my CoPO agents.
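As a small sketch of that hack: the helpers below are hypothetical names, and the linear rescaling from [-1, 1] to [0, 1] is an assumption that is consistent with LCF=0 becoming 0.5. The 91-dimensional raw observation is only an illustrative size.

```python
import numpy as np


def scale_lcf(lcf):
    """Map an LCF in [-1, 1] to [0, 1] (assumed linear rescaling)."""
    return (float(lcf) + 1.0) / 2.0


def append_lcf(raw_obs, lcf=0.0):
    """Append the scaled LCF as the last dimension of a 1-D observation."""
    raw_obs = np.asarray(raw_obs, dtype=np.float32)
    return np.concatenate([raw_obs, [scale_lcf(lcf)]]).astype(np.float32)


# Illustrative raw observation, as returned by an environment without LCF.
old_obs = np.zeros(91, dtype=np.float32)
new_obs = append_lcf(old_obs, lcf=0.0)

print(old_obs.shape, new_obs.shape, new_obs[-1])
```

In a multi-agent environment the same call would be applied to each agent's observation before it is passed to the policy function.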
For Figure 5, I use a road network as the background and use matplotlib.pyplot.plot to draw lines. plt.scatter is used to draw the black dots. I will open-source the script that draws those trajectories, probably within one week.
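Until that script is available, a rough sketch of the plotting approach could look like the following. This is not the script used for the figure: the road lines, trajectories, and dot positions are synthetic placeholders, draw_trajectories is a hypothetical helper, and it uses the Axes methods ax.plot and ax.scatter, which are the object-oriented equivalents of plt.plot and plt.scatter.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt
import numpy as np


def draw_trajectories(road_lines, trajectories, dots):
    """Draw road lines as background, trajectories as lines, dots in black."""
    fig, ax = plt.subplots(figsize=(6, 6))
    for line in road_lines:
        ax.plot(line[:, 0], line[:, 1], color="lightgray", linewidth=3)
    for traj in trajectories:
        ax.plot(traj[:, 0], traj[:, 1], linewidth=1)
    if len(dots) > 0:
        ax.scatter(dots[:, 0], dots[:, 1], color="black", s=20, zorder=3)
    ax.set_aspect("equal")
    ax.axis("off")
    return fig, ax


# Synthetic placeholder data: two crossing roads and two vehicle paths.
road_lines = [
    np.array([[-50.0, 0.0], [50.0, 0.0]]),
    np.array([[0.0, -50.0], [0.0, 50.0]]),
]
t = np.linspace(-50.0, 50.0, 100)
trajectories = [
    np.stack([t, 2.0 * np.sin(t / 10.0)], axis=1),
    np.stack([2.0 * np.sin(t / 10.0), t], axis=1),
]
dots = np.array([[0.0, 0.0]])

fig, ax = draw_trajectories(road_lines, trajectories, dots)
```

Calling fig.savefig with a file name of your choice would write the picture to disk.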
Please contact me via email so I can give you instant support. Again, I am very sorry for my late reply!!!
Hi @6Lackiu
I encountered your problem 1 when revisiting my code. A quick fix is to modify the layer_name_suffix when defining the policy function:
@pengzhenghao Thank you very much! Your answer helped me a lot!
Also, I tried the fix you suggested, but I still got the following error.
My checkpoint path is ckpt_path = "/home/lxy/CoPO/copo_code/copo/myTest/interCopo0624/CoPO_CCMultiAgentIntersectionEnv_7653e_00000_0_start_seed=5000,seed=0,use_centralized_critic=False,use_distributional_svo=True_2022-06-24_11-19-30/checkpoint_625/checkpoint-625"
I haven't modified the core part of CoPO yet, so there shouldn't be a dimension error. Why is this?
This assertion error shows that the input observation is not correct.
91 is the size of your obs and 92 is what the CoPO network expects.
This is because you did not append the LCF as the last dimension of the observation.
Try this:
For your question 4, as you might know, the LCF should be the last dimension of the observation. However, the environment in vis.py only returns the raw observation. Therefore, we need to manually append the LCF to the observation to create an obs that fits CoPO's policy network. You can hack it like new_obs = np.concatenate([old_obs, [0.5]]), which tells the CoPO agent that we are using LCF=0.
Note that the LCF should be scaled to [0, 1.0] instead of [-1, 1] in the observation. Please refer to the code of my CoPO agents.
@pengzhenghao Thank you for your clear answer!
I still have some confusion. Since checkpoint-xxx can be converted to a .npz file, what is the difference between vis.py and vis_from_checkpoint.py? They both work by reading .npz files.
In addition, running evaluate_population.py produces the .csv file. How do I judge from the .csv file which checkpoint performs best, so that I can store it in the best_checkpoints folder?
Or is there something wrong with my understanding?
Looking forward to your reply!
Hi @6Lackiu
Since I am preparing to release the evaluation results, I have refactored the evaluation script but have not merged it yet.
https://github.com/decisionforce/CoPO/pull/24
You can take a look at this PR, where eval.py is the script that I want every user to be able to use to evaluate their own population from the raw RLLib results without an npz file.
You can add me on WeChat so we can discuss more. I have been waiting for your email for days.
Just FYI, I finished benchmarking the results of various MARL algorithms in MetaDrive MARL environments. Please kindly refer to this page: https://github.com/metadriverse/metadrive-benchmark/tree/main/MARL
I have also uploaded the latest trained models so you can run them to visualize the behaviors. https://github.com/decisionforce/CoPO#visualization
Thanks!
Hello, I am very interested in the CoPO project! At the moment I have some problems, and I hope you can clear up my confusion. Thanks!
1. The following error is raised when running vis_from_checkpoint.py. My path points to checkpoint-480 as shown. What is the cause of the error? Am I running the script the wrong way?
2. I don't understand how the .npz files in the best_checkpoints folder are generated.
3. You declare the checkpoint read type as checkpoint-xxx in vis_from_checkpoint.py, but in get_policy_function.py the checkpoint name is declared like {ALGO} {ENV} _{INDEX}.npz. Which way should I follow? Do I need to convert checkpoint-xxx files to npz files? How do I convert them?
4. What does "Note that if you are restoring CoPO checkpoint, you need to implement appropriate wrapper to encode the LCF into the observation and feed them to the neural network." in vis_from_checkpoint.py mean?
5. How is the following visualization made? The vehicle trajectories and collision locations are displayed visually, which is great!
Very much looking forward to your reply! Thank you for taking the time to answer these questions!