decisionforce / CoPO

[NeurIPS 2021] Official implementation of paper "Learning to Simulate Self-driven Particles System with Coordinated Policy Optimization".
Apache License 2.0

Some Visualization Issues #22

Closed: 6Lackiu closed this issue 2 years ago

6Lackiu commented 2 years ago

Hello, I am very interested in the CoPO project! But at the moment I have some problems, and I hope you can clear up my confusion. Thanks!

  1. The following error is raised when running vis_from_checkpoint.py (see screenshot). My path points to checkpoint-480 as shown (see screenshot). What is the cause of the error? Am I running the script the wrong way?

  2. I don't understand how the .npz files in the best_checkpoints folder are generated (see screenshot).

  3. In vis_from_checkpoint.py you declare the checkpoint to be read as checkpoint-xxx (see screenshot), but in get_policy_function.py the checkpoint name is declared as {ALGO}_{ENV}_{INDEX}.npz (see screenshot). Which convention should I follow? Do I need to convert checkpoint-xxx files to .npz files? If so, how?

  4. What does "Note that if you are restoring CoPO checkpoint, you need to implement appropriate wrapper to encode the LCF into the observation and feed them to the neural network." in vis_from_checkpoint.py mean (see screenshot)?

  5. How is the following visualization made (see screenshot)? The vehicle trajectories and collision locations are displayed visually, which is great!

Very much looking forward to your reply! Thank you for taking the time to answer these questions!

6Lackiu commented 2 years ago

@pengzhenghao Still looking forward to your reply! Thanks!

pengzhenghao commented 2 years ago

I am very sorry for the late reply!!


For questions 1 and 2, we actually do this:

  1. convert RLLib's checkpoint-625 file to a .npz file;
  2. use the .npz file to get a policy_function;
  3. use the policy_function as a convenient interface to get actions from the neural network.

So for your question 1, it seems that the name of each layer is misaligned between the policy_function and your model. You can take a look at the definition of the policy function and modify it according to your model.

For your question 2, I compress RLLib's checkpoint into a .npz file via a custom function. A temporary solution is given at: https://github.com/metadriverse/metadrive/issues/118#issuecomment-992637939 Maybe I should turn this into a standalone script so that people can run it easily.
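For reference, a minimal sketch of what such a conversion could look like, assuming the layout of RLLib 1.x checkpoints (the function name and the "default_policy" ID are illustrative, and the exact nesting varies across RLLib versions):

```python
import pickle

import numpy as np


def convert_checkpoint_to_npz(ckpt_path, npz_path, policy_id="default_policy"):
    # RLLib 1.x checkpoints are a pickled dict whose "worker" entry is
    # itself a pickled blob holding the per-policy states.
    with open(ckpt_path, "rb") as f:
        data = pickle.load(f)
    worker_state = pickle.loads(data["worker"])
    state = worker_state["state"][policy_id]
    # Depending on the RLLib version, the policy state is either the raw
    # layer-name -> array dict or a dict with a "weights" entry.
    weights = state.get("weights", state)
    np.savez(npz_path, **weights)


convert_checkpoint_to_npz("checkpoint_625/checkpoint-625", "copo_to_vis.npz")
```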


For your question 3, well, I don't really remember the details and there might be a bug in vis_from_checkpoint.py. I can ensure that vis.py is working, because I ran it a few weeks ago. Could you please give vis.py a try?


For your question 4, as you might know, the LCF should serve as the last dimension of the observation. However, the environment in vis.py only returns the raw observation. Therefore, we need to manually insert the LCF, appending it to the observation to create an obs that fits CoPO's policy network. You can hack it like new_obs = np.concatenate([old_obs, [0.5]]) to tell the CoPO agent that we are using LCF=0.

A note here: the LCF should be scaled to [0, 1.0] instead of [-1, 1] in the observation. Please refer to the code of my CoPO agents.
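Putting the two notes above together, a minimal sketch of such a wrapper, assuming the LCF originally lives in [-1, 1] (the helper name is illustrative):

```python
import numpy as np


def append_lcf(raw_obs, lcf=0.0):
    # Rescale the LCF from [-1, 1] to [0, 1], as the policy network
    # expects; LCF=0 therefore maps to 0.5.
    lcf_01 = (lcf + 1.0) / 2.0
    # Append it as the last dimension of the raw observation.
    return np.concatenate([raw_obs, [lcf_01]]).astype(np.float32)
```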


For question 5, I render the road network as the background and use matplotlib.pyplot.plot to draw the trajectory lines; plt.scatter is used to draw the black dots. I will open-source the script for drawing those trajectories, probably within a week.
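In the meantime, a minimal sketch of that plotting idea (the function name and the data layout are assumptions, not the actual script):

```python
import matplotlib.pyplot as plt


def draw_trajectories(background, trajectories, collisions):
    # background: an HxWx3 image of the road network.
    # trajectories: a list of (N, 2) arrays of per-vehicle positions.
    # collisions: an (M, 2) array of collision locations.
    plt.imshow(background)
    for traj in trajectories:
        plt.plot(traj[:, 0], traj[:, 1], linewidth=1, alpha=0.7)
    plt.scatter(collisions[:, 0], collisions[:, 1], c="black", s=10)
    plt.axis("off")
    plt.show()
```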


Please contact me via email so I can give you instant support. Again, I am very sorry for my late reply!!!

pengzhenghao commented 2 years ago

Hi @6Lackiu

I encountered your problem 1 when I was revisiting my code. A quick fix is to modify the layer_name_suffix when defining the policy function:

(screenshot)
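Since the screenshot is not preserved here, one assumption-light way to find the right value is to inspect the layer names stored in your .npz file and set layer_name_suffix to match (the example key names in the comment are hypothetical):

```python
import numpy as np

# List the layer names the .npz file actually contains, e.g.
# ".../fc_1/kernel" vs. ".../fc_1_1/kernel", then choose the
# layer_name_suffix that matches those keys.
weights = np.load("your_checkpoint.npz")
print(list(weights.keys()))
```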
6Lackiu commented 2 years ago

> Hi @6Lackiu
>
> I encountered your problem 1 when I was revisiting my code. A quick fix is to modify the layer_name_suffix when defining the policy function:
>
> (screenshot)

@pengzhenghao Thank you very much! Your answer helped me a lot!

Also, I tried the fix you suggested.

(screenshot)

But I still got the following error (see screenshot).

My checkpoint path is ckpt_path = "/home/lxy/CoPO/copo_code/copo/myTest/interCopo0624/CoPO_CCMultiAgentIntersectionEnv_7653e_00000_0_start_seed=5000,seed=0,use_centralized_critic=False,use_distributional_svo=True_2022-06-24_11-19-30/checkpoint_625/checkpoint-625"

I haven't modified the core part of CoPO, so there shouldn't be a dimension error. Why is this happening?

pengzhenghao commented 2 years ago

This assertion error shows that the input observation is not correct.

91 is the shape of your obs, and 92 is what the CoPO network expects.

Obviously, this is because you haven't added the LCF as the last dimension of the observation.

Try this:

> For your question 4, as you might know, the LCF should serve as the last dimension of the observation. However, the environment in vis.py only returns the raw observation. Therefore, we need to manually insert the LCF, appending it to the observation to create an obs that fits CoPO's policy network. You can hack it like new_obs = np.concatenate([old_obs, [0.5]]) to tell the CoPO agent that we are using LCF=0.
>
> A note here: the LCF should be scaled to [0, 1.0] instead of [-1, 1] in the observation. Please refer to the code of my CoPO agents.
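Concretely, the numbers from the assertion line up like this (a tiny illustrative check, not code from the repo):

```python
import numpy as np

raw_obs = np.zeros(91, dtype=np.float32)  # what vis.py's env returns
obs = np.concatenate([raw_obs, [0.5]])    # append LCF=0, rescaled to 0.5
assert obs.shape[0] == 92                 # now matches the network's input
```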

6Lackiu commented 2 years ago

@pengzhenghao Thank you for your clear answer!

I still have some confusion: since checkpoint-xxx can be converted to a .npz file, what is the difference between vis.py and vis_from_checkpoint.py? Don't they both work by reading in .npz files?

In addition, running evaluate_population.py produces a .csv file. How do I judge from the .csv file which checkpoint performs best, so that it can be stored in the best_checkpoints folder?

Or is there something wrong with my understanding?

Looking forward to your reply!

pengzhenghao commented 2 years ago

Hi @6Lackiu

Since I am preparing to release the evaluation results, I have refactored the evaluation script, but it is not yet merged:

https://github.com/decisionforce/CoPO/pull/24

You can take a look at this PR; eval.py is the script that I want every user to be able to run to evaluate their own population from RLLib's raw results, without any .npz files.
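On your question about judging the best checkpoint from the .csv, a hedged sketch of the idea (the file name and the "checkpoint" / "success_rate" columns are assumptions; use whatever metric your .csv actually reports):

```python
import pandas as pd

df = pd.read_csv("evaluate_results.csv")    # output of evaluate_population.py
best = df.loc[df["success_rate"].idxmax()]  # assumed metric column
print("Best checkpoint:", best["checkpoint"])
```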

You can add me on WeChat so we can discuss more; I have been waiting for your email for days...

pengzhenghao commented 2 years ago

Just FYI, I finished benchmarking the results of various MARL algorithms in MetaDrive MARL environments. Please kindly refer to this page: https://github.com/metadriverse/metadrive-benchmark/tree/main/MARL

And I have also uploaded the latest trained models so you can run them to visualize the behaviors: https://github.com/decisionforce/CoPO#visualization

Thanks!