HanqingWangAI / SceneMover

Project of the SIGGRAPH Asia 2020 paper: Scene Mover: Automatic Move Planning for Scene Arrangement by Deep Reinforcement Learning
MIT License

Questions about experimental functions #1

Open baifanxxx opened 3 years ago

baifanxxx commented 3 years ago

Dear author,

Thank you very much for making such a great project; it is very helpful to my research. But there are so many scripts and functions that I don't know where to start. Could you help me get started quickly? I just want the most complete set of code for training, testing, and evaluation, perhaps with map_size = 64, obj_num = 25, action_space = 5. By the way, I have been training the code "train_comb_one_frame_add_action_all_reward_loss_details_failure_cases_reinforce_conflict_large_most_25_random_index_NN12_poly_2_channel_net17" for a very long time and it still has not reached total_episodes = 500000000. But in your paper you said "The training took about one day to finish on a machine". Am I running this code incorrectly?

Thank you, Best regards, BAI Fan

HanqingWangAI commented 3 years ago

@baifanxxx Thanks for your interest in our paper and sorry for the mess of the current code!

Actually, total_episodes here is just set to a very large value to keep the training running; it is not meant to be reached.

I will try to put together a clear entry point for the training and evaluation code in the next few days and will let you know once it is finished. :)
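
In other words, the loop is conceptually just the sketch below (illustrative only, not the actual training code in this repo; run_one_episode is a placeholder):

```python
# Illustrative sketch only; not the actual training loop in this repo.
# total_episodes is a deliberately huge cap: you watch the reward curve and
# stop the job manually once it plateaus, relying on periodic checkpoints.

def run_one_episode(episode):
    """Placeholder for one rollout plus network update; returns the episode reward."""
    return 0.0

total_episodes = 500000000  # effectively "run forever"

for episode in range(total_episodes):
    reward = run_one_episode(episode)
    if episode % 10000 == 0:
        print("episode %d, reward %.3f" % (episode, reward))  # checkpoint saving would go here
```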

baifanxxx commented 3 years ago

@HanqingWangAI Thank you for your reply. I will keep looking forward to your improvements. For now, I am still studying your code and have made some progress. I have successfully trained and tested the network with the same settings as your paper: map_size = 64, obj_num = 25, action_space = 5. If you need my help to edit the code or write the README, you can tell me at any time.

Thank you, Best regards, BAI Fan

HanqingWangAI commented 3 years ago

@baifanxxx That would be great! Feel free to ask any questions and commit your code. I will keep working on it as well.

baifanxxx commented 3 years ago

@HanqingWangAI Dear author, I guess that 'train_comb_one_frame_add_action_all_reward_loss_details_failure_cases_reinforce_conflict_large_most_25_random_index_NN12_poly_2_channel_net17()' is the code used in your paper, because it has 25 objects, a 64*64 map size, and an LSTM. And I can use 'test_mover_64_net(env, net, sess)' to create an MCTS and train it. However, what if I need code to test my trained MCTS? I mean, we should use the trained MCTS and evaluate it in other situations, but I do not see such code. Could you tell me which code I should use? Thank you very much.

Best regards, BAI Fan

HanqingWangAI commented 3 years ago

@baifanxxx Hi Fan, Please see here for a quick trial.

baifanxxx commented 3 years ago

@HanqingWangAI Dear author, thank you for your new code. It is very helpful to me. Now I am using PPO to train the policy, but it does not work. I have finished the code; do you have time to take a look at it?

HanqingWangAI commented 3 years ago

@baifanxxx Sure, could you please paste the loss/reward curves so that we can analyze the problem? :)

baifanxxx commented 3 years ago

@HanqingWangAI Dear author, thank you very much! I create a new repo, you can find code and curves from https://github.com/baifanxxx/SceneMover-New If you have time and interested in it, you can see my PPO code and help me. Thank you again for your help! Best regards, BAI Fan

baifanxxx commented 3 years ago

@HanqingWangAI Dear author, I'm sorry to disturb you again. What do you think about my PPO code? By the way, I noticed that there is an AC algorithm in your code, but your AC method using the policy gradient shows NaN values during training. Does this mean that policy-gradient methods (your AC and my PPO) are not suitable for this environment? I look forward to discussing this with you! Thank you very much. Best regards, BAI Fan

HanqingWangAI commented 3 years ago

Hi @baifanxxx, sorry for the late reply; it has been a busy few days. I saw your reward curve. I think it is negative because your model has not converged yet. I have pasted the A2C curve here for your reference. The AC code in this repo has some problems; I will fix it later.
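
One thing worth double-checking in the meantime: NaNs in policy-gradient losses very often come from taking the log of a probability that has collapsed to 0, or from exploding importance ratios in PPO. A minimal NumPy sketch of the usual safeguards (illustrative only, not code from this repo; the function names are just placeholders):

```python
import numpy as np

eps = 1e-8  # keeps probabilities away from exact 0 before the log

def safe_log_prob(action_probs, action):
    """Log-probability of the chosen action, clipped so log(0) cannot occur."""
    return np.log(np.clip(action_probs[action], eps, 1.0))

def ppo_clip_objective(new_logp, old_logp, advantage, clip_eps=0.2):
    """Clipped-surrogate term for one transition; the bounded ratio cannot blow up."""
    ratio = np.exp(new_logp - old_logp)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return np.minimum(ratio * advantage, clipped * advantage)
```

Gradient clipping on the combined update is another common safeguard.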

baifanxxx commented 3 years ago

@HanqingWangAI Thank you for your reply. Your A2C curve looks good. In fact, I have trained my PPO for a long time, but there have been no good results, and the reward has never been positive. That is why I asked you for advice; I suspect there is a problem with my code. The unconverged model I gave you came from only a short training run. In fact, long training runs do not give good results either, but I forgot to save the curve at the time, so I gave you the short-run model. If you are free, I hope you can help me find out what is wrong with my code. Thank you very much. Best regards, BAI Fan

feimeng93 commented 3 years ago

> @baifanxxx Hi Fan, Please see here for a quick trial.

Truly thankful for your effort. We are very interested in your work and implemented it with the posted AC algorithm by completing the main function, but we failed to reproduce your impressive results. We have just updated our learning curves for imitate_loss and the AC-based training loss (loss += train_rl * (critic_loss + actor_loss)). Unfortunately, the NaN in the total loss appears again even though imitate_loss has converged. Could you please upload your main function or kindly point out what we have overlooked? Again, thank you for your warmhearted contribution to us and the whole community.
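
For concreteness, the per-transition loss we are trying to train looks roughly like this (a simplified NumPy sketch, not our actual TensorFlow code or this repo's main function; the log(0) guard shown here is the kind of safeguard we suspect we might be missing):

```python
import numpy as np

eps = 1e-8  # guard against log(0), one suspected source of the NaN

def combined_loss(action_probs, action, advantage,
                  value_pred, value_target, imitate_loss, train_rl=1.0):
    """Per-transition version of: loss = imitate_loss + train_rl * (critic_loss + actor_loss)."""
    logp = np.log(np.clip(action_probs[action], eps, 1.0))   # clipped log-probability
    actor_loss = -logp * advantage                            # policy-gradient term
    critic_loss = (value_pred - value_target) ** 2            # squared value error
    return imitate_loss + train_rl * (critic_loss + actor_loss)
```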

Best regards, Fei