Implementation project issues

ai-for-decision-making-tue / DRL-TOBM-CPR

Code for the paper "Deep Reinforcement Learning for Two-sided Online Bipartite Matching in Collaborative Order Picking".


Implementation project issues #1

Open SuperMacholo opened 2 months ago

SuperMacholo commented 2 months ago

Hi, thank you very much for providing the code. When I executed ppo_training_mh, I encountered the error below. The package versions I use are the same as yours, except for pytorch-geometric (2.5.3). How should I solve it?

SuperMacholo commented 2 months ago

Hi, I fixed this error by downgrading pytorch-geometric to version 2.1.0, but then the following error occurred (no message_passing.jinja). How should I solve it?

lbegnardi commented 2 months ago

Hi, it looks like you are missing the message_passing.jinja file, which should be there, so you might try removing pytorch-geometric and reinstalling it after cleaning the environment. However, I think the file is only needed because the GinConv layers are made jittable, which is not actually necessary, so you can try removing ".jittable()" from lines 35, 136, and 258 of gnn_models.py.

In the meantime, I realized that there is another issue, related to a change I made in the tianshou code but forgot to upload. I will most likely push a fix for that tomorrow.

SuperMacholo commented 2 months ago

Hi, thank you for your reply and help. After I upgraded pytorch-geometric to 2.2.0, the relevant files are now found, but the previous problem still occurs. I'm wondering whether the "MultiAgentCollector" part needs to be adjusted.

In addition, I would like to ask how to enable the "invff" mode in this project. I have seen it mentioned but cannot find where it is set.

lbegnardi commented 2 months ago

The problem with Tianshou should be fixed with the latest commit.

For the invff architecture, you can simply set the "n_graph_layers" parameter to 0 when initializing the neural networks.
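As a rough illustration of why a zero layer count removes the graph part of a network, here is a minimal NumPy sketch. It assumes the graph layers are applied in a loop over "n_graph_layers"; the class TinyGraphNet, its other arguments, and the aggregation rule are hypothetical stand-ins and are not taken from gnn_models.py.

```python
import numpy as np


class TinyGraphNet:
    """Hypothetical toy model, NOT the repository's network."""

    def __init__(self, in_dim, hidden_dim, n_graph_layers, seed=0):
        rng = np.random.default_rng(seed)
        self.w_in = 0.5 * rng.normal(size=(in_dim, hidden_dim))
        # One weight matrix per graph layer; an empty list if n_graph_layers == 0.
        self.w_graph = [
            0.5 * rng.normal(size=(hidden_dim, hidden_dim))
            for _ in range(n_graph_layers)
        ]
        self.w_out = 0.5 * rng.normal(size=(hidden_dim, 1))

    def forward(self, x, adj):
        # Node-wise feed-forward embedding (no neighbour information).
        h = np.tanh(x @ self.w_in)
        # Each graph layer adds the summed neighbour features before transforming.
        for w in self.w_graph:
            h = np.tanh((h + adj @ h) @ w)
        # One score per node.
        return (h @ self.w_out).ravel()


x = np.arange(12, dtype=float).reshape(4, 3) / 10.0
no_edges = np.zeros((4, 4))
all_edges = np.ones((4, 4)) - np.eye(4)

ff_only = TinyGraphNet(3, 8, n_graph_layers=0)
# With zero graph layers the adjacency matrix is never used.
print(np.allclose(ff_only.forward(x, no_edges), ff_only.forward(x, all_edges)))
```

With n_graph_layers=0 the loop body never runs, so the output depends only on each node's own features; any positive value makes the output depend on the edges as well.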

SuperMacholo commented 2 months ago

Hi, thank you for your reply and help. After I re-downloaded the updated project, the same error still occurred. I'm currently using Python 3.9 and the same package versions as in the requirements (except torch-geometric==2.2.0), and I am executing the ppo_training_mh file directly. The complete error message is shown below.

I have tried changing the batch (62-->63) code in the original tianshou package

and this code in ppo_training_mh

After these two adjustments the program runs without errors, but I am not sure whether it still behaves as the original project intends. How should I solve the problem?

SuperMacholo commented 2 months ago

Hi, thank you for your reply and help. I just changed the code in adapted_tianshou.batch in the new version of the project to the content below, and it now runs correctly. Could such a modification have an incorrect impact on the results?

lbegnardi commented 2 months ago

Hi, are you trying to execute "ppo_training_mh" directly as it is on the repo? I am asking because I cannot replicate your error. Also, in the original adapted_tianshou.batch the line you commented in the last message is line 61. Did you make any other changes?

As long as you are using the adapted versions of the collectors, I don't think the change you propose in the last comment should impact performance, but I cannot guarantee that, since I don't know why the change has to be made.

SuperMacholo commented 2 months ago

Hi, thank you for your reply and help. I previously executed "ppo_training_mh" directly. I only changed obj_array = np.asanyarray(obj) in adapted_tianshou.batch to obj_array = np.asanyarray(obj, dtype="object") and used the adapted versions of the collectors. It all seems to work correctly so far. The need for this modification may be due to a difference in NumPy versions.
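For anyone hitting the same error, here is a minimal sketch of how this one-line change behaves. It assumes the failure comes from passing a ragged (unequal-length) nested sequence to np.asanyarray, which recent NumPy releases reject unless dtype=object is given; the helper names below are hypothetical and do not come from adapted_tianshou.

```python
import warnings

import numpy as np


def to_object_array(obj):
    # Explicit dtype=object: each top-level item is stored as a Python object.
    return np.asanyarray(obj, dtype=object)


def describe_implicit_conversion(obj):
    """Report how np.asanyarray(obj) behaves on the installed NumPy."""
    with warnings.catch_warnings():
        # Older releases only warned about ragged input; treat that as an error too.
        warnings.simplefilter("error")
        try:
            arr = np.asanyarray(obj)
        except (ValueError, Warning) as exc:
            return f"rejected: {type(exc).__name__}"
    return f"accepted: dtype={arr.dtype}"


ragged = [[1, 2, 3], [4, 5]]  # rows of different length

print(np.__version__, describe_implicit_conversion(ragged))
arr = to_object_array(ragged)
print(arr.dtype, arr.shape)
```

With the explicit dtype the result is a one-dimensional object array holding the original rows, which is what older NumPy versions produced implicitly for ragged input.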

In addition, I am now trying to use GPU acceleration, but there seems to be no significant improvement after the modification. What would be the best setting?

lbegnardi commented 2 months ago

You are right, the problem might indeed be due to a different version of NumPy, as I realized I didn't list it in the requirements. In my setup I was using version 1.22.3. What is yours?

At some point I tried using a GPU as well, without noticing any significant improvement in training time. I suppose this is because the time lost transferring torch tensors between CPU and GPU outweighs the gain from processing the data faster, since the training graphs are still quite small. All the results you see in the paper were therefore obtained using CPU only.

SuperMacholo commented 2 months ago

Hi, thank you for your reply and help. The version I used before was 1.26.4. Because version 1.22.3 was incompatible with other packages, I switched to 1.22.4, and the following message appeared. It does seem to be caused by the different package versions, so changing the line to obj_array = np.asanyarray(obj, dtype="object") should have no impact on the results.

Thank you for your thoughts on using GPU acceleration~ I will try again to see if there is any way to increase the training speed.