IDSIA / hhmarl_2D

Heterogeneous Hierarchical Multi Agent Reinforcement Learning for Air Combat

FileNotFoundError when first running train_hetero.py #7

Closed. arounderor closed this issue 5 months ago

arounderor commented 5 months ago

When I first ran train_hetero.py as the README describes, a FileNotFoundError occurred, so I debugged the code. It seems that there is no "events" file in the Ray log dirs, so the logic in the update_logs() function inside train_hetero.py raised the error. I want to figure out what is wrong with my setup, and I'm looking forward to an appropriate solution. Any suggestions will be appreciated! [3 screenshots attached]

ardian-selmonaj commented 5 months ago

Hi, I see you are using Windows. The process inside "update_logs()" may only work on Unix systems, not on Windows. Furthermore, Ray 2.4.0 on Windows is in beta, as far as I know, so you might even get unstable training performance. For "update_logs()", you can rewrite the code to copy the "checkpoint" folder from the standard directory to the results directory on Windows. I would not change deeper parts such as "shutil.py". That said, calling "update_logs()" is not crucial for training, because you can also manually copy and paste "checkpoint" into "results" after training has finished. "checkpoint" is only needed to restore training more easily for the next level.
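A minimal sketch of such a platform-independent copy step is given here for illustration. It is not the repository's actual "update_logs()" implementation. The function name `copy_latest_checkpoint`, the assumption that checkpoint folders are named `checkpoint*`, and the choice to pick the most recently modified one are all hypothetical, so adjust them to whatever your Ray log directory actually contains.

```python
import shutil
from pathlib import Path


def copy_latest_checkpoint(ray_dir, results_dir):
    """Copy the newest checkpoint folder found under ray_dir into results_dir.

    Hypothetical helper: returns the destination path, or None if no
    checkpoint folder was found.
    """
    ray_dir, results_dir = Path(ray_dir), Path(results_dir)
    # Assumption: checkpoint folders are named "checkpoint*" somewhere below ray_dir.
    candidates = [p for p in ray_dir.rglob("checkpoint*") if p.is_dir()]
    if not candidates:
        return None
    # Pick the most recently modified checkpoint folder.
    latest = max(candidates, key=lambda p: p.stat().st_mtime)
    dest = results_dir / "checkpoint"
    if dest.exists():
        shutil.rmtree(dest)  # replace a stale copy from an earlier run
    results_dir.mkdir(parents=True, exist_ok=True)
    shutil.copytree(latest, dest)  # pure-Python copy, no Unix shell commands
    return dest
```

It could be called after training with something like `copy_latest_checkpoint(Path.home() / "ray_results", "results")`, where both paths are assumptions about your local setup rather than values taken from the repository.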

Alternatively, you can install Linux as a subsystem on Windows (WSL), which is not a big deal.

I hope this helps!

arounderor commented 5 months ago

Thank you so much for your answer, it helps me a lot. Wish you good luck!