Closed Yifei-Bi closed 2 years ago
From your terminal output, it seems that mpi4py is having problems with the multi-threaded part. Have you tried running:
python train.py --env-id Safexp-PointGoal1-v0 --algo ppo-lag --cores 1 --seed 0
Will this command work?
From your terminal output, it seems that mpi4py is having problems with the multi-threaded part. Have you tried running:
python train.py --env-id Safexp-PointGoal1-v0 --algo ppo-lag --cores 2 --seed 0
Will this command work?
I ran the code above, and then the error shown above appeared.
I'm very sorry for the bad experience. Please allow me to answer your question in more detail.
Firstly, the command above was meant to have you set cores to 1, to check whether training runs at all with cores=1:
python train.py --env-id Safexp-PointGoal1-v0 --algo ppo-lag --cores 1 --seed 0
Secondly, can you run the following code to print the number of physical CPU cores on your local machine?
import psutil
# logical=False counts physical cores only, excluding hyper-threads
physical_cores = psutil.cpu_count(logical=False)
print(physical_cores)
According to the parallelism mechanism of the mpi4py module, the number of physical CPU cores should be larger than the value of the cores parameter.
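As a rough sketch of that check, the comparison can be scripted before launching training. It takes the rule above literally (physical cores strictly larger than the cores value). The helpers `detect_physical_cores` and `cores_fit` are hypothetical names for illustration and are not part of SafePO or mpi4py; psutil is assumed to be installed, and the sketch treats an unknown core count as a failed check.

```python
try:
    import psutil  # third-party package, assumed installed
except ImportError:
    psutil = None


def detect_physical_cores():
    """Return the physical core count, or None if it cannot be determined."""
    if psutil is None:
        return None
    # logical=False excludes hyper-threads; psutil may return None here
    return psutil.cpu_count(logical=False)


def cores_fit(requested, physical):
    """Apply the rule stated above: physical cores must exceed the cores value."""
    if physical is None:
        return False  # unknown hardware is treated as not satisfying the rule
    return physical > requested


if __name__ == "__main__":
    physical = detect_physical_cores()
    for requested in (1, 2):
        ok = cores_fit(requested, physical)
        print(f"--cores {requested}: physical={physical}, fits={ok}")
```

If the check fails for the value you pass to --cores, try a smaller value first.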
Finally, I just reinstalled the SafePO environment on a pristine Ubuntu 20.04 machine and tested it with no problems. Can you provide more details about your machine's configuration and your software environment? This will help us locate your problem faster. From the terminal output above alone, we are not able to tell where the problem is.
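One possible way to gather those details is a small script like the one below. This is only a sketch: `package_version`, `mpirun_version`, and `collect_env_report` are hypothetical helper names, the package list is a guess at what is relevant here, and it assumes that the installed `mpirun` accepts `--version`.

```python
import importlib
import platform
import shutil
import subprocess
import sys


def package_version(name):
    """Return a package's __version__, or 'not installed' if import fails."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return "not installed"
    return getattr(module, "__version__", "unknown")


def mpirun_version():
    """Return the first line of `mpirun --version`, or a short status string."""
    path = shutil.which("mpirun")
    if path is None:
        return "mpirun not found on PATH"
    try:
        result = subprocess.run(
            [path, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"could not run mpirun: {exc}"
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else "no output"


def collect_env_report():
    """Collect OS, Python, MPI, and package versions into a dict."""
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "mpirun": mpirun_version(),
        "mpi4py": package_version("mpi4py"),
        "psutil": package_version("psutil"),
    }


if __name__ == "__main__":
    for key, value in collect_env_report().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the issue, together with the full terminal output, should make the problem easier to reproduce.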
Hope this can help you.
Traceback (most recent call last):
File "train.py", line 54, in <module>
if mpi_tools.mpi_fork(args.cores,use_number_of_threads=use_number_of_threads):
File "/content/drive/MyDrive/save/Safe-Policy-Optimization/safepo/common/mpi_tools.py", line 97, in mpi_fork
subprocess.check_call(args, env=env)
File "/usr/lib/python3.7/subprocess.py", line 363, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['mpirun', '-np', '2', '--use-hwthread-cpus', '/usr/bin/python3', 'train.py', '--env-id', 'Safexp-PointGoal1-v0', '--algo', 'ppo-lag', '--cores', '2', '--seed', '0']' returned non-zero exit status 1.
[5e506354ec7b:06923] Process received signal
Why am I getting this error?
Please help me, thank you.