Closed zhaoworking closed 4 years ago
[ERROR] Preferred device cuda:best unavailable, switching to default cpu
This should be changed to a warning, it is not the cause of the segfault.
The error seems to be related to the ALSA lib for sound devices, which is called by pygame at initialization.
Are you trying to run this code on a server with no display? If so, try running experiments.py with the --no-display
option, which will prevent calls to pygame.
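For illustration, the change from an error to a warning could look roughly like the sketch below. The function name choose_device, its parameters, and the logger name are hypothetical and are not taken from the rl-agents code; only the message text mirrors the log above.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("device_selection")


def choose_device(preferred: str, cuda_available: bool, default: str = "cpu") -> str:
    """Return the device to use, falling back to `default` when CUDA is missing.

    The fallback is reported at WARNING level rather than ERROR, since
    training can continue on the CPU.
    """
    if preferred.startswith("cuda") and not cuda_available:
        logger.warning(
            "Preferred device %s unavailable, switching to default %s",
            preferred,
            default,
        )
        return default
    return preferred


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # On a machine without a GPU this logs a warning and selects the CPU.
    print(choose_device("cuda:best", cuda_available=False))
```

In a real setup the cuda_available flag would presumably come from torch.cuda.is_available(); it is passed in here so that the sketch runs without PyTorch installed.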
pygame 1.9.6
Hello from the pygame community. https://www.pygame.org/contribute.html
INFO: Making new env: highway-v0
/root/anaconda3/lib/python3.6/site-packages/numpy/core/numeric.py:301: FutureWarning: in the future, full((5, 5), -1) will return an array of dtype('int64')
format(shape, fill_value, array(fill_value).dtype), FutureWarning)
/root/anaconda3/lib/python3.6/site-packages/numpy/core/numeric.py:301: FutureWarning: in the future, full((5, 5), 1) will return an array of dtype('int64')
format(shape, fill_value, array(fill_value).dtype), FutureWarning)
/root/gym/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
[ERROR] Preferred device cuda:best unavailable, switching to default cpu
INFO: Creating monitor directory out/HighwayEnv/DQNAgent/run_20200406-174819_9262
profiler execution failed
Segmentation fault
I tested with the command python3 experiments.py evaluate configs/HighwayEnv/env_easy.json configs/HighwayEnv/agents/DQNAgent/1_step.json --train --episodes=1000 --no-display
as you suggested, and I get the same segmentation fault as above. I also see a similar INFO message: INFO: Creating monitor directory out/HighwayEnv/DQNAgent/run_20200406-174819_9262
Is the command that I used wrong, or is it something else?
The command that you used is right, so I guess the problem is not related to rendering.
I could not reproduce the issue on my computer:
python3 experiments.py evaluate configs/HighwayEnv/env_easy.json configs/HighwayEnv/agents/DQNAgent/1_step.json --train --episodes=1000 --no-display
pygame 1.9.4
Hello from the pygame community. https://www.pygame.org/contribute.html
INFO: Making new env: highway-v0
[INFO] Choosing GPU device: 0, memory used: 1563
INFO: Creating monitor directory out\HighwayEnv\DQNAgent\run_20200406-125052_9276
C:\Anaconda3\lib\site-packages\torch\onnx\utils.py:501: UserWarning: ONNX export failed on ATen operator reshape because torch.onnx.symbolic.reshape does not exist
.format(op_name, op_name))
[INFO] Episode 0 score: 3.5
[INFO] Episode 1 score: 11.9
[INFO] Episode 2 score: 2.6
[INFO] Episode 3 score: 5.8
[INFO] Episode 4 score: 3.3
[INFO] Episode 5 score: 15.5
[INFO] Episode 6 score: 15.1
[INFO] Episode 7 score: 3.2
[INFO] Episode 8 score: 17.2
[INFO] Episode 9 score: 4.4
[INFO] Episode 10 score: 4.7
[INFO] Episode 11 score: 7.5
[INFO] Episode 12 score: 4.9
[INFO] Episode 13 score: 6.5
[INFO] Episode 14 score: 14.5
Do you have this issue only with the 1_step.json configuration?
Unfortunately, I find that all of the agents run into this issue. Is it related to my computer? I also can't see the [INFO] Episode x score: x
lines in my Xshell.
It is probably related to your computer, since the automatic tests are passing.
But I really wonder what could cause such a segmentation fault...
Could you maybe try using an IDE like PyCharm and stepping through the code with a debugger, to see exactly where it crashes?
Do you also have a segmentation fault with the cartpole environment, for example, or is it only with highway-env?
Also, could you try adding these lines at the top of experiments.py (after the other imports)?
import os
os.environ['SDL_AUDIODRIVER'] = 'dsp'
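A variant of the same idea, as a minimal sketch: SDL also provides "dummy" audio and video drivers, which need no sound card or display. The snippet above uses 'dsp'; swapping in 'dummy' is an assumption about what suits a headless server, and the helper name configure_headless_sdl is hypothetical.

```python
import os


def configure_headless_sdl(env=os.environ):
    """Point SDL at its dummy drivers unless the user already chose one.

    This has to run before pygame is initialised, because SDL reads these
    variables when the audio and video subsystems start.
    """
    env.setdefault("SDL_AUDIODRIVER", "dummy")  # no sound card required
    env.setdefault("SDL_VIDEODRIVER", "dummy")  # no display required
    return {key: env[key] for key in ("SDL_AUDIODRIVER", "SDL_VIDEODRIVER")}


if __name__ == "__main__":
    # Call this first, then import and initialise pygame as usual.
    print(configure_headless_sdl())
```

Using setdefault keeps any value that was already exported in the shell, so the helper does not override an explicit choice such as SDL_AUDIODRIVER=dsp.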
[ERROR] Preferred device cuda:best unavailable, switching to default cpu
INFO: Creating monitor directory out/HighwayEnv/DQNAgent/run_20200409-203051_12866
profiler execution failed
INFO: Starting new video recorder writing to /root/rl-agents/scripts/out/HighwayEnv/DQNAgent/run_20200409-203051_12866/openaigym.video.0.12866.video000000.mp4
Segmentation fault
I added the code to experiments.py,
only to find the same situation.
The day before yesterday, I found that it worked well in my VM, which runs a desktop version, so I could see the video
and the [INFO] lines
without any error. So I guess the issue is most likely related to my remote Linux server, even though I don't know what it is.
As I mentioned, the ALSA lib responsible for the segfault is used by pygame for audio management, and it crashes when it cannot find audio drivers (on your Linux server), probably when pygame is initialised through pygame.init().
However, pygame is only used for rendering and should not be initialised when the --no-display
option is used...
(gymlab) root@iZ8vbhynnqk42im5ymgijyZ:~/rl-agents/scripts# python experiments.py evaluate configs/HighwayEnv/env_easy.json configs/HighwayEnv/agents/DQNAgent/baseline.json --train --episodes=1000 --no-display
pygame 1.9.6
Hello from the pygame community. https://www.pygame.org/contribute.html
INFO: Making new env: highway-v0
/root/anaconda3/lib/python3.6/site-packages/numpy/core/numeric.py:301: FutureWarning: in the future, full((5, 5), -1) will return an array of dtype('int64')
format(shape, fill_value, array(fill_value).dtype), FutureWarning)
/root/anaconda3/lib/python3.6/site-packages/numpy/core/numeric.py:301: FutureWarning: in the future, full((5, 5), 1) will return an array of dtype('int64')
format(shape, fill_value, array(fill_value).dtype), FutureWarning)
/root/gym/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
[ERROR] Preferred device cuda:best unavailable, switching to default cpu
INFO: Creating monitor directory out/HighwayEnv/DQNAgent/run_20200409-212448_12995
profiler execution failed
Segmentation fault
Thanks for your patient answers. But when I added the --no-display option, the same issue still occurred.
Yes I know, which is why I am a bit clueless about what is going on here.
Actually, the message
INFO: Starting new video recorder writing to /root/rl-agents/scripts/out/HighwayEnv/DQNAgent/run_20200405-230154_7882/openaigym.video.0.7882.video000000.mp4
shows that the gym monitor tried to record the video, which should not happen with the no-display option, and causes the segfault.
Can you add a print statement here to check that video_callable
is set to False?
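One way to make such a check harder to miss is to write it to stderr and flush immediately, as in the sketch below. If stdout is piped or redirected, its buffered contents can be lost when the process dies from a segfault. The helper name debug_print is hypothetical, and where to call it in the rl-agents code is left to the link above.

```python
import sys


def debug_print(name, value, stream=sys.stderr):
    """Print `name: value` and flush at once so a later crash cannot drop it."""
    line = f"{name}: {value!r}"
    print(line, file=stream, flush=True)
    return line


if __name__ == "__main__":
    # Illustrative value only; in practice pass the real variable.
    video_callable = False
    debug_print("video_callable", video_callable)
```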
pygame 1.9.6
Hello from the pygame community. https://www.pygame.org/contribute.html
INFO: Making new env: highway-v0
/root/anaconda3/lib/python3.6/site-packages/numpy/core/numeric.py:301: FutureWarning: in the future, full((5, 5), -1) will return an array of dtype('int64')
format(shape, fill_value, array(fill_value).dtype), FutureWarning)
/root/anaconda3/lib/python3.6/site-packages/numpy/core/numeric.py:301: FutureWarning: in the future, full((5, 5), 1) will return an array of dtype('int64')
format(shape, fill_value, array(fill_value).dtype), FutureWarning)
/root/gym/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
[ERROR] Preferred device cuda:best unavailable, switching to default cpu
INFO: Creating monitor directory out/HighwayEnv/DQNAgent/run_20200409-215218_13062
profiler execution failed
Segmentation fault
I added the print statement print('video_callable:',video_callable)
at the top of that line, but neither True nor False is printed on my computer, and I get the same fault.
This is strange, I don't know what to think of this. This print statement should happen before the "INFO: Creating monitor directory out/HighwayEnv/DQNAgent/run_20200409-215218_13062" message. I think the best way to solve this is to use a debugger and breakpoints to track which line exactly causes the segfault.
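If an interactive debugger is awkward over SSH, Python's standard faulthandler module is a lighter alternative: once enabled, it dumps the Python traceback of every thread when the process receives a fatal signal such as SIGSEGV. The sketch below is a suggestion only; the wrapper name enable_crash_traceback is hypothetical.

```python
import faulthandler
import sys


def enable_crash_traceback(stream=sys.stderr):
    """Dump Python tracebacks to `stream` if the interpreter segfaults.

    `stream` must be a real file (it needs a file descriptor) and has to
    stay open for as long as the handler is enabled.
    """
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


if __name__ == "__main__":
    # Place this near the top of the entry script, before the heavy imports.
    enable_crash_traceback()
```

The same effect is available without editing any file by starting the interpreter with python3 -X faulthandler followed by the usual experiments.py arguments. The traceback shows the last Python line that was executing, which should indicate which call triggers the crash.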
(gymlab) root@iZ8vbhynnqk42im5ymgijyZ:~/rl-agents/scripts# python3 experiments.py evaluate configs/HighwayEnv/env_medium.json configs/HighwayEnv/agents/DQNAgent/1_step.json --train --episodes=1000
pygame 1.9.6
Hello from the pygame community. https://www.pygame.org/contribute.html
INFO: Making new env: highway-v0
/root/anaconda3/lib/python3.6/site-packages/numpy/core/numeric.py:301: FutureWarning: in the future, full((5, 5), -1) will return an array of dtype('int64')
format(shape, fill_value, array(fill_value).dtype), FutureWarning)
/root/anaconda3/lib/python3.6/site-packages/numpy/core/numeric.py:301: FutureWarning: in the future, full((5, 5), 1) will return an array of dtype('int64')
format(shape, fill_value, array(fill_value).dtype), FutureWarning)
/root/gym/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
[ERROR] Preferred device cuda:best unavailable, switching to default cpu
INFO: Creating monitor directory out/HighwayEnv/DQNAgent/run_20200405-230154_7882
profiler execution failed
ALSA lib confmisc.c:768:(parse_card) cannot find card '0'
ALSA lib conf.c:4292:(_snd_config_evaluate) function snd_func_card_driver returned error: No such file or directory
ALSA lib confmisc.c:392:(snd_func_concat) error evaluating strings
ALSA lib conf.c:4292:(_snd_config_evaluate) function snd_func_concat returned error: No such file or directory
ALSA lib confmisc.c:1251:(snd_func_refer) error evaluating name
ALSA lib conf.c:4292:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:4771:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM default
INFO: Starting new video recorder writing to /root/rl-agents/scripts/out/HighwayEnv/DQNAgent/run_20200405-230154_7882/openaigym.video.0.7882.video000000.mp4
Segmentation fault
I got this fault when testing env_medium with DQN. Note that I was running it over SSH, and it ran on the CPU instead of CUDA. Can you help me?