Closed Shmuma closed 5 years ago
The same happens with:
PyTorch : 1.1.0
CUDA : 10.0.130
CUDNN : 7501
APEX : 0.1.0
GeForce GTX 1080 Ti : 1632.500 Mhz (Ordinal 0)
28 SMs enabled. Compute Capability sm_61
FreeMem: 10,687MB TotalMem: 11,178MB 64-bit pointers.
Mem Clock: 5505.000 Mhz x 352 bits (484.4 GB/s)
ECC Disabled
Looking into it. For the a2c case, can you try running it with --use-cuda-env --use-openai-test-env? These flags do two things: 1) --use-cuda-env runs the CuLE environments on the GPU (otherwise they run on the CPU); 2) --use-openai-test-env uses OpenAI (instead of CuLE CPU) for testing.
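For readers unfamiliar with how such switches end up in the printed settings, here is a minimal sketch. It assumes both flags are plain boolean argparse options; `build_parser` is a hypothetical helper written for illustration, not the actual CuLE argument parser, which may define them differently.

```python
import argparse


def build_parser():
    # Hypothetical parser covering only the two flags discussed above.
    parser = argparse.ArgumentParser(description="illustrative sketch, not the CuLE parser")
    parser.add_argument(
        "--use-cuda-env",
        action="store_true",
        help="run the CuLE environments on the GPU (default: CPU)",
    )
    parser.add_argument(
        "--use-openai-test-env",
        action="store_true",
        help="use OpenAI instead of CuLE CPU for testing",
    )
    return parser


# argparse turns the dashes into underscores, giving the key names
# that appear in a dumped settings dictionary.
args = build_parser().parse_args(["--use-cuda-env", "--use-openai-test-env"])
print(vars(args))
```

With both flags given, the two entries are True; with neither, both default to False.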
I got a similar problem:
$ python --use-cuda-env --use-openai-test-env
{'ale_start_steps': 400,
'alpha': 0.99,
'batch_size': 256,
'clip_epsilon': 0.1,
'conf_file': None,
'entropy_coef': 0.01,
'env_name': 'PongNoFrameskip-v4',
'episodic_life': False,
'eps': 1e-05,
'evaluation_episodes': 10,
'evaluation_interval': 1000000,
'gamma': 0.99,
'gpu': 0,
'local_rank': 0,
'log_dir': 'runs',
'loss_scale': None,
'lr': 0.00065,
'lr_scale': False,
'max_episode_length': 18000,
'max_grad_norm': 0.5,
'multiprocessing_distributed': False,
'no_cuda_train': True,
'normalize': False,
'num_ales': 16,
'num_gpus_per_node': -1,
'num_stack': 4,
'num_steps': 5,
'opt_level': 'O0',
'output_filename': None,
'plot': False,
'ppo_epoch': 3,
'profile': False,
'save_interval': 0,
'seed': 1565658549,
't_max': 50000000,
'tau': 1.0,
'use_adam': False,
'use_cuda_env': True,
'use_gae': False,
'use_openai': False,
'use_openai_test_env': True,
'value_loss_coef': 0.5,
'verbose': False}
PyTorch : 1.0.0
CUDA : 10.0.130
CUDNN : 7401
APEX : 0.1.0
GeForce GTX 1080 Ti : 0.000 Mhz (Ordinal 0)
131072 SMs enabled. Compute Capability sm_00
FreeMem: 11,019MB TotalMem: 11,178MB 64-bit pointers.
Mem Clock: 98.304 Mhz x 0 bits ( 0.0 GB/s)
ECC Enabled
GPUassert: invalid device symbol /home/lkh/Codes/cule/cule/atari/cuda/tables.hpp 43
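Compared with the first device summary in this thread (1632.500 Mhz, sm_61, 352 bits), the one above reports 0.000 Mhz, sm_00 and a 0-bit memory bus for the same GTX 1080 Ti, so the device properties were apparently not read correctly. A rough sketch for flagging such a summary automatically follows. The function name `device_report_looks_valid` and the regular expressions are assumptions based only on the text format shown here, and the check does not diagnose the cause of the GPUassert.

```python
import re


def device_report_looks_valid(report: str) -> bool:
    """Return False when the device summary is missing fields or holds zero values."""
    # Patterns follow the summary format pasted in this thread (an assumption).
    capability = re.search(r"Compute Capability sm_(\d+)", report)
    core_clock = re.search(r":\s*([\d.]+)\s*Mhz \(Ordinal", report)
    bus_width = re.search(r"x\s*(\d+)\s*bits", report)
    if not (capability and core_clock and bus_width):
        return False
    return (
        int(capability.group(1)) > 0
        and float(core_clock.group(1)) > 0
        and int(bus_width.group(1)) > 0
    )


good_report = """GeForce GTX 1080 Ti : 1632.500 Mhz (Ordinal 0)
28 SMs enabled. Compute Capability sm_61
Mem Clock: 5505.000 Mhz x 352 bits (484.4 GB/s)"""

bad_report = """GeForce GTX 1080 Ti : 0.000 Mhz (Ordinal 0)
131072 SMs enabled. Compute Capability sm_00
Mem Clock: 98.304 Mhz x 0 bits ( 0.0 GB/s)"""

print(device_report_looks_valid(good_report), device_report_looks_valid(bad_report))
```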
Installed CuLE with Python 3.7, PyTorch 1.1.0, and CUDA 10.0. Execution hangs with the following messages:
No activity on the CPU or GPU.