"main.py" is not responding

Ezharjan commented 7 months ago

Every time I run main.py, the vehicle drives for a while and eventually hits a traffic object. The pygame window then shows '"main.py" is not responding' and the whole process ends. Is there any way to fix this?

The full output is as follows:

(carlagym) alex@fst-computer:~/Documents/222/RL-Carla-Gym/carla-driving-rl-agent$ python main.py 
pygame 2.5.2 (SDL 2.28.2, Python 3.7.16)
Hello from the pygame community. https://www.pygame.org/contribute.html
2024-04-03 21:49:54.468082: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/noetic/lib:/usr/local/cuda/lib64:
2024-04-03 21:49:54.468103: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
WARNING:tensorflow:From main.py:9: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2024-04-03 21:49:54.966199: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-03 21:49:55.002461: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2112000000 Hz
2024-04-03 21:49:55.003007: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1886b50 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2024-04-03 21:49:55.003026: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2024-04-03 21:49:55.005304: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2024-04-03 21:49:55.008420: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2024-04-03 21:49:55.008444: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: fst-computer
2024-04-03 21:49:55.008447: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: fst-computer
2024-04-03 21:49:55.008516: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 535.54.3
2024-04-03 21:49:55.008527: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 535.54.3
2024-04-03 21:49:55.008530: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 535.54.3

/home/alex/anaconda3/envs/carlagym/lib/python3.7/site-packages/gym/spaces/box.py:78: UserWarning: WARN: Box bound precision lowered by casting to float32
  logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
Weather changed to WeatherParameters(cloudiness=5.000000, cloudiness=5.000000, precipitation=0.000000, precipitation_deposits=0.000000, wind_intensity=10.000000, sun_azimuth_angle=-1.000000, sun_altitude_angle=45.000000, fog_density=2.000000, fog_distance=0.750000, fog_falloff=0.100000, wetness=0.000000, scattering_intensity=1.000000, mie_scattering_scale=0.030000, rayleigh_scattering_scale=0.033100).
Random seed 51 set.
Random seed 695516342 set.
state_spec: {'state_image': (90, 360, 3), 'state_navigation': (5,), 'state_road': (9,), 'state_vehicle': (4,)}
action_shape: 2
distribution: beta
2024-04-03 21:49:59.251042: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
Optimizer: adam.
Optimizer: adam.
Optimizer: adam.
Random seed 2362446593 set.
env.reset
No recommended values for 'speed' attribute
Event CARLAEvent.RESET triggered.
Skipped 30 frames.
Episode 1 terminated after 408 timesteps in 60.389s with reward 6244.197.
Random seed 1540690259 set.
Random seed 1054957361 set.
Random seed 3846014739 set.
Random seed 2653704632 set.

Env

Python Version: 3.7.16

Luca96 commented 7 months ago

Hi @Ezharjan, I think the "not responding" message is due to training being very slow. From the log, I can see that TensorFlow isn't using a GPU, and using one is strongly recommended. To train, you need a reasonably capable NVIDIA GPU (or an AMD GPU with ROCm); on CPU it will take practically forever.
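
A quick way to confirm whether TensorFlow sees a GPU is the call that the deprecation warning in the log itself recommends, `tf.config.list_physical_devices('GPU')`. The sketch below is only an illustration: `visible_gpus` is a hypothetical helper name, not part of this repository, and it assumes a TensorFlow version that provides `tf.config.list_physical_devices`.

```python
import importlib.util


def visible_gpus():
    """Return the names of GPUs TensorFlow can see.

    Returns None when TensorFlow is not installed in the active environment.
    `visible_gpus` is a hypothetical helper, not part of carla-driving-rl-agent.
    """
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf  # imported lazily so the check works without TF

    # Same call the deprecation warning in the log points to.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment.")
elif gpus:
    print("TensorFlow sees these GPUs:", gpus)
else:
    print("TensorFlow sees no GPU; training would run on the CPU.")
```

Run it in the same conda environment used for main.py. An empty list is consistent with the `no CUDA-capable device is detected` line in the log above.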

Ezharjan commented 7 months ago

Thanks for your kind response!

I'm actually using a machine with an RTX 3090 GPU. This is the nvidia-smi output:

Fri Apr  5 19:59:09 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        On  | 00000000:01:00.0  On |                  N/A |
| 45%   66C    P2             194W / 350W |   2021MiB / 24576MiB |     44%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1521      G   /usr/lib/xorg/Xorg                           35MiB |
|    0   N/A  N/A      3030      G   /usr/lib/xorg/Xorg                          145MiB |
|    0   N/A  N/A      3173      G   /usr/bin/gnome-shell                         65MiB |
|    0   N/A  N/A     41987      G   ...erProcess --variations-seed-version       65MiB |
|    0   N/A  N/A    336765    C+G   ...aries/Linux/CarlaUE4-Linux-Shipping     1370MiB |
|    0   N/A  N/A    336919      C   python                                      254MiB |
+---------------------------------------------------------------------------------------+

Do I need to add some code to make main.py run on the GPU? (Or does the project need to be run on a cluster via Slurm rather than on a PC?)

Luca96 commented 7 months ago

@Ezharjan Try commenting out this line and running again. The log should now show TensorFlow allocating memory on your GPU; if it doesn't, your CUDA or cuDNN installation may be broken.
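
One rough way to probe for a broken CUDA installation: the log reports that `libcudart.so.10.1` could not be loaded and prints the `LD_LIBRARY_PATH` that was searched. The sketch below checks whether a given library file exists in any directory on that path. `find_shared_library` is a hypothetical helper written for this example. The dynamic loader also consults other locations (such as the ldconfig cache), so an empty result is a hint, not proof.

```python
import os
from pathlib import Path


def find_shared_library(name, search_path=None):
    """Return the full path of every copy of `name` found on a search path.

    `search_path` is a string of directories joined by os.pathsep (':' on
    Linux); it defaults to the current LD_LIBRARY_PATH. Hypothetical helper
    for illustration only: it does not replicate the real dynamic loader,
    which also checks the ldconfig cache and default system directories.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in search_path.split(os.pathsep):
        candidate = Path(directory) / name
        if directory and candidate.exists():
            hits.append(str(candidate))
    return hits


# Library name taken from the warning in the log above.
matches = find_shared_library("libcudart.so.10.1")
print(matches if matches else "libcudart.so.10.1 not found on LD_LIBRARY_PATH")
```

If nothing is found, the CUDA runtime version that this TensorFlow build expects is probably not installed where the loader looks, regardless of which driver version nvidia-smi reports.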

Ezharjan commented 7 months ago

Thanks a lot!

Luca96 / carla-driving-rl-agent

"main.py" is not responding #29
