dotchen / LearningByCheating

(CoRL 2019) Driving in CARLA using waypoint prediction and two-stage imitation learning
MIT License

Unable to benchmark pre-trained image model: freezing on import #7

Open rohanb2018 opened 4 years ago

rohanb2018 commented 4 years ago

Hello, thanks for providing the code for your paper!

I have been unable to run either the benchmark_agent.py code (with the pre-trained model-10.th), or the data_collector.py script. Both scripts seem to be freezing at some point in the import process, which I haven't been able to resolve. They don't throw errors (or print anything besides the pygame message) but seem to hang indefinitely.

I had to make a couple of modifications to the installed dependencies. The RTX GPU family does not seem to be compatible with CUDA 9.0 or below (or with any packages built against CUDA 9.0 or below), as per https://github.com/pytorch/pytorch/issues/17543, and I ran into CUDA warnings and cuDNN errors when using the original dependencies (which were built with CUDA 8.0).

The relevant versions of dependencies on my system are as follows:

Package Version
cudatoolkit 10.1.243
pytorch py3.5_cuda10.1.243_cudnn7.6.3_0
cudnn 7.6.5

If anyone has guidance on how to debug/resolve this issue, I'd really appreciate it. Thanks so much!

My set-up:
GPU: GeForce RTX 2080 with Max-Q
CUDA version: 10.1

raks097 commented 4 years ago

Hey @rohanb2018, you are right about the incompatibility of CUDA 8.0 and the RTX 2080.

Downgrading my dependencies to the following versions fixed the problem while I was testing it out.

cudatoolkit 10.0.130
cudnn 7.6.0
pytorch_1.0.0 py3.5_cuda10.0.130_cudnn7.4.1_1

Hope this helps

dotchen commented 4 years ago

Thanks for your interest in our project!

If data_collector.py hangs, it probably points to an issue other than the PyTorch installation (though I'm not 100% sure). Could you paste the messages you got? Also, make sure you installed our .egg file and that the port number matches the CARLA instance you launched, and try changing the order in which the carla- and pytorch-related modules are imported.

rohanb2018 commented 4 years ago

Thanks for the response, guys!

@raks097: Thanks for the suggestion. Unfortunately, when I tried to conda install the PyTorch version you suggested, I ran into a conda UnsatisfiableError with a long list of incompatible specifications. It's weird because all of the dependencies (and their compatible versions) for that version of PyTorch seem to be present in the environment, but conda still complains.

@dianchen96: Sure! The only console output I get from running either the agent benchmark script or the data collector script is just the pygame message:

pygame 1.9.4
Hello from the pygame community. https://www.pygame.org/contribute.html

In both cases I have to kill the running Python process because it doesn't respond to the usual keyboard interrupt.

I did confirm that I have your CARLA .egg file - just to make sure, I re-downloaded and installed it and the freezing issue persists. Also checked the port number and it seems fine.

I'll try playing around with the import order and see if that helps.
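For a hang like this, where nothing is printed and Ctrl-C is ignored, the standard-library faulthandler module may help show which import is stuck. The following is only a sketch: run_with_hang_watchdog and import_suspects are hypothetical names, and the json import is a stand-in for the real carla/torchvision imports.

```python
import faulthandler
import sys


def run_with_hang_watchdog(fn, timeout_s=60.0):
    """Call fn(); if it has not returned after timeout_s seconds, dump every
    thread's Python traceback to stderr and exit the process."""
    # The watchdog runs in its own thread, so it can still fire when the
    # main thread is stuck and the keyboard interrupt is never handled.
    faulthandler.dump_traceback_later(timeout_s, exit=True, file=sys.__stderr__)
    try:
        return fn()
    finally:
        # Disarm the watchdog once fn() has returned (or raised) in time.
        faulthandler.cancel_dump_traceback_later()


def import_suspects():
    # Hypothetical stand-in: replace this with the imports that hang,
    # e.g. `import carla` followed by `import torchvision`.
    import json
    return json.__name__
```

Calling run_with_hang_watchdog(import_suspects, timeout_s=30.0) should either return normally or, on a hang, print a traceback whose last frame is the import statement that never finished.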

bradyz commented 4 years ago

@rohanb2018

could be related to https://github.com/carla-simulator/carla/issues/2132

rohanb2018 commented 4 years ago

Yeah this might be it! I tried import carla followed by import torchvision and it freezes up, also requiring me to kill the process like I had to do for the agent benchmark code. I guess I have to move the torchvision import to be earlier than the first instance of wherever carla is imported - will try it and see.

bradyz commented 4 years ago

one way I got around this previously was just to remove the torchvision import, since torchvision is only used for the ToTensor transform, and replace all the ToTensor with the following

myToTensor = lambda x: (torch.FloatTensor(x) / 255.0).transpose(0, 1).transpose(0, 2).contiguous()

which converts a numpy uint8 to a FloatTensor (taken from https://pytorch.org/docs/stable/_modules/torchvision/transforms/functional.html#to_tensor)
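Written out as a named function, the replacement might look like the sketch below. It assumes the input is an H x W x C uint8 numpy array (the only case the lambda above handles; torchvision's ToTensor also accepts PIL images), and it uses permute instead of the two transposes, which gives the same C x H x W layout for 3-D input. The name to_tensor is just illustrative.

```python
import numpy as np
import torch


def to_tensor(image):
    """Convert an H x W x C uint8 array into a C x H x W float tensor in [0, 1]."""
    # from_numpy needs a contiguous array; .float() converts uint8 -> float32.
    tensor = torch.from_numpy(np.ascontiguousarray(image)).float() / 255.0
    # Move the channel axis to the front: (H, W, C) -> (C, H, W).
    return tensor.permute(2, 0, 1).contiguous()
```

This only needs numpy and torch, so torchvision no longer has to be imported for the transform.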

rohanb2018 commented 4 years ago

Great, I think I actually got the benchmark_agent.py example working now!

I ended up commenting out all of the torchvision imports inside the bird_view folder. This included replacing all of the ToTensor calls with the code that you mentioned. Additionally, I had to comment out instances of torchvision.utils inside the logger.py and saver.py files. I replaced the calls to tv_utils.make_grid in both of those files by just copying the source code from PyTorch (https://pytorch.org/docs/stable/_modules/torchvision/utils.html#make_grid).

I assume I'll have to do the same to the files inside the training folder since I see some references to torchvision there as well.

Thanks again for your help! Will let you know if I run into any other issues.

bradyz commented 4 years ago

note - another easier way to get around this is to just find the first instance of

import carla

and simply add import torchvision right before that

many libraries have this problem due to the way pytorch is compiled, see https://github.com/pytorch/pytorch/issues/19739#issuecomment-524675014
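To check which import order hangs on a given machine without freezing your own shell, each order can be tried in a fresh interpreter with a timeout. This is a rough sketch using only the standard library; probe_import_order is a hypothetical helper, and the 30-second timeout is an arbitrary assumption.

```python
import subprocess
import sys


def probe_import_order(modules, timeout_s=30.0):
    """Import `modules` in the given order in a fresh interpreter.

    Returns "ok", "hang" (no exit within timeout_s), or "error".
    """
    code = "; ".join("import " + name for name in modules)
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising.
        return "hang"
    return "ok" if result.returncode == 0 else "error"


if __name__ == "__main__":
    for order in (["carla", "torchvision"], ["torchvision", "carla"]):
        print(" -> ".join(order), ":", probe_import_order(order))
```

If the first order reports "hang" and the second reports "ok", moving import torchvision ahead of import carla is likely enough.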

peiyunh commented 4 years ago

Hi @bradyz, thanks for sharing the code! When I try to run benchmark_agent.py, it freezes on import torchvision inside bird_view/utils/carla_utils.py. After I comment out that line, the code runs until it freezes again on import torch in bird_view/utils/bz_utils/saver.py. Do you have any insight into how to fix this?

Besides, according to https://github.com/carla-simulator/carla/issues/2132#issuecomment-649934258, this issue might have been fixed with CARLA 0.9.9. Is the latest version by any chance a viable option?

I am testing the latest version (031308).

Update: benchmark_agent.py no longer freezes with CARLA 0.9.9.4, but it now fails with the following errors:

pygame 1.9.6
Hello from the pygame community. https://www.pygame.org/contribute.html
suite: FullTown02-v1
before run_benchmark
  0%|                                                                                                                         | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "benchmark_agent.py", line 69, in run
    run_benchmark(agent_maker, env, benchmark_dir, seed, autopilot, resume, max_run=max_run, show=show)
  File "/home/peiyunh/code/lbc/benchmark/run_benchmark.py", line 243, in run_benchmark
    result, diagnostics = run_single(env, weather, start, target, agent_maker, seed, autopilot, show=show)
  File "/home/peiyunh/code/lbc/benchmark/run_benchmark.py", line 174, in run_single
    env.init(start=start, target=target, weather=cu.PRESET_WEATHERS[weather])
  File "/home/peiyunh/code/lbc/benchmark/goal_suite.py", line 44, in init
    super().init(**kwargs)
  File "/home/peiyunh/code/lbc/bird_view/utils/carla_utils.py", line 497, in init
    self.spawn_player()
  File "/home/peiyunh/code/lbc/bird_view/utils/carla_utils.py", line 528, in spawn_player
    self._player.start_dtcrowd()
AttributeError: 'Vehicle' object has no attribute 'start_dtcrowd'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "benchmark_agent.py", line 94, in <module>
    run(Path(args.model_path), args.port, args.suite, args.big_cam, args.seed, args.autopilot, args.resume, max_run=args.max_run, show=args.show)
  File "benchmark_agent.py", line 69, in run
    run_benchmark(agent_maker, env, benchmark_dir, seed, autopilot, resume, max_run=max_run, show=show)
  File "/home/peiyunh/code/lbc/bird_view/utils/carla_utils.py", line 737, in __exit__
    self.clean_up()
  File "/home/peiyunh/code/lbc/benchmark/goal_suite.py", line 86, in clean_up
    super().clean_up()
  File "/home/peiyunh/code/lbc/bird_view/utils/carla_utils.py", line 626, in clean_up
    self._player.stop_dtcrowd()
AttributeError: 'Vehicle' object has no attribute 'stop_dtcrowd'

bradyz commented 4 years ago

At the very very top of benchmark_agent can you try

import torch
import torchvision

dotchen commented 4 years ago

@peiyunh start_dtcrowd/stop_dtcrowd only come with our custom CARLA 0.9.6 egg for the pedestrian fix. If you would like to use this repo with CARLA 0.9.9 you need to modify some of the utilities code.
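One possible way to get past the AttributeError on a stock CARLA build is to call these methods only when the actor actually has them. This is not the maintainers' fix, just a sketch: call_if_supported is a hypothetical helper, and skipping the calls means the pedestrian fix from the custom egg is simply not applied.

```python
def call_if_supported(actor, method_name):
    """Call actor.method_name() when it exists; return True if it was called."""
    method = getattr(actor, method_name, None)
    if callable(method):
        method()
        return True
    # Stock CARLA actors lack the custom method, so the call is skipped.
    return False
```

In carla_utils.py this would replace the direct calls, e.g. call_if_supported(self._player, "start_dtcrowd") in spawn_player and call_if_supported(self._player, "stop_dtcrowd") in clean_up. Other utilities may still need changes for CARLA 0.9.9.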

peiyunh commented 4 years ago

Thanks for the replies, @bradyz and @dianchen96!

Adding import torch and import torchvision at the very top of benchmark_agent works! I am able to run benchmark_agent now. Thanks a lot!

HFY123 commented 3 years ago

Hi, I ran into the same problem. How did you fix it? I added import torch and import torchvision at the top of benchmark_agent.py, but the problem persists. Thanks!

xolovezari commented 1 month ago

Of all the suggestions here, adding import torch and import torchvision at the very top of benchmark_agent was the best one. I tried it even with CUDA 12.5 and it really worked! Thanks