autonomousvision / transfuser

[PAMI'23] TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving; [CVPR'21] Multi-Modal Fusion Transformer for End-to-End Autonomous Driving
MIT License

./local_evaluation.sh #103

Closed yanzhaohui1124 closed 2 years ago

yanzhaohui1124 commented 2 years ago

When I use ./local_evaluation.sh, it stops automatically:

```
89: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if LooseVersion(dist.version) < LooseVersion('0.9.10'):

Registering the global statistics
```

and then it stops.

Kait0 commented 2 years ago

These warnings are normal. Your problem is probably that you have already completed all the routes. Try setting `RESUME=0` in local_evaluation.sh. See also here.

yanzhaohui1124 commented 2 years ago

I tried again, but it doesn't work. In transfuser_longest6.json it looks like this:

```json
{
    "_checkpoint": {
        "global_record": {
            "index": -1,
            "infractions": {
                "collisions_layout": 0.0,
                "collisions_pedestrian": 0.0,
                "collisions_vehicle": 0.0,
                "outside_route_lanes": 0.0,
                "red_light": 0.0,
                "route_dev": 0.0,
                "route_timeout": 0.0,
                "stop_infraction": 0.0,
                "vehicle_blocked": 0.0
            },
            "meta": {
                "exceptions": [
                    ["RouteScenario_0", 0, "Failed - Agent couldn't be set up"],
                    ["RouteScenario_1", 1, "Failed - Agent couldn't be set up"],
                    ["RouteScenario_2", 2, "Failed - Agent couldn't be set up"],
```

Kait0 commented 2 years ago

This means your agent crashed when it was executed. What is the stack trace that gets printed to the console?

yanzhaohui1124 commented 2 years ago

```
Setting up the agent /home/zhaohui/transfuser/model_ckpt/transfuser/model_seed1_39.pth
Downloading: "https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/regnety_032_ra-7f2439f9.pth" to /home/zhaohui/.cache/torch/hub/checkpoints/regnety_032_ra-7f2439f9.pth
Watchdog exception - Timeout of 61.0 seconds occured

Could not set up the required agent:

Timeout: Agent took too long to setup

Traceback (most recent call last):
  File "/home/zhaohui/transfuser/leaderboard/leaderboard/leaderboard_evaluator_local.py", line 271, in _load_and_run_scenario
    self.agent_instance = getattr(self.module_agent, agent_class_name)(args.agent_config)
  File "/home/zhaohui/transfuser/leaderboard/leaderboard/autoagents/autonomous_agent.py", line 45, in __init__
    self.setup(path_to_conf_file)
  File "/home/zhaohui/transfuser/team_code_transfuser/submission_agent.py", line 91, in setup
    net = LidarCenterNet(self.config, 'cuda', self.backbone, image_architecture, lidar_architecture, use_velocity)
  File "/home/zhaohui/transfuser/team_code_transfuser/model.py", line 564, in __init__
    self._model = TransfuserBackbone(config, image_architecture, lidar_architecture, use_velocity=use_velocity).to(self.device)
  File "/home/zhaohui/transfuser/team_code_transfuser/transfuser.py", line 22, in __init__
    self.image_encoder = ImageCNN(architecture=image_architecture, normalize=True)
  File "/home/zhaohui/transfuser/team_code_transfuser/transfuser.py", line 378, in __init__
    self.features = timm.create_model(architecture, pretrained=True)
  File "/home/zhaohui/anaconda3/envs/tfuse/lib/python3.7/site-packages/timm/models/factory.py", line 74, in create_model
    model = create_fn(pretrained=pretrained, **kwargs)
  File "/home/zhaohui/anaconda3/envs/tfuse/lib/python3.7/site-packages/timm/models/regnet.py", line 458, in regnety_032
    return _create_regnet('regnety_032', pretrained, **kwargs)
  File "/home/zhaohui/anaconda3/envs/tfuse/lib/python3.7/site-packages/timm/models/regnet.py", line 350, in _create_regnet
    **kwargs)
  File "/home/zhaohui/anaconda3/envs/tfuse/lib/python3.7/site-packages/timm/models/helpers.py", line 470, in build_model_with_cfg
    strict=pretrained_strict)
  File "/home/zhaohui/anaconda3/envs/tfuse/lib/python3.7/site-packages/timm/models/helpers.py", line 189, in load_pretrained
    state_dict = load_state_dict_from_url(pretrained_url, progress=progress, map_location='cpu')
  File "/home/zhaohui/anaconda3/envs/tfuse/lib/python3.7/site-packages/torch/hub.py", line 591, in load_state_dict_from_url
    download_url_to_file(url, cached_file, hash_prefix, progress=progress)
  File "/home/zhaohui/anaconda3/envs/tfuse/lib/python3.7/site-packages/torch/hub.py", line 457, in download_url_to_file
    u = urlopen(req)
  File "/home/zhaohui/anaconda3/envs/tfuse/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/home/zhaohui/anaconda3/envs/tfuse/lib/python3.7/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/home/zhaohui/anaconda3/envs/tfuse/lib/python3.7/urllib/request.py", line 543, in _open
    '_open', req)
  File "/home/zhaohui/anaconda3/envs/tfuse/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/home/zhaohui/anaconda3/envs/tfuse/lib/python3.7/urllib/request.py", line 1393, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/home/zhaohui/anaconda3/envs/tfuse/lib/python3.7/urllib/request.py", line 1353, in do_open
    r = h.getresponse()
  File "/home/zhaohui/anaconda3/envs/tfuse/lib/python3.7/http/client.py", line 1373, in getresponse
    response.begin()
  File "/home/zhaohui/anaconda3/envs/tfuse/lib/python3.7/http/client.py", line 319, in begin
    version, status, reason = self._read_status()
  File "/home/zhaohui/anaconda3/envs/tfuse/lib/python3.7/http/client.py", line 280, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/home/zhaohui/anaconda3/envs/tfuse/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/home/zhaohui/anaconda3/envs/tfuse/lib/python3.7/ssl.py", line 1071, in recv_into
    return self.read(nbytes, buffer)
  File "/home/zhaohui/anaconda3/envs/tfuse/lib/python3.7/ssl.py", line 929, in read
    return self._sslobj.read(len, buffer)
  File "/home/zhaohui/transfuser/leaderboard/leaderboard/leaderboard_evaluator_local.py", line 113, in _signal_handler
    raise RuntimeError("Timeout: Agent took too long to setup")
RuntimeError: Timeout: Agent took too long to setup

Registering the route statistics
```

Kait0 commented 2 years ago

The timm library that we use for the backbone downloads default pre-trained weights when you run it for the first time. Your code seems to time out while trying to download these weights; see this line in the stack trace: `state_dict = load_state_dict_from_url(pretrained_url, progress=progress, map_location='cpu')`

Does your computer have a stable internet connection that can access github?
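
If the machine cannot reach GitHub reliably, one possible workaround (just a sketch, not part of this repo) is to pre-download the checkpoint on a machine that does have access and drop it into the torch hub cache, where timm's `pretrained=True` load will find it. The URL and cache path below are taken from the log above:

```python
import os
import urllib.request

# Pre-populate the torch hub cache so timm does not have to download the
# RegNetY-032 weights during agent setup.
url = ("https://github.com/rwightman/pytorch-image-models/releases/download/"
       "v0.1-weights/regnety_032_ra-7f2439f9.pth")
cache_dir = os.path.expanduser("~/.cache/torch/hub/checkpoints")
os.makedirs(cache_dir, exist_ok=True)
dest = os.path.join(cache_dir, os.path.basename(url))

if not os.path.exists(dest):
    urllib.request.urlretrieve(url, dest)  # still needs a working connection to GitHub
print("cached weights at", dest)
```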

yanzhaohui1124 commented 2 years ago

Thank you, I will try again.

yanzhaohui1124 commented 2 years ago

Thank you, I have solved this problem.

yanzhaohui1124 commented 2 years ago

My car runs too slowly; is this connected with the GPU/CPU?

Kait0 commented 2 years ago

I don't understand that question. The models all run on the GPU. What does "too slow" mean? If you are running the models and CARLA on the same GPU, then it's certainly possible that the simulation will not be very fast, depending on what hardware you have.

yanzhaohui1124 commented 2 years ago

We commented out all the code in 'run_step' and replaced the 'control' with a do-nothing instruction; after that the screen became fluent. Is this because our execution speed is limited by hardware?

Kait0 commented 2 years ago

By default, CARLA runs in synchronous mode. In this mode, when CARLA asks the agent to provide a control for the ego_vehicle, it stops the simulation until a control is returned. Since the models take some time to process the sensors, this looks like lag to a human observer if the time for running a simulation step plus running run_step is > 50 ms (which it is for TransFuser). (Images appear fluent to humans when they are rendered faster than roughly 20 frames per second.) CARLA also has the option to run asynchronously, but CARLA leaderboard methods don't use this feature as it makes things more complicated.
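
For reference, this is roughly what synchronous mode looks like with the CARLA Python API (illustrative only; the leaderboard code configures this for you, and the host, port, and step size below are assumptions):

```python
import carla

client = carla.Client("localhost", 2000)  # assumed default host/port
world = client.get_world()

settings = world.get_settings()
settings.synchronous_mode = True       # the server waits for world.tick()
settings.fixed_delta_seconds = 0.05    # one tick = 50 ms of simulated time
world.apply_settings(settings)

for _ in range(100):
    # The simulation only advances when tick() is called, so any time spent in the
    # agent's run_step() before the next tick shows up as lag on screen.
    world.tick()
```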

yanzhaohui1124 commented 2 years ago

It seems that the models are running on the CPU, which is why it's so slow. How can I make them run on the GPU?

Kait0 commented 2 years ago

How did you come to the conclusion that the models are running on the CPU? All the models are sent to the GPU after creation here. All the data is also sent to CUDA; there is no CPU option.
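
A quick way to check this yourself (a generic PyTorch sketch, not code from this repo; the tiny stand-in model is just for illustration):

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)  # placeholder model; the agent builds LidarCenterNet instead

if torch.cuda.is_available():
    model = model.to("cuda")
    print(next(model.parameters()).device)  # prints cuda:0 -> the weights live on the GPU
else:
    print("CUDA not available - that would explain a CPU-only (and very slow) run")
```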

yanzhaohui1124 commented 2 years ago

When I run local_evaluation.sh, my CARLA FPS is only 3 (RTX 2080 Ti, memory usage 6626 MB, GPU-util 22%). Any suggestions for making the program run faster?

Kait0 commented 2 years ago

I get around 7 FPS (5-10) on average with a similar system (RTX 2080 Ti local machine), so this is normal. I'm not sure what you need the speedup for, but you could evaluate only a single model instead of an ensemble. You could also have a look at TorchScript (torch.jit) to speed up your code, but I have no experience with that.

Running CARLA simulations is computationally expensive; I have no simple solution.
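
For what it's worth, a minimal TorchScript tracing sketch looks like this (illustrative only; the stand-in network and input shape are placeholders, not the actual TransFuser model or sensor format):

```python
import torch
import torch.nn as nn

# Placeholder network; the real agent builds LidarCenterNet with multi-modal inputs.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
).eval()

example = torch.randn(1, 3, 256, 256)          # dummy input with a fixed shape
with torch.no_grad():
    traced = torch.jit.trace(model, example)   # records the ops executed for this input
traced.save("model_traced.pt")                 # reload later with torch.jit.load(...)
```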

Kait0 commented 2 years ago

For longest6 evaluations we speed up the evaluation by parallelizing across the routes, running each of them on its own GPU instance on a compute cluster.
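
A rough sketch of that idea on a single multi-GPU machine (not our actual cluster setup; the route file names and the assumption that the evaluation script reads `ROUTES` from the environment are hypothetical, and each worker would also need its own CARLA server and port):

```python
import os
import subprocess

# Hypothetical per-route split; adapt the paths and script interface to your setup.
route_files = ["routes/route_00.xml", "routes/route_01.xml"]

procs = []
for gpu_id, route in enumerate(route_files):
    env = dict(os.environ,
               CUDA_VISIBLE_DEVICES=str(gpu_id),  # pin this worker to one GPU
               ROUTES=route)                      # assumed override of the route file
    procs.append(subprocess.Popen(["bash", "local_evaluation.sh"], env=env))

for p in procs:
    p.wait()  # block until every route has finished
```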

yanzhaohui1124 commented 2 years ago

Thank you.

JianLiMech commented 1 year ago

> Thank you, I have solved this problem.

Thank you for your information; I met the same problem here. [Screenshot from 2023-01-23 14-52-11]

Could you tell me how we can solve this problem? Thank you in advance!

tijaz17skane commented 4 months ago

@JianLiMech I fixed that by going into ./transfuser/results and deleting the transfuser_longest6.json file. This error means that you've run all the routes; you need to delete that file to start over.
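
The same cleanup as a small Python helper (just a sketch mirroring the manual fix above; the path assumes the default repo layout):

```python
import os

# Remove the old results file so the evaluator starts over instead of resuming.
results_file = os.path.join("transfuser", "results", "transfuser_longest6.json")
if os.path.exists(results_file):
    os.remove(results_file)
    print("Deleted", results_file)
else:
    print("No previous results file found; nothing to do.")
```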