allenai / spoc-robot-training

SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World
https://spoc-robot.github.io/

Issue with evaluating the model #35

Closed yusirhhh closed 2 months ago

yusirhhh commented 2 months ago

Uploading results from 2 tasks out of 200 emitted.
Process ForkServerProcess-3:
Traceback (most recent call last):
  File "/home/mmyu/anaconda3/envs/spoc/lib/python3.8/site-packages/ai2thor/controller.py", line 1067, in step
    self.last_event = self.server.receive()
  File "/home/mmyu/anaconda3/envs/spoc/lib/python3.8/site-packages/ai2thor/fifo_server.py", line 229, in receive
    metadata, files = self._recv_message(
  File "/home/mmyu/anaconda3/envs/spoc/lib/python3.8/site-packages/ai2thor/fifo_server.py", line 145, in _recv_message
    header = self._read_with_timeout(
  File "/home/mmyu/anaconda3/envs/spoc/lib/python3.8/site-packages/ai2thor/fifo_server.py", line 134, in _read_with_timeout
    raise TimeoutError(f"Reading from AI2-THOR backend timed out (using {timeout}s) timeout.")
TimeoutError: Reading from AI2-THOR backend timed out (using 1000s) timeout.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mmyu/anaconda3/envs/spoc/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/mmyu/anaconda3/envs/spoc/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/mnt/data/mmyu/objectNav/spoc-robot-training/online_evaluation/online_evaluator_worker.py", line 57, in start_worker
    worker.distribute_evaluate(agent, tasks_queue, results_queue)
  File "/mnt/data/mmyu/objectNav/spoc-robot-training/online_evaluation/online_evaluator_worker.py", line 503, in distribute_evaluate
    sample_result = self.evaluate_on_task(task=task, agent=agent, worker_id=self.worker_id)
  File "/mnt/data/mmyu/objectNav/spoc-robot-training/online_evaluation/online_evaluator_worker.py", line 322, in evaluate_on_task
    top_down_frame = get_top_down_frame(
  File "/mnt/data/mmyu/objectNav/spoc-robot-training/utils/visualization_utils.py", line 226, in get_top_down_frame
    top_down = controller.get_top_down_path_view(agent_path, target_ids)
  File "/mnt/data/mmyu/objectNav/spoc-robot-training/environment/stretch_controller.py", line 201, in get_top_down_path_view
    self.controller.step({"action": "AddThirdPartyCamera", "skyboxColor": "white", **cam})
  File "/home/mmyu/anaconda3/envs/spoc/lib/python3.8/site-packages/ai2thor/controller.py", line 1092, in step
    raise (TimeoutError if isinstance(e, TimeoutError) else RuntimeError)(
TimeoutError: Error encountered when running action {'action': 'AddThirdPartyCamera', 'skyboxColor': 'white', 'position': {'x': 8.147500038146973, 'y': 3.3163037300109863, 'z': 7.129000186920166}, 'rotation': {'x': 90.0, 'y': 0.0, 'z': 0.0}, 'orthographicSize': 9.147500038146973, 'orthographic': True, 'sequenceId': 40900} in scene Procedural.
Rate 0.120 tasks/s, ETA 0.005hours
Joined worker index 2
No process alive. Terminating
Evaluation finished

Aggregated results
+---------------+--------------------+--------------------+-------------------+--------------------------+---------------------+--------------------+------------+-------------+
|   task_type   |      eps_len       |      success       |   eps_len_succ    | percentage_rooms_visited | total_rooms_visited |    eps_len_fail    | total_size | num_workers |
+---------------+--------------------+--------------------+-------------------+--------------------------+---------------------+--------------------+------------+-------------+
| ObjectNavType | 130.54040404033813 | 0.8383838483834147 | 98.89156626500068 |    0.5951839824637568    | 2.4191919191906974  | 294.71874999907897 |    198     |     10      |
+---------------+--------------------+--------------------+-------------------+--------------------------+---------------------+--------------------+------------+-------------+
Uploading results from 2 tasks out of 200 emitted.

ehsanik commented 2 months ago

Hi.

This error occurs when Unity crashes. It looks like 2 of the 200 samples did not finish. I recommend re-running the evaluation with fewer threads and on a less heavily loaded server.
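
For readers who hit the same traceback: the worker died because a `TimeoutError` raised while rendering the top-down visualization frame propagated out of `evaluate_on_task`. Below is a minimal sketch of one possible mitigation, assuming the frame is optional for your metrics. It is not part of the SPOC codebase, and the helper name `call_with_timeout_fallback` and its parameters are hypothetical. It only keeps the worker process alive; if Unity has actually crashed, the controller would still need to be restarted before the next task.

```python
import logging
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def call_with_timeout_fallback(
    fn: Callable[[], T],
    retries: int = 1,
    fallback: Optional[T] = None,
) -> Optional[T]:
    """Hypothetical helper: call fn() and tolerate TimeoutError.

    fn is tried once plus up to `retries` more times. If every attempt
    raises TimeoutError, `fallback` is returned instead of re-raising.
    Any other exception type still propagates to the caller.
    """
    total_attempts = retries + 1
    for attempt in range(1, total_attempts + 1):
        try:
            return fn()
        except TimeoutError as exc:
            logger.warning("attempt %d/%d timed out: %s", attempt, total_attempts, exc)
    return fallback
```

One way to use it would be to wrap the `get_top_down_frame(...)` call shown in the traceback in a zero-argument `lambda` and pass `retries=0`, so that a timed-out render yields `None` rather than terminating the worker. Whether the downstream logging code accepts a missing frame is an assumption you would need to check.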