askforalfred / alfred

ALFRED - A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
MIT License

Code hanging when run from headless server #66

Closed soyeonm closed 3 years ago

soyeonm commented 3 years ago

Hello,

When I run the evaluation code on a headless server, it does not crash; it just hangs and never proceeds. Specifically, when I run

(base) $ python models/eval/eval_seq2seq.py --model_path models/pre-trained_models/best_seen.pth --eval_split valid_seen --data data/json_feat_2.1.0 --model models.model.seq2seq_im_mask --gpu --num_threads 1

I get:

{'tests_seen': 1533, 'tests_unseen': 1529, 'train': 21023, 'valid_seen': 820, 'valid_unseen': 821}
Loading: models/pre-trained_models/best_seen.pth

and then nothing happens or prints, even if I leave it running for an hour. (I have python $ALFRED_ROOT/scripts/startx.py 1 (or whatever the display number is) running in a tmux session.)

When I press Ctrl+C, I get:

Traceback (most recent call last):
  File "models/eval/eval_seq2seq.py", line 57, in <module>
    eval.spawn_threads()
  File "/raid/soyeonm/alfred/models/eval/eval.py", line 94, in spawn_threads
    t.join()
  File "/usr0/soyeonm/anaconda3/lib/python3.8/multiprocessing/process.py", line 149, in join
    res = self._popen.wait(timeout)
  File "/usr0/soyeonm/anaconda3/lib/python3.8/multiprocessing/popen_fork.py", line 47, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/usr0/soyeonm/anaconda3/lib/python3.8/multiprocessing/popen_fork.py", line 27, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
Process Process-2:
Traceback (most recent call last):
  File "/usr0/soyeonm/anaconda3/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr0/soyeonm/anaconda3/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/raid/soyeonm/alfred/models/eval/eval_task.py", line 20, in run
    env = ThorEnv()
  File "/usr0/soyeonm/raid/alfred/env/thor_env.py", line 29, in __init__
    super().__init__(quality=quality)
  File "/usr0/soyeonm/anaconda3/lib/python3.8/site-packages/ai2thor/controller.py", line 440, in __init__
    self.start(
  File "/usr0/soyeonm/anaconda3/lib/python3.8/site-packages/ai2thor/controller.py", line 1123, in start
    self.check_x_display(env["DISPLAY"])
  File "/usr0/soyeonm/anaconda3/lib/python3.8/site-packages/ai2thor/controller.py", line 929, in check_x_display
    subprocess.call("xdpyinfo", stdout=dn, env=env, shell=True) == 0
  File "/usr0/soyeonm/anaconda3/lib/python3.8/subprocess.py", line 342, in call
    return p.wait(timeout=timeout)
  File "/usr0/soyeonm/anaconda3/lib/python3.8/subprocess.py", line 1079, in wait
    return self._wait(timeout=timeout)
  File "/usr0/soyeonm/anaconda3/lib/python3.8/subprocess.py", line 1804, in _wait
    (pid, sts) = self._try_wait(0)
  File "/usr0/soyeonm/anaconda3/lib/python3.8/subprocess.py", line 1762, in _try_wait
    (pid, sts) = os.waitpid(self.pid, wait_flags)
KeyboardInterrupt

Could you please help me with this issue? Do you have any idea where this is coming from? Thanks!

MohitShridhar commented 3 years ago

@soyeonm, following the headless setup guide, have you set the right X_DISPLAY here: https://github.com/askforalfred/alfred/blob/master/gen/constants.py#L88?

Can you do the check_thor.py test before evaluation to see if everything is working?

soyeonm commented 3 years ago

Thanks for your reply, @MohitShridhar. Yes, I ran $ALFRED_ROOT/scripts/startx.py 2, set X_DISPLAY = '2' in gen/constants.py, and ran export DISPLAY=:2 on the command line.
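For anyone checking the same setup, here is a minimal sketch of how the two settings mentioned above could be compared. It assumes X_DISPLAY holds a bare display number (such as '2') while DISPLAY uses the usual X11 form (such as :2 or host:2.0). The helper names display_number and displays_agree are hypothetical and are not part of ALFRED or ai2thor.

```python
import os
import re


def display_number(display):
    """Extract the display number from an X11 DISPLAY string such as ':2' or 'host:2.0'.

    Returns the number as a string, or None if the value has no ':<number>' part.
    """
    match = re.fullmatch(r"(?:[^:]*):(\d+)(?:\.\d+)?", display.strip())
    return match.group(1) if match else None


def displays_agree(env_display, x_display):
    """Return True if a DISPLAY value and a bare display number refer to the same display."""
    return display_number(env_display) == str(x_display).strip()


if __name__ == "__main__":
    # '2' mirrors the X_DISPLAY value mentioned in this thread; adjust to your own setting.
    env_display = os.environ.get("DISPLAY", "")
    print("DISPLAY =", repr(env_display))
    print("matches X_DISPLAY='2':", displays_agree(env_display, "2"))
```

A mismatch here would only show that the shell and the constants file point at different displays; it does not say whether an X server is actually running on either one.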

check_thor.py also hangs. When I press Ctrl+C while it is running, I get:

Traceback (most recent call last):
  File "scripts/check_thor.py", line 3, in <module>
    c = Controller()
  File "/usr0/soyeonm/anaconda3/lib/python3.8/site-packages/ai2thor/controller.py", line 440, in __init__
    self.start(
  File "/usr0/soyeonm/anaconda3/lib/python3.8/site-packages/ai2thor/controller.py", line 1123, in start
    self.check_x_display(env["DISPLAY"])
  File "/usr0/soyeonm/anaconda3/lib/python3.8/site-packages/ai2thor/controller.py", line 929, in check_x_display
    subprocess.call("xdpyinfo", stdout=dn, env=env, shell=True) == 0
  File "/usr0/soyeonm/anaconda3/lib/python3.8/subprocess.py", line 342, in call
    return p.wait(timeout=timeout)
  File "/usr0/soyeonm/anaconda3/lib/python3.8/subprocess.py", line 1079, in wait
    return self._wait(timeout=timeout)
  File "/usr0/soyeonm/anaconda3/lib/python3.8/subprocess.py", line 1804, in _wait
    (pid, sts) = self._try_wait(0)
  File "/usr0/soyeonm/anaconda3/lib/python3.8/subprocess.py", line 1762, in _try_wait
    (pid, sts) = os.waitpid(self.pid, wait_flags)
KeyboardInterrupt

MohitShridhar commented 3 years ago

@soyeonm, can you post the output of python startx.py 2?

This still feels like something to do with setting the correct DISPLAY id. Can you also try different ids?
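One way to try several ids without each attempt hanging is to call xdpyinfo directly with a timeout. The tracebacks above show the hang inside the xdpyinfo call, which has no timeout there. The following is a rough sketch, not part of ALFRED or ai2thor: probe_display is a hypothetical helper, the 5-second timeout and the range of ids are arbitrary choices, and it assumes xdpyinfo is installed and accepts its standard -display option.

```python
import subprocess


def probe_display(display_id, timeout=5.0, tool="xdpyinfo"):
    """Probe X display :<display_id> and return a short status string.

    'ok'           the tool exited with status 0
    'unreachable'  the tool exited with a non-zero status
    'timeout'      the tool did not return within `timeout` seconds
    'tool-missing' the tool executable was not found
    """
    try:
        result = subprocess.run(
            [tool, "-display", f":{display_id}"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # the child is killed if it exceeds this
        )
    except FileNotFoundError:
        return "tool-missing"
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if result.returncode == 0 else "unreachable"


if __name__ == "__main__":
    # Arbitrary range of candidate display ids; widen it if needed.
    for display_id in range(4):
        print(f":{display_id} -> {probe_display(display_id)}")
```

A "timeout" result for the id you configured would suggest the X server on that display is not responding, which may be worth comparing against the startx.py output.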

MohitShridhar commented 3 years ago

Closing due to inactivity.