EricR86 / segway-issues-proxy


segway train queuing job that never starts #30

Closed EricR86 closed 10 years ago

EricR86 commented 10 years ago

From apocalyp...@gmail.com on March 20, 2014 01:25:40

abhinav@abhnav:~$ qhost -F mem_requested
HOSTNAME     ARCH        NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
global       -              -     -       -       -       -       -
abhnav       lx26-amd64     4  0.18    1.8G  913.0M    1.9G  301.5M
    Host Resource(s):   hc:mem_requested=1.801G
localhost

abhinav@abhnav:~$ segway --num-labels=4 train test.genomedata traindir4
traindir4/observations/chr21.0000.float32 (9411193, 9595548)
PROGRAM ENDED SUCCESSFULLY WITH STATUS 0 AT Thursday March 20 2014, 10:46:02 IST
queued 48: emt0.0.0.traindir4.ba32aef4afee11e3b8541803736f5e43 (mem_requested=2048M h_vmem=2048M h_stack=8M)

On interruption (Ctrl-C):

^CTraceback (most recent call last):
  File "/home/abhinav/arch/Linux-x86_64/bin/segway", line 9, in <module>
    load_entry_point('segway==1.1.0', 'console_scripts', 'segway')()
  File "/home/abhinav/arch/Linux-x86_64/lib/python2.7/segway/run.py", line 3592, in main
    return runner()
  File "/home/abhinav/arch/Linux-x86_64/lib/python2.7/segway/run.py", line 3429, in __call__
    self.run(*args, **kwargs)
  File "/home/abhinav/arch/Linux-x86_64/lib/python2.7/segway/run.py", line 3407, in run
    self.run_train()
  File "/home/abhinav/arch/Linux-x86_64/lib/python2.7/segway/run.py", line 3038, in run_train
    instance_params = run_train_func(num_segs_range)
  File "/home/abhinav/arch/Linux-x86_64/lib/python2.7/segway/run.py", line 3056, in run_train_singlethread
    res = [self.run_train_instance()]
  File "/home/abhinav/arch/Linux-x86_64/lib/python2.7/segway/run.py", line 2937, in run_train_instance
    self.run_train_round(instance_index, round_index, **kwargs)
  File "/home/abhinav/arch/Linux-x86_64/lib/python2.7/segway/run.py", line 2900, in run_train_round
    restartable_jobs.wait()
  File "/home/abhinav/arch/Linux-x86_64/lib/python2.7/segway/cluster/__init__.py", line 192, in wait
    sleep(MIN_JOB_WAIT_SLEEP_TIME)
KeyboardInterrupt

In qmon (Queue Control -> Running tab), nothing is running; the job is listed under Pending Jobs.

Original issue: http://code.google.com/p/segway-genome/issues/detail?id=30

EricR86 commented 10 years ago

From hoffman...@gmail.com on March 20, 2014 03:55:16

Are you able to get the standard tests in segway/test to work?

EricR86 commented 10 years ago

From apocalyp...@gmail.com on March 20, 2014 04:52:52

Same problem!

abhinav@abhnav:~/Downloads2/test$ bash ./test.sh
traindir/observations/chr21.0000.float32 (9411193, 9595548)
PROGRAM ENDED SUCCESSFULLY WITH STATUS 0 AT Thursday March 20 2014, 17:18:28 IST
queued 49: emt0.0.0.traindir.8c9c2632b02511e39ca61803736f5e43 (mem_requested=2048M h_vmem=2048M h_stack=8M)
^CTraceback (most recent call last):
  File "/home/abhinav/arch/Linux-x86_64/bin/segway", line 9, in <module>
    load_entry_point('segway==1.1.0', 'console_scripts', 'segway')()
  File "/home/abhinav/arch/Linux-x86_64/lib/python2.7/segway/run.py", line 3592, in main
    return runner()
  File "/home/abhinav/arch/Linux-x86_64/lib/python2.7/segway/run.py", line 3429, in __call__
    self.run(*args, **kwargs)
  File "/home/abhinav/arch/Linux-x86_64/lib/python2.7/segway/run.py", line 3407, in run
    self.run_train()
  File "/home/abhinav/arch/Linux-x86_64/lib/python2.7/segway/run.py", line 3038, in run_train
    instance_params = run_train_func(num_segs_range)
  File "/home/abhinav/arch/Linux-x86_64/lib/python2.7/segway/run.py", line 3056, in run_train_singlethread
    res = [self.run_train_instance()]
  File "/home/abhinav/arch/Linux-x86_64/lib/python2.7/segway/run.py", line 2937, in run_train_instance
    self.run_train_round(instance_index, round_index, **kwargs)
  File "/home/abhinav/arch/Linux-x86_64/lib/python2.7/segway/run.py", line 2900, in run_train_round
    restartable_jobs.wait()
  File "/home/abhinav/arch/Linux-x86_64/lib/python2.7/segway/cluster/__init__.py", line 192, in wait
    sleep(MIN_JOB_WAIT_SLEEP_TIME)
KeyboardInterrupt

EricR86 commented 10 years ago

From hoffman...@gmail.com on March 20, 2014 07:15:20

It looks like the job is queuing up just fine. What is happening to it after that? What does qstat report while your Segway job is running and you're waiting?

EricR86 commented 10 years ago

From apocalyp...@gmail.com on March 20, 2014 07:58:08

abhinav@abhnav:~/Downloads2/test$ bash test.sh
traindir/observations/chr21.0000.float32 (9411193, 9595548)
PROGRAM ENDED SUCCESSFULLY WITH STATUS 0 AT Thursday March 20 2014, 20:26:26 IST
queued 54: emt0.0.0.traindir.cfa38e24b03f11e3a8591803736f5e43 (mem_requested=2048M h_vmem=2048M h_stack=8M)

abhinav@abhnav:~/Downloads2$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
     54 0.50000 emt0.0.0.t abhinav      qw    03/20/2014 20:26:26                                    1

Submitting a small script that just prints "Hello World" works:

abhinav@abhnav:~$ qsub script.sh
Your job 53 ("script.sh") has been submitted
abhinav@abhnav:~$ ls | grep script
script.sh
script.sh.e53
script.sh.o53

EricR86 commented 10 years ago

From hoffman...@gmail.com on March 20, 2014 11:54:42

OK, "qw" means your job has been submitted but has not started for some reason. What does qstat -j 54 give you? That should explain the reason.
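As a minimal sketch (not part of Segway or Grid Engine), finding jobs stuck in "qw" can be scripted by parsing qstat's default tabular output. The helper name pending_job_ids is hypothetical, and the assumption that the state is the fifth whitespace-separated column holds for the layout shown in this thread but may differ between Grid Engine versions.

```python
def pending_job_ids(qstat_output):
    """Return the IDs of jobs that qstat lists in the "qw" (queued, waiting) state.

    Hypothetical helper. Assumes the default qstat layout, where data rows
    start with a numeric job ID and the state is the fifth field.
    """
    pending = []
    for line in qstat_output.splitlines():
        fields = line.split()
        # Skip the header row and any dashed rule; data rows start with a number.
        if len(fields) >= 5 and fields[0].isdigit() and fields[4] == "qw":
            pending.append(int(fields[0]))
    return pending


# qstat output from this thread: job 54 is waiting.
sample = """\
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
     54 0.50000 emt0.0.0.t abhinav      qw    03/20/2014 20:26:26                                    1
"""

for job_id in pending_job_ids(sample):
    # "qstat -j <id>" prints the scheduler's reason in its "scheduling info" field.
    print(f"qstat -j {job_id}")
```

On a live cluster the text would come from something like subprocess.run(["qstat"], capture_output=True, text=True).stdout, and each printed command shows the scheduling info for one waiting job.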

EricR86 commented 10 years ago

From apocalyp...@gmail.com on March 20, 2014 14:12:57

scheduling info: (-l h_stack=8M,h_vmem=2048M,mem_requested=2048M) cannot run at host "abhnav" because it offers only hc:mem_requested=1.801G

EricR86 commented 10 years ago

From apocalyp...@gmail.com on March 21, 2014 05:46:02

The problem, basically, is that I have 2 GB of RAM and the machine can allocate only 1.8 GB, whereas Segway requests 2048 MB = 2 GB.
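The arithmetic behind the rejection can be checked with a short sketch. It assumes the usual Grid Engine memory suffix convention (uppercase K/M/G are powers of 1024, lowercase k/m/g are powers of 1000); to_bytes and fits are hypothetical helpers, not part of any SGE tool. Under that convention the host offers 1.801G, about 1844 MiB, which is less than the 2048 MiB Segway requested, so the scheduler leaves the job in qw.

```python
# Assumed SGE memory multipliers: uppercase suffixes are powers of 1024,
# lowercase suffixes are powers of 1000.
_UNITS = {
    "K": 1024, "M": 1024**2, "G": 1024**3,
    "k": 1000, "m": 1000**2, "g": 1000**3,
}


def to_bytes(spec):
    """Convert an SGE-style memory string such as '2048M' or '1.801G' to bytes."""
    suffix = spec[-1]
    if suffix in _UNITS:
        return float(spec[:-1]) * _UNITS[suffix]
    return float(spec)  # no suffix: plain bytes


def fits(requested, offered):
    """True if a mem_requested value is no larger than what the host offers."""
    return to_bytes(requested) <= to_bytes(offered)


offered = "1.801G"  # hc:mem_requested reported by qhost -F for host abhnav
print(f"offered {offered} = {to_bytes(offered) / 1024**2:.1f} MiB")
for request in ["2048M", "1G", "1.8G"]:
    verdict = "fits" if fits(request, offered) else "does not fit"
    print(f"{request} = {to_bytes(request) / 1024**2:.1f} MiB: {verdict}")
```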

On Fri, Mar 21, 2014 at 2:42 AM, Abhinav Mittal apocalypse.mittal@gmail.com wrote:

EricR86 commented 10 years ago

From hoffman...@gmail.com on March 21, 2014 06:12:32

Try --mem-usage-progression=1,1.8. Segway will then request only 1 GB at first, and 1.8 GB if that doesn't work. See http://noble.gs.washington.edu/proj/segway/doc/1.1.0/segway.html#memory-usage

I'm closing this since it isn't a bug. If you have further questions about this, please email segway-users@uw.edu. Thanks!
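The progression described above can be modeled as a retry loop over increasing memory levels. This is an illustrative sketch, not Segway's actual implementation: run_with_progression, parse_progression and the run_job callback are hypothetical, with run_job standing in for submitting a job at a given memory level (in GB) and reporting whether it completed.

```python
def parse_progression(value):
    """Parse a comma-separated option value such as '1,1.8' into floats (GB)."""
    return [float(x) for x in value.split(",")]


def run_with_progression(progression_gb, run_job):
    """Try each memory level in order and return the first one that succeeds.

    Hypothetical model: run_job(mem_gb) returns True if the job finished
    with that much memory, or False if it ran out of memory.
    """
    for mem_gb in progression_gb:
        if run_job(mem_gb):
            return mem_gb
    raise MemoryError(f"job failed at every level in {progression_gb}")


# Example: a job that needs 1.5 GB fails at 1 GB, then succeeds at 1.8 GB.
needed_gb = 1.5
levels = parse_progression("1,1.8")
print(run_with_progression(levels, lambda mem_gb: mem_gb >= needed_gb))  # 1.8
```

Both levels in 1,1.8 are at or below the 1.801G this host offers, so each attempt can actually be scheduled; a level above that would sit in qw just like the original 2 GB request.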

Summary: segway train queuing job that never starts (was: segway train not responding)
Status: Invalid