HIPS / Spearmint

Spearmint Bayesian optimization codebase

should exit when max-finished-jobs reached #43

Open majidaldo opened 9 years ago

majidaldo commented 9 years ago

I think this is reasonable behavior to expect from the program. (It didn't exit when I resumed my runs and they reached max-finished-jobs.)

DanielNScott commented 8 years ago

I think I may be having this issue as well. Here's my spearmint invocation and output, followed by a top snapshot indicating program behavior.

(SPEARMINT_ENV)dscott@rclogin11:/n/moorcroftfs4/dscott/spearmint/spearmint=>mongod --fork --logpath /n/moorcroftfs4/dscott/runfiles/smnt/mdbLog --dbpath /n/moorcroftfs4/dscott/runfiles/smnt/mdb/
about to fork child process, waiting until server is ready for connections.
forked process: 1581
child process started successfully, parent exiting
(SPEARMINT_ENV)dscott@rclogin11:/n/moorcroftfs4/dscott/spearmint/spearmint=>python main.py /n/moorcroftfs4/dscott/runfiles/smnt/
Using database at localhost.
Getting suggestion...

Suggestion:     NAME          TYPE       VALUE
                ----          ----       -----
                y             float      3.000000
                x             float      0.000000
Submitted job 1 with local scheduler (process id: 25228).
Status: 1 pending, 0 complete.

Getting suggestion...

Suggestion:     NAME          TYPE       VALUE
                ----          ----       -----
                y             float      5.000000
                x             float      2.500000
Submitted job 2 with local scheduler (process id: 19890).
Status: 1 pending, 1 complete.

Fitting GP for job_wrap task...
Getting suggestion...

Minimum expected objective value under model is 3.00000 (+/- 0.00318), at location:
                NAME          TYPE       VALUE
                ----          ----       -----
                y             float      3.000000
                x             float      0.000000

Minimum of observed values is 3.000000, at location:
                NAME          TYPE       VALUE
                ----          ----       -----
                y             float      3.000000
                x             float      0.000000

Suggestion:     NAME          TYPE       VALUE
                ----          ----       -----
                y             float      7.000000
                x             float      0.063445
Submitted job 3 with local scheduler (process id: 18234).
Status: 1 pending, 2 complete.

From here, the job has been sitting for about 40 min. Here's top output:

top - 16:25:32 up 13 days,  5:45, 34 users,  load average: 3.54, 3.70, 3.41
Tasks: 495 total,   2 running, 490 sleeping,   2 stopped,   1 zombie
Cpu(s): 13.1%us, 27.7%sy,  0.0%ni, 57.3%id,  1.3%wa,  0.0%hi,  0.6%si,  0.0%st
Mem:  16329656k total, 16027892k used,   301764k free,   111792k buffers
Swap:  8388600k total,  1031044k used,  7357556k free, 13383300k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5226 dscott    20   0 26188 1816 1200 R  5.7  0.0   0:00.03 top
 1581 dscott    20   0  523m  10m 4772 S  0.0  0.1   0:15.75 mongod
 1596 dscott    20   0 2648m 185m 1356 T  0.0  1.2   0:09.00 MATLAB
 3472 dscott    20   0 92704 2112 1272 S  0.0  0.0   0:00.55 sshd
 4340 dscott    20   0  116m 2108 1532 S  0.0  0.0   0:00.15 bash
 7962 dscott    20   0 92704 2120 1272 S  0.0  0.0   0:00.42 sshd
 8534 dscott    20   0  578m  69m 9720 S  0.0  0.4   0:30.63 python
 9131 dscott    20   0  116m 2248 1552 S  0.0  0.0   0:00.33 bash
18234 dscott    20   0     0    0    0 Z  0.0  0.0   0:00.55 python <defunct>

Any help would be much appreciated. I just killed the job and noticed that it was sleeping in the polling loop:

^CTraceback (most recent call last):
  File "main.py", line 494, in <module>
    main()
  File "main.py", line 309, in main
    time.sleep(options.get('polling-time', 5))
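
For reference, here is a minimal sketch of the behavior this issue asks for: a polling loop that returns once the number of completed jobs reaches a cap instead of sleeping forever. This is not Spearmint's actual code. The function `run_until_done` and its callback parameters are hypothetical, and I'm assuming the option keys `max-finished-jobs` and `max-concurrent` behave as their names suggest; only `polling-time` is taken from the traceback above.

```python
import time


def run_until_done(count_complete, count_pending, submit_job, options,
                   sleep=time.sleep):
    """Poll until the completed-job cap is reached, then return.

    Hypothetical helper for illustration only; the callbacks stand in
    for whatever the real program uses to query and submit jobs.
    """
    max_finished = options.get('max-finished-jobs')    # assumed key; None = no cap
    max_concurrent = options.get('max-concurrent', 1)  # assumed key
    polling_time = options.get('polling-time', 5)      # key seen in the traceback

    while True:
        complete = count_complete()
        pending = count_pending()

        # Exit condition: enough jobs have finished, so stop polling.
        if max_finished is not None and complete >= max_finished:
            return complete

        # Submit only while there is capacity and the cap is not yet
        # covered by jobs that are already complete or pending.
        has_capacity = pending < max_concurrent
        below_cap = max_finished is None or complete + pending < max_finished
        if has_capacity and below_cap:
            submit_job()
        else:
            sleep(polling_time)
```

The point of the sketch is that the cap is checked on every pass, including the first one after a resume, so a run that already has enough completed jobs would return immediately rather than wait in `time.sleep`.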

I'd be happy to debug some of this stuff myself but am also wondering, are there debugging or verbosity flags one can pass to spearmint or turn on in the code? Or is the only option breaking and inspecting?

Also, just to register it, thanks Spearmint folks for putting this online, writing the papers, doing the thesis etc. I'm in academia as well and get that a) there isn't always time to do every last thing that might be warranted re a code project, and b) sometimes things don't work for others even when they work fine for you.

sjackman commented 7 years ago

:+1: