Open majidaldo opened 9 years ago
I think I may be having this issue as well. Here's my spearmint invocation and output, followed by a top snapshot indicating program behavior.
(SPEARMINT_ENV)dscott@rclogin11:/n/moorcroftfs4/dscott/spearmint/spearmint=>mongod --fork --logpath /n/moorcroftfs4/dscott/runfiles/smnt/mdbLog --dbpath /n/moorcroftfs4/dscott/runfiles/smnt/mdb/
about to fork child process, waiting until server is ready for connections.
forked process: 1581
child process started successfully, parent exiting
(SPEARMINT_ENV)dscott@rclogin11:/n/moorcroftfs4/dscott/spearmint/spearmint=>python main.py /n/moorcroftfs4/dscott/runfiles/smnt/
Using database at localhost.
Getting suggestion...
Suggestion: NAME TYPE VALUE
---- ---- -----
y float 3.000000
x float 0.000000
Submitted job 1 with local scheduler (process id: 25228).
Status: 1 pending, 0 complete.
Getting suggestion...
Suggestion: NAME TYPE VALUE
---- ---- -----
y float 5.000000
x float 2.500000
Submitted job 2 with local scheduler (process id: 19890).
Status: 1 pending, 1 complete.
Fitting GP for job_wrap task...
Getting suggestion...
Minimum expected objective value under model is 3.00000 (+/- 0.00318), at location:
NAME TYPE VALUE
---- ---- -----
y float 3.000000
x float 0.000000
Minimum of observed values is 3.000000, at location:
NAME TYPE VALUE
---- ---- -----
y float 3.000000
x float 0.000000
Suggestion: NAME TYPE VALUE
---- ---- -----
y float 7.000000
x float 0.063445
Submitted job 3 with local scheduler (process id: 18234).
Status: 1 pending, 2 complete.
From here, the job has been sitting for about 40 min. Here's top output:
top - 16:25:32 up 13 days, 5:45, 34 users, load average: 3.54, 3.70, 3.41
Tasks: 495 total, 2 running, 490 sleeping, 2 stopped, 1 zombie
Cpu(s): 13.1%us, 27.7%sy, 0.0%ni, 57.3%id, 1.3%wa, 0.0%hi, 0.6%si, 0.0%st
Mem: 16329656k total, 16027892k used, 301764k free, 111792k buffers
Swap: 8388600k total, 1031044k used, 7357556k free, 13383300k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5226 dscott 20 0 26188 1816 1200 R 5.7 0.0 0:00.03 top
1581 dscott 20 0 523m 10m 4772 S 0.0 0.1 0:15.75 mongod
1596 dscott 20 0 2648m 185m 1356 T 0.0 1.2 0:09.00 MATLAB
3472 dscott 20 0 92704 2112 1272 S 0.0 0.0 0:00.55 sshd
4340 dscott 20 0 116m 2108 1532 S 0.0 0.0 0:00.15 bash
7962 dscott 20 0 92704 2120 1272 S 0.0 0.0 0:00.42 sshd
8534 dscott 20 0 578m 69m 9720 S 0.0 0.4 0:30.63 python
9131 dscott 20 0 116m 2248 1552 S 0.0 0.0 0:00.33 bash
18234 dscott 20 0 0 0 0 Z 0.0 0.0 0:00.55 python <defunct>
Any help would be much appreciated. I just killed the job and noticed that the main loop was sleeping at the time:
^CTraceback (most recent call last):
File "main.py", line 494, in <module>
main()
File "main.py", line 309, in main
time.sleep(options.get('polling-time', 5))
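For anyone reading along, a note on the `python <defunct>` entry in the top output: a process shows up as a zombie (state `Z`) when it has exited but its parent has not yet collected its exit status. The sketch below is not Spearmint's code and I have not checked how its local scheduler tracks jobs; it only illustrates, using a hypothetical helper named `wait_for_job`, how a polling loop built on `subprocess.Popen.poll()` reaps a finished child so that it does not linger as defunct.

```python
import subprocess
import sys
import time


def wait_for_job(proc, polling_time=0.1, timeout=5.0):
    """Poll a child process until it exits or the timeout elapses.

    Hypothetical helper for illustration only. Popen.poll() collects the
    exit status of a finished child, which removes its zombie entry.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        returncode = proc.poll()  # None while the child is still running
        if returncode is not None:
            return returncode
        time.sleep(polling_time)
    return None  # child was still running when the timeout expired


# Launch a short-lived child that stands in for a submitted job.
proc = subprocess.Popen(
    [sys.executable, "-c", "print('job done')"],
    stdout=subprocess.DEVNULL,
)
result = wait_for_job(proc)
```

If a loop like this never sees the child finish, the next thing I would check is whether the status it is waiting on comes from the process itself or from a separate record such as a database entry.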
I'd be happy to debug some of this stuff myself but am also wondering, are there debugging or verbosity flags one can pass to spearmint or turn on in the code? Or is the only option breaking and inspecting?
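One option that sits between a verbosity flag and breaking into a debugger is the standard library's `faulthandler` module, which can dump the stack of every thread when the process receives a signal. This is a minimal sketch under assumptions: it needs Python 3.3 or later and a Unix-like system, the helper name `install_stack_dump` and the log path are made up for illustration, and it would have to be added by hand near the top of the script being debugged. It is independent of whatever options Spearmint itself provides.

```python
import faulthandler
import os
import signal


def install_stack_dump(log_path):
    """Dump all thread stacks to log_path whenever SIGUSR1 arrives.

    Hypothetical helper for illustration. The returned file object must
    stay open for as long as the handler is registered.
    """
    log_file = open(log_path, "w")
    faulthandler.register(signal.SIGUSR1, file=log_file, all_threads=True)
    return log_file


# Example path only; any writable location works.
stack_log = install_stack_dump("stack_dump_example.log")
```

With this in place, running `kill -USR1 <pid>` from another shell writes the current stack to the log without stopping the process, so a stuck run can be inspected more than once.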
Also, just to register it, thanks Spearmint folks for putting this online, writing the papers, doing the thesis etc. I'm in academia as well and get that a) there isn't always time to do every last thing that might be warranted re a code project, and b) sometimes things don't work for others even when they work fine for you.
:+1:
I think this is reasonable behavior to expect from the program. (It didn't exit when I resumed my runs after they had reached max-finished-jobs.)