Closed wingmanzz closed 3 weeks ago
If you would like to parallelize I recommend that you create separate GEKKO instances by calling m=GEKKO(remote=False)
for each instance. One advantage of using a prior directory is that GEKKO attempts to restart from the prior solution and there can be some speed improvements from the Warm start.
You can clean up the local files by using m.cleanup()
after the instance has completed. I recommend the clean up so that you don't fill up your computer hard drive with temp files for very large jobs. For small jobs, this isn't a problem because the temp files are regularly removed with Disk Cleanup. However, if you are running many millions of parallel jobs, this could be an issue.
Here is another example of multi-threading with Python:
import numpy as np
import threading
import time, random
from gekko import GEKKO
class ThreadClass(threading.Thread):
def __init__(self, id, server, ai, bi):
s = self
s.id = id
s.server = server
s.m = GEKKO()
s.a = ai
s.b = bi
s.objective = float('NaN')
# initialize variables
s.m.x1 = s.m.Var(1,lb=1,ub=5)
s.m.x2 = s.m.Var(5,lb=1,ub=5)
s.m.x3 = s.m.Var(5,lb=1,ub=5)
s.m.x4 = s.m.Var(1,lb=1,ub=5)
# Equations
s.m.Equation(s.m.x1*s.m.x2*s.m.x3*s.m.x4>=s.a)
s.m.Equation(s.m.x1**2+s.m.x2**2+s.m.x3**2+s.m.x4**2==s.b)
# Objective
s.m.Obj(s.m.x1*s.m.x4*(s.m.x1+s.m.x2+s.m.x3)+s.m.x3)
# Set global options
s.m.options.IMODE = 3 # steady state optimization
s.m.options.SOLVER = 1 # APOPT solver
threading.Thread.__init__(s)
def run(self):
# Don't overload server by executing all scripts at once
sleep_time = random.random()
time.sleep(sleep_time)
print('Running application ' + str(self.id) + '\n')
# Solve
self.m.solve(disp=False)
# Results
#print('')
#print('Results')
#print('x1: ' + str(self.m.x1.value))
#print('x2: ' + str(self.m.x2.value))
#print('x3: ' + str(self.m.x3.value))
#print('x4: ' + str(self.m.x4.value))
# Retrieve objective if successful
if (self.m.options.APPSTATUS==1):
self.objective = self.m.options.objfcnval
else:
self.objective = float('NaN')
self.m.cleanup()
# Select server
server = 'https://byu.apmonitor.com'
# Optimize at mesh points
x = np.arange(20.0, 30.0, 2.0)
y = np.arange(30.0, 50.0, 2.0)
a, b = np.meshgrid(x, y)
# Array of threads
threads = []
# Calculate objective at all meshgrid points
# Load applications
id = 0
for i in range(a.shape[0]):
for j in range(b.shape[1]):
# Create new thread
threads.append(ThreadClass(id, server, a[i,j], b[i,j]))
# Increment ID
id += 1
# Run applications simultaneously as multiple threads
# Max number of threads to run at once
max_threads = 8
for t in threads:
while (threading.activeCount()>max_threads):
# check for additional threads every 0.01 sec
time.sleep(0.01)
# start the thread
t.start()
# Check for completion
mt = 3.0 # max time
it = 0.0 # incrementing time
st = 1.0 # sleep time
while (threading.activeCount()>=1):
time.sleep(st)
it = it + st
print('Active Threads: ' + str(threading.activeCount()))
# Terminate after max time
if (it>=mt):
break
# Wait for all threads to complete
#for t in threads:
# t.join()
#print('Threads complete')
# Initialize array for objective
obj = np.empty_like(a)
# Retrieve objective results
id = 0
for i in range(a.shape[0]):
for j in range(b.shape[1]):
obj[i,j] = threads[id].objective
id += 1
# plot 3D figure of results
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
from matplotlib import cm
import numpy as np
fig = plt.figure()
ax = fig.gca(projection='3d')
surf = ax.plot_surface(a, b, obj, \
rstride=1, cstride=1, cmap=cm.coolwarm, \
vmin = 12, vmax = 22, linewidth=0, antialiased=False)
ax.set_xlabel('a')
ax.set_ylabel('b')
ax.set_zlabel('obj')
ax.set_title('Multi-Threaded GEKKO')
plt.show()
See https://apmonitor.com/me575/index.php/Main/ParallelComputing
GEKKO should create unique directories for each model created, even in parallel. If you're solving the problems remotely, the issue is likely overlapping directories on the server side. The server creates a directory based on model name and IP address. GEKKO increments default model names with a global model instance counter to avoid this problem, but parallelizing probably doesn't share the same instance counter so you get overlapping model names. Solutions involve solving locally, as @APMonitor suggested, or forcing model names to be unique.
@APMonitor, the solution to Issue #30 would resolve this problem too.
Thanks for everyones help onthis--i was able to get get it mostly working --however there maybe one more issue Id like a little insight on if possible.
I am now running GEKKO with (remote=False) and this spawns an apm monitor subprocess, which is not stopped until the the entire program exits I think. This causes the check for active children processes to always return > 1.
I have tried a 'del' on the GEKKO object, but it does not clean up the subprocess. Might it be best if the
cleanup() method of the GEKKO class killed it's corresponding apm process? If not how would you recommend killing the apm subprocess if your parent process is long-running?
One option is to run an OS command to kill any apm.exe processes for Windows:
taskkill /F /IM apm.exe
or from Python with
import os
os.system("taskkill /f /im apm.exe")
Does this help? I suppose that we could have a separate function such as m.kill() that would do this by the PID or m.kill_all() for stopping all apm.exe
. We'd likely need to identify the OS and use different system commands based on Windows, MacOS, or Linux. Thoughts?
@APMonitor, I bet the f2py dll/so alternative we discussed would avoid problems like this that come from calling exe subprocesses.
One potential disadvantage of using f2py if I understand it correctly, is that the Fortran code would have to be compiled specifically for the user's version of python, so you would either have to compile the Fortran during the installation process, or maintain a large library of binary files.
On Wed, Jul 24, 2019, 5:18 AM loganbeal notifications@github.com wrote:
@APMonitor https://github.com/APMonitor, I bet the f2py dll/so alternative we discussed would avoid problems like this that come from calling exe subprocesses.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/BYU-PRISM/GEKKO/issues/61?email_source=notifications&email_token=AEUJHIJNW5OLJLWT2TY32BLQBBCCTA5CNFSM4IEAUAJ2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2WEWHA#issuecomment-514607900, or mute the thread https://github.com/notifications/unsubscribe-auth/AEUJHIN2N2TND4SM33V2XITQBBCCTANCNFSM4IEAUAJQ .
@abe-mart there is some dependence on version, but I didn't think it was that extreme. Numpy does this and I think they figured out how to have pip deliver only the relevant files to the user's system. I never figured that out.
@abe-mart I compile 6 binaries right now (Windows, Linux, MacOS, and Linux ARM with public server and local versions for Windows and MacOS with different solver options that can be released). You are correct that I'd need to produce another set for the library option.
@loganbeal I like the option of downloading only the relevant binaries so that the pip install doesn't get too big. Right now it is about 10MB and the extra so/dll files would likely take it to 20MB.
@wingmanzz did it help to use the kill option? On starting the APM process, we could also keep track of a PID on the OS and kill just that process if you need more fine-grain control.
Yes, the kill did work--but it also turned out I was running into a deadlock situation with lines 1824-1831 in gekko.py
if debug >= 1:
# Start recording output if error is detected
if '@error' in line:
record_error = True
if record_error:
apm_error+=line
app.wait()
These particular section caused me a deadlock (and the apm processes to never quit) when I run processes in parrallel on a local server. For the time being, I have just commented them out, but I think it has something to do with the pipe buffer deadlock problem mentioned here:
https://stackoverflow.com/questions/2381751/can-someone-explain-pipe-buffer-deadlock
Thanks for pointing out this issue. I've also seen the deadlock issue when running locally, even not in parallel so it is nice to have a lead on tracking down this problem. You could also remove this problem by calling m.solve(debug=0)
with option debug=0
to avoid the locking section. Here is some additional documentation that I found:
https://docs.python.org/3.7/library/subprocess.html?highlight=subprocess#popen-objects
Note This will deadlock when using stdout=PIPE or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use Popen.communicate() when using pipes to avoid that.
app = subprocess.Popen([apm_exe, self._model_name], stdout=subprocess.PIPE, \
stderr=subprocess.PIPE,cwd = self._path, \
env = {"PATH" : self._path }, universal_newlines=True)
for line in iter(app.stdout.readline, ""):
if disp == True:
try:
print(line.replace('\n', ''))
except:
pass
else:
pass
if debug >= 1:
# Start recording output if error is detected
if '@error' in line:
record_error = True
if record_error:
apm_error+=line
app.wait()
_, errs = app.communicate()
Here are a few things that I'll try:
app.communicate
instead of app.stdout.readline
- it wasn't as simple as just replacing it so I'll need to dig more.Any other ideas?
Deadlock issue is resolved with no command line output for remote=False
.
Hello,
Ive just started using GEKKO and so far so good--I do have an issue though, I have an parallel process which calls my GEKKO solver. After looking at the output from several runs, it was obvious the solutions reported from each of the parallel jobs were NOT stable. It may produce anywhere from 25-50 of the 'correct' solutions. If I make the jobs sequential again, it works perfectly, and produces all 100+ solutions submitted to the solver.
After looking at the docs, I am assuming this is might be because the backend APMonitor is writing to a file and each of the parallel jobs is writing over the file before and during the other jobs being solved by GEKKO..does anyone have a suggestion for making GEKKO parallel-safe?
WORKS: ` pool = Pool(processes=1)
DOES NOT WORK: ` pool = Pool(processes=max_jobs)
Where func calls a GEKKO 'solver'.