Open Stefan-Endres opened 6 years ago
Not sure if the status has changed a lot already for this topic, but I have a pretty expensive function (2-10 minutes) so I'd be interested in helping devise something.
Same here. My main interests are computational and algorithmic changes to better address situations where (1) the objective function is very expensive, but heavily vectorized [i.e. the cost of evaluating a single point is very similar to the cost of evaluating a large set of points at the same time]; and (2) we have large numbers of computing nodes at our disposal that can work in parallel.
I wonder if the pattern I use here (for optuna) might also work for SHGO? The idea is that SHGO only sees an objective function and doesn't need to worry about how it gets computed.
https://github.com/microprediction/embarrassingly
Hi @microprediction @fcela.
In the most recent update (7e83bb8) I've added the `workers` argument for `shgo` to allow for basic parallelization.
I would greatly appreciate any feedback and/or error reports from using the argument. Currently I have only tested it with very simple Python objective functions; I suspect there might be issues such as pickling errors with more complex functions. Since all the unit tests are passing I have also uploaded it to PyPI for a more convenient install, but I would like to test the implementation more before expanding the code and the documentation for downstream repositories.
Minimum working example:

```python
from shgo import shgo
import numpy as np
import time

# Toy problem
def f(x):
    time.sleep(0.1)
    return x[0] ** 2 + x[1] ** 2

bounds = np.array([[0, 1],] * 2)

ts = time.time()
res = shgo(f, bounds, n=50, iters=2)
print(f'Total time serial: {time.time() - ts}')
print('-')
print(f'res = {res}')

ts = time.time()
res = shgo(f, bounds, n=50, iters=2, workers=8)
print('=')
print(f'Total time par: {time.time() - ts}')
print('-')
print(f'res = {res}')
```
CLI output:

```
Total time serial: 10.341249465942383
-
res = fun: 0.0
     funl: array([0.])
  message: 'Optimization terminated successfully.'
     nfev: 103
      nit: 2
    nlfev: 3
    nlhev: 0
    nljev: 1
  success: True
     tnev: 103
        x: array([0., 0.])
       xl: array([[0., 0.]])
=
Total time par: 1.9465992450714111
-
res = fun: 0.0
     funl: array([0.])
  message: 'Optimization terminated successfully.'
     nfev: 103
      nit: 2
    nlfev: 3
    nlhev: 0
    nljev: 1
  success: True
     tnev: 103
        x: array([0., 0.])
       xl: array([[0., 0.]])
```
Relevant code snippet (uses the `multiprocessing` library): https://github.com/Stefan-Endres/shgo/blob/7e83bb8291a3420ff1f8c665647af005e568e229/shgo/_shgo_lib/_vertex.py#L436-L449
The parallelization occurs while evaluating the functions during the sampling stage. During the local minimization step, serial evaluations are still used. In the future I would like to add parallelization here as well, providing each core with a starting point plus the chosen local minimization function, ideally using only the standard library and `scipy` dependencies.
That’s super helpful. Will try to get to it today.
Since you are using multiprocessing, let me see if I can get it to work on Ray. The new wrapper for multiprocessing in Ray looks very promising, and if it works well, that may be all that is needed to scale up to multiple nodes.
https://docs.ray.io/en/master/multiprocessing.html
Nice. Sorry I haven’t tested yet. I got derailed by Monday night football, as you can see here: https://www.microprediction.com/blog/nine
Ray parallelization appears to work without any problem -- just replacing `import multiprocessing as mp` with `import ray.util.multiprocessing as mp` in `shgo/_shgo_lib/_vertex.py`.
This is what I get for the minimal example above, multiprocessing vs ray.
Multiprocessing
```
Total time serial: 10.44602346420288
-
res = fun: 0.0
     funl: array([0.])
  message: 'Optimization terminated successfully.'
     nfev: 104
      nit: 2
    nlfev: 4
    nlhev: 0
    nljev: 1
  success: True
     tnev: 104
        x: array([0., 0.])
       xl: array([[0., 0.]])
=
Total time par: 2.0377724170684814
-
res = fun: 0.0
     funl: array([0.])
  message: 'Optimization terminated successfully.'
     nfev: 104
      nit: 2
    nlfev: 4
    nlhev: 0
    nljev: 1
  success: True
     tnev: 104
        x: array([0., 0.])
       xl: array([[0., 0.]])
```
Ray
```
Total time serial: 10.438774108886719
-
res = fun: 0.0
     funl: array([0.])
  message: 'Optimization terminated successfully.'
     nfev: 104
      nit: 2
    nlfev: 4
    nlhev: 0
    nljev: 1
  success: True
     tnev: 104
        x: array([0., 0.])
       xl: array([[0., 0.]])
2020-11-02 14:55:04,351 INFO services.py:1166 -- View the Ray dashboard at http://127.0.0.1:8265
=
Total time par: 3.972384452819824
-
res = fun: 0.0
     funl: array([0.])
  message: 'Optimization terminated successfully.'
     nfev: 104
      nit: 2
    nlfev: 4
    nlhev: 0
    nljev: 1
  success: True
     tnev: 104
        x: array([0., 0.])
       xl: array([[0., 0.]])
```
Of course, for a problem this small, Ray's parallelization overhead is overkill.
I created this example https://github.com/microprediction/humpday/blob/main/Embarrassingly_SHGO.ipynb
Seems smooth enough, but still a toy example I suppose.
Taking a look at your code now to see what your concern might be re: pickle, but at least for my use case the objective function (or "pre-objective" as I've called it) is probably going to ssh off somewhere and shell out.
Perhaps...
Overview
There are many sub-routines in shgo that are low-hanging fruit for parallelization, most importantly the mapping of the objective function over the sampling points (in cases where it is possible to parallelize the problem). In addition, we can optimize the sampling size of an iteration based on the computational resources available.
We need to be careful with both the dependencies and the code-structure changes we introduce, for several reasons. First, there are our existing dependencies on `scipy.optimize.minimize` and `scipy.spatial.Delaunay`. Secondly, our ambition to include `shgo` in `scipy.optimize` means it should ideally have the same dependencies and structure. Finally, we want to minimize reliance on maintenance from other packages, which can lead to issues such as those we had with using `multiprocessing_on_dill` in `tgo`.

My suggestion is to use numba in a way that avoids needing to change the code structure at all. We can do this using tricks such as the one used in poliastro:
https://github.com/poliastro/poliastro/blob/0.6.x/src/poliastro/jit.py
https://github.com/poliastro/poliastro/blob/master/setup.py#L40

which simply maps the decorator to the dependency if it is installed, and otherwise does nothing. We could use the same trick for other libraries and methods by redefining range functions etc. in our code.
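The poliastro-style trick amounts to a conditional decorator. A minimal sketch (the name `maybe_jit` and the `sphere` function are made up for the example; shgo would pick its own names):

```python
# If numba is installed, `maybe_jit` compiles the function;
# otherwise it is a harmless no-op, so no other code has to change.
try:
    import numba

    def maybe_jit(func):
        return numba.njit(func)
except ImportError:
    def maybe_jit(func):
        return func


@maybe_jit
def sphere(x, y):
    return x * x + y * y
```

The decorated function behaves identically either way; only the speed differs, which is exactly why the dependency can stay optional.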
While it is possible that SciPy will eventually include numba as a dependency, based on discussions in the scipy-dev mailing lists this will not happen in the near future: https://mail.python.org/pipermail/scipy-dev/2018-March/022576.html
However, for now we should be able to maintain numba as an optional dependency as described in the rest of this post. My idea is to provide two main modes of parallelization: CPU-based and GPU-based. Using numba for GPU parallelization would therefore be ideal, since it avoids extra dependencies. Finally, numba gives us access to LLVM, which can be used in the sampling generation.
GPU
The user's architecture can be detected or specified, and we can then map it to our own decorator.
Nvidia
We can use `@numba.cuda.jit`. I propose an early test of the objective function so that the user can be warned if this fails.
https://devblogs.nvidia.com/seven-things-numba/
https://numba.pydata.org/numba-doc/latest/cuda/index.html

AMD
We can use `@hsa.jit(device=True)`: https://numba.pydata.org/numba-doc/latest/hsa/overview.html
CPU
Our options for CPU parallelization include numba (http://numba.pydata.org/numba-doc/dev/user/parallel.html) or multiprocessing_on_dill, etc.

However, it appears that parallelization with numba isn't as simple as just adding `@jit(parallel=True)`:
https://stackoverflow.com/questions/45610292/how-to-make-numba-jit-use-all-cpu-cores-parallelize-numba-jit

So we should also run a few tests on non-trivial functions to see whether it is worth implementing.
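For reference, the usual numba CPU-parallel pattern uses `prange` over the sampling points, combined with the optional-dependency guard from above. A sketch (the `eval_grid` name and toy objective are assumptions; real shgo sampling would call the user's function):

```python
import numpy as np

# Optional-dependency guard: fall back to plain Python if numba is absent.
try:
    from numba import njit, prange
except ImportError:
    prange = range

    def njit(**kwargs):
        def wrap(func):
            return func
        return wrap


@njit(parallel=True)
def eval_grid(points):
    # Evaluate a toy objective over all sampling points; with numba,
    # iterations of the prange loop may run on separate CPU cores.
    out = np.empty(points.shape[0])
    for i in prange(points.shape[0]):
        out[i] = points[i, 0] ** 2 + points[i, 1] ** 2
    return out
```

Note this only parallelizes objectives that numba can compile in `nopython` mode, which is exactly why the non-trivial-function tests suggested above matter.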