dme65 / pySOT

Surrogate Optimization Toolbox for Python

pySOT is slow to generate new points with large dataset #49

Closed sjohnson-FLL closed 1 year ago

sjohnson-FLL commented 1 year ago

I am using pySOT with EIStrategy and a GPRegressor surrogate model for a 6-dimensional optimization problem (all variables are continuous). I am using 10 worker threads, Python 3.11.1, and Windows 10.
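For reference, the setup looks roughly like this. This is only a minimal sketch: the built-in Ackley problem stands in for my actual objective, max_evals is a placeholder budget, and the exact constructor arguments may differ slightly between pySOT/POAP versions.

```python
from poap.controller import BasicWorkerThread, ThreadController
from pySOT.experimental_design import SymmetricLatinHypercube
from pySOT.optimization_problems import Ackley
from pySOT.strategy import EIStrategy
from pySOT.surrogate import GPRegressor

# Stand-in for the real 6-dimensional continuous objective (not shown here).
problem = Ackley(dim=6)

# Gaussian-process surrogate and a space-filling initial design.
gp = GPRegressor(dim=problem.dim, lb=problem.lb, ub=problem.ub)
slhd = SymmetricLatinHypercube(dim=problem.dim, num_pts=2 * (problem.dim + 1))

controller = ThreadController()
controller.strategy = EIStrategy(
    max_evals=5000,        # placeholder evaluation budget
    opt_prob=problem,
    exp_design=slhd,
    surrogate=gp,
    asynchronous=True,
)

# Launch 10 worker threads that evaluate the objective.
for _ in range(10):
    controller.launch_worker(BasicWorkerThread(controller, problem.eval))

result = controller.run()
print(result.value, result.params[0])
```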

In recent runs, I have observed that after a little more than 1000 data points, generating new points slows down dramatically. Before this point, most of the time is spent evaluating the objective function, with worker threads being assigned a new task within seconds (or even milliseconds) of finishing the previous evaluation. However, at a little over 1000 data points, almost all of the time is spent waiting for new assignments.

This is confirmed by looking at CPU utilization. Prior to ~1000 points, the evaluations run by the worker threads take up all available CPU. After that, the main Python process takes ~30% and the worker threads rarely take any at all. In fact, only one worker thread is ever evaluating the objective at a time, because it finishes its evaluation before the next point has been generated.

This leads me to suspect that the process of generating new points slows down dramatically around 1000 points.

  1. Is this expected behavior? It is totally possible I've got a bug in my code somewhere that is causing this.
  2. Any tips to work around this?
     a. I would be happy to accept generating less optimal points if it meant more could be generated faster.
     b. I have some capacity to rewrite and/or multithread functions in C, if a specific function turns out to be the bottleneck (a profiling sketch is below).
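For 2b, a quick way to see which function is actually the bottleneck is to profile the controller run itself. A sketch using the standard-library profiler (`pysot_profile.out` is just a placeholder filename); it assumes `controller` is the object from the setup sketch above:

```python
import cProfile
import pstats

# cProfile only times the calling thread, which is where the strategy
# generates new points inside controller.run(); the worker threads doing
# the objective evaluations are not included, which is what we want here.
cProfile.run("controller.run()", "pysot_profile.out")
pstats.Stats("pysot_profile.out").sort_stats("cumulative").print_stats(25)
```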

Edit: this Stack Exchange question on Gaussian process regression for large datasets seems relevant: https://stats.stackexchange.com/questions/326446/gaussian-process-regression-for-large-datasets

sjohnson-FLL commented 1 year ago

Update: After further investigation, I believe the slowdown was caused by the dtol setting in EIStrategy(). The default value of dtol is 10^-3 * norm(ub - lb), and it is the minimum distance that newly generated points must keep from previously evaluated points. It makes sense that point generation slows down once the dataset is dense enough that this requirement becomes overly strict. Changing to dtol = 5*10^-5 * norm(ub - lb) fixed the slowdown for at least 7000 points; if it starts to slow down again, I can decrease the scaling on dtol further.
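Concretely, the change is just to pass a smaller dtol when constructing the strategy. A minimal sketch, reusing the problem/gp/slhd objects from the setup above and assuming your pySOT version accepts dtol as a constructor argument (max_evals is again a placeholder):

```python
import numpy as np

# Loosen the minimum-distance requirement between newly generated points
# and previously evaluated points. The default is 1e-3 * norm(ub - lb),
# which becomes overly strict once the dataset gets dense.
dtol = 5e-5 * np.linalg.norm(problem.ub - problem.lb)

controller.strategy = EIStrategy(
    max_evals=10000,
    opt_prob=problem,
    exp_design=slhd,
    surrogate=gp,
    asynchronous=True,
    dtol=dtol,
)
```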

Closing the issue, but wanted to leave this comment here in case anyone else has similar troubles.