dme65 / pySOT

Surrogate Optimization Toolbox for Python

pySOT repeats the same function evaluations when using external C++ objective function #6

Closed ili3p closed 8 years ago

ili3p commented 8 years ago

I'm using pySOT with an external C++ objective function, following example 11.6 shown here: https://github.com/dme65/pySOT/blob/444070e31887525ed418166d9107b6a355353fc9/docs/pySOT.pdf.

When I set the number of evaluations to 2, for example, it repeats those same 2 evaluations over and over again and never finishes.

If needed I can share the code privately.

Full log:

(.venv)ilija@deep03:~/h$ python pySOT_runner.py 1 2

Number of threads: 1
Maximum number of evaluations: 2
Search strategy: Candidate DyCORS
Experimental design: Latin Hypercube
Surrogate: Cubic RBF
{
  epoch : 17
  showProgress : false
  seed : 139
  hiddenNodes : 130
  experimentId : 1
  threads : 16
  momentum : 0.8071425
  deviceId : 1
  batchSize : 128
  learnRate : 0.05075
  plot : false
  save : "logs"
  mean : 0.002500075
  std : 0.00453571428571
  learnRateDecay : 0
}
0.1937

{
  epoch : 19
  showProgress : false
  seed : 139
  hiddenNodes : 173
  experimentId : 1
  threads : 16
  momentum : 0.8214275
  deviceId : 1
  batchSize : 128
  learnRate : 0.0223214285714
  plot : false
  save : "logs"
  mean : 0.000357239285714
  std : 0.0893928571429
  learnRateDecay : 0
}
0.6733

{
  epoch : 17
  showProgress : false
  seed : 139
  hiddenNodes : 130
  experimentId : 1
  threads : 16
  momentum : 0.8071425
  deviceId : 1
  batchSize : 128
  learnRate : 0.05075
  plot : false
  save : "logs"
  mean : 0.002500075
  std : 0.00453571428571
  learnRateDecay : 0
}
0.1937

{
  epoch : 19
  showProgress : false
  seed : 139
  hiddenNodes : 173
  experimentId : 1
  threads : 16
  momentum : 0.8214275
  deviceId : 1
  batchSize : 128
  learnRate : 0.0223214285714
  plot : false
  save : "logs"
  mean : 0.000357239285714
  std : 0.0893928571429
  learnRateDecay : 0
}
0.6733

And it repeats this forever. These are the parameters to optimise:

{
  epoch : 19
  showProgress : false
  seed : 139
  hiddenNodes : 173
  experimentId : 1
  threads : 16
  momentum : 0.8214275
  deviceId : 1
  batchSize : 128
  learnRate : 0.0223214285714
  plot : false
  save : "logs"
  mean : 0.000357239285714
  std : 0.0893928571429
  learnRateDecay : 0
}

and 0.6733 is the function evaluation result. As you can see, it repeats the same two function evaluations, i.e. it never tries a different set of parameters.

ili3p commented 8 years ago

Btw, it works if I call Popen(exp_arg, stdout=PIPE) inside a Python class that has an objfunction method, and then pass an instance of that class to a BasicWorkerThread.

For example, TorchOptim is a class with the needed properties (similar to the Ackley class in the examples); then I can run this:

# Imports for the pySOT 0.1.x API (TorchOptim is my own class)
from poap.controller import ThreadController, BasicWorkerThread
from pySOT import (LatinHypercube, RBFInterpolant, CubicRBFSurface,
                   SyncStrategyNoConstraints, CandidateDYCORS)

data = TorchOptim()

# Create a strategy and a controller
controller = ThreadController()
controller.strategy = \
    SyncStrategyNoConstraints(
        worker_id=0, data=data,
        maxeval=maxeval, nsamples=nsamples,
        exp_design=LatinHypercube(dim=data.dim, npts=2*(data.dim+1)),
        response_surface=RBFInterpolant(surftype=CubicRBFSurface, maxp=maxeval),
        sampling_method=CandidateDYCORS(data=data, numcand=100*data.dim))

# Launch the threads and give them access to the objective function
for _ in range(nthreads):
    worker = BasicWorkerThread(controller, data.objfunction)
    controller.launch_worker(worker)

# Run the optimization strategy
result = controller.run()

Inside objfunction I call the external process via Popen, and then the program works as intended.
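To make the workaround concrete, here is a minimal sketch of such a class. The class name, the `dim` value, and the external command are all assumptions for illustration (a stand-in `python -c` command replaces the real compiled torch binary); only the Popen-and-parse pattern itself comes from the comment above.

```python
import subprocess
import sys

class TorchOptimSketch:
    """Hypothetical sketch of an objective-function class; names and
    dimensions are assumed, not taken from the actual project code."""
    dim = 2  # assumed problem dimension

    def objfunction(self, x):
        # Stand-in external command; a real setup would invoke the
        # compiled C++/torch binary with the parameters in x.
        cmd = [sys.executable, "-c", "print(0.1937)"]
        out = subprocess.Popen(cmd, stdout=subprocess.PIPE).communicate()[0]
        # Treat the final line of stdout as the objective value.
        return float(out.decode().strip().splitlines()[-1])
```

Because the parsing happens inside objfunction, any extra output from the external process only breaks this one method, rather than confusing the worker thread's own output handling.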

dme65 commented 8 years ago

How many points are in your initial design? Since you are only allowing two evaluations of the objective function, your initial design should consist of at most two points, but my bet is that it contains more than that. pySOT currently doesn't check that your evaluation budget is larger than or equal to the size of your experimental design, but I will add this check in the next version.
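The check being described can be sketched in a few lines. The helper name is hypothetical; the `2 * (dim + 1)` formula matches the `npts` argument of the `LatinHypercube` call in the snippet earlier in this thread.

```python
def check_budget(maxeval, dim):
    """Hypothetical input check: the evaluation budget must cover the
    experimental design, or the optimizer can never move past it."""
    design_pts = 2 * (dim + 1)  # npts used for the LatinHypercube here
    if maxeval < design_pts:
        raise ValueError(
            f"maxeval={maxeval} is smaller than the {design_pts}-point "
            "experimental design; increase the budget or shrink the design")
    return design_pts
```

With 13 hyperparameters the design would need 28 points, so a budget of 2 evaluations fails this check immediately.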

ili3p commented 8 years ago

Actually, I'm only using two evaluations here in the example. In the real experiments I use from 500 to 5000 evaluations in steps of 500 (so 10 experiments), and they all had this issue. But the number of evaluations should not be the problem, since it works when I use BasicWorkerThread. The problem must be related to the ProcessWorkerThread class.

dme65 commented 8 years ago

It would be helpful if you could create a Minimal, Complete, and Verifiable example of your objective function and your run set up so that I can take a look. Are you having any issues if you run test_subprocess.py?

Also, since you are using ProcessWorkerThread, I assume you are using an external objective function. Are you sure the output parsing works correctly? In the pySOT examples I print a string at the end of the objective function evaluation, but if your objective function prints while running, pySOT may mistake that intermediate output for the final result.
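A defensive way to parse such output can be sketched as follows. This is not pySOT's actual parser (the examples simply read the final printed value); it is a hypothetical variant that skips non-numeric noise lines like the ones described in this thread.

```python
def parse_objective(stdout_text):
    """Hedged sketch: return the value of the last line of process
    output that parses as a float, ignoring any non-numeric lines
    (progress messages, GPU framework chatter, etc.)."""
    value = None
    for line in stdout_text.splitlines():
        try:
            value = float(line.strip())
        except ValueError:
            continue  # skip lines that are not a bare number
    if value is None:
        raise ValueError("no numeric objective value found in output")
    return value
```

For example, output such as "warming up GPU...\n0.6733" would still yield 0.6733 instead of a parse failure.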

Just to make sure there is nothing weird with your experimental setup (excluding the objective function), I recommend that you try the CheckInput branch that I just pushed. This branch adds some basic input checking to make sure that the evaluation budget matches the experimental design, and so on.

ili3p commented 8 years ago

OK, to prepare a reproducible example I needed to port the code to a CPU version, since the original is written in torch and uses GPUs. After converting it, I ran it and saw that it works fine. So I investigated, and it turns out that the GPU framework prints some extra characters that were messing up the parsing of the output. After I disabled the GPU printing, the issue was solved.

I didn't think this could be the problem, since I would have expected it to fail on all function evaluations and eventually stop, saying it couldn't find good parameters because it couldn't complete enough evaluations, or something along those lines. Definitely not loop indefinitely, trying the same parameters over and over again. :)

Thanks for the help!

dme65 commented 8 years ago

I'm glad it worked out. I'll think about a good way to warn the user when the output is non-numerical, since it is a little annoying that pySOT doesn't even check whether the output makes sense. Good luck with your runs!

ili3p commented 8 years ago

Thanks. Yes, it's a hard question what to do when all the function evaluations fail. I think pySOT should also fail in that case, or, if say more than 50% of the function evaluations have failed, stop executing or print an error.
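The guard being suggested could look something like this. It is not part of pySOT; the class name, threshold, and warm-up count are assumptions made for the sketch.

```python
class FailureRateGuard:
    """Hypothetical guard: abort the run once more than max_fail_frac
    of the completed evaluations have failed, after a small warm-up."""

    def __init__(self, max_fail_frac=0.5, min_evals=10):
        self.fails = 0
        self.total = 0
        self.max_fail_frac = max_fail_frac
        self.min_evals = min_evals

    def record(self, succeeded):
        # Called once per finished evaluation with its success flag.
        self.total += 1
        if not succeeded:
            self.fails += 1
        if (self.total >= self.min_evals
                and self.fails / self.total > self.max_fail_frac):
            raise RuntimeError(
                f"{self.fails}/{self.total} evaluations failed; "
                "check the objective function's output parsing")
```

A strategy holding such a guard would call record() after every worker result, turning the silent infinite retry loop from this issue into an immediate, explainable error.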