LambdaConglomerate / x9115lam

HW4 - Simulated Annealing #8

Closed meneal closed 9 years ago

meneal commented 9 years ago

Getting started on this now.

meneal commented 9 years ago

Henderson et al. This is the paper that's mentioned in Menzies' write up on Simulated Annealing.

meneal commented 9 years ago

Just pushed the baseline study. Take a look at what I've done and see if it makes sense. I went a little beyond what he suggested and ran 100 models 100 times and output the median, max, and min.
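
For anyone reading along, a minimal sketch of what a baseline study like that could look like (the Schaffer objectives and all names here are assumptions, not necessarily what's in the repo):

```python
import random

def f1(x): return x ** 2          # Schaffer objective 1 (assumed)
def f2(x): return (x - 2) ** 2    # Schaffer objective 2 (assumed)

def baseline(runs=100, samples=100):
    """Sample the objective space to estimate its spread."""
    energies = []
    for _ in range(runs):
        for _ in range(samples):
            x = random.random()            # x drawn uniformly from [0, 1]
            energies.append(f1(x) + f2(x))
    energies.sort()
    return energies[0], energies[len(energies) // 2], energies[-1]

lo, med, hi = baseline()
print("min=%.4f median=%.4f max=%.4f" % (lo, med, hi))
```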

ghost commented 9 years ago

Looks good @meneal. I'm going to start working on the SA part.

ghost commented 9 years ago

I'm working on SA right now and I am not getting the results that are expected. You'll notice that I changed the > to a < on line 115 in order to get the while loop to execute. This is because e_max is, understandably, a high number, so we would expect the initial energy to be lower than e_max. But due to the condition, the while loop will never execute. Is e_max supposed to be a high energy or low energy value? I know we are trying to lower ("cool") the energy of the system, but the name is confusing me.
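
For comparison, in the conventional minimizing pseudocode e_max is a "good enough" target energy, so the loop runs while the current energy is still above it; if e_max is instead set to a high, worst-case value, the comparison has to flip, which may be where the confusion comes from. A rough self-contained sketch (all names and values are assumptions):

```python
import random

def energy(x):
    # Assumed Schaffer-style energy; lower is better in this sketch
    return x ** 2 + (x - 2) ** 2

def anneal_min(k_max=1000, e_max=2.05):
    """Conventional loop: e_max is a 'good enough' target, so we keep
    searching while the current energy is still ABOVE it."""
    x = random.random()
    e = energy(x)
    k = 0
    while k < k_max and e > e_max:
        x_new = x + random.choice([-0.01, 0.01])   # tiny neighbor step
        e_new = energy(x_new)
        if e_new < e:                              # greedy accept, for brevity
            x, e = x_new, e_new
        k += 1
    return x, e

print(anneal_min())
```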

Also, I'm not sure if the probability function is correct. I substituted t = 1 - k/kmax in Menzies' code based upon what I saw.
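
For reference, with that substitution the acceptance probability would look roughly like the Boltzmann-style sketch below (minimizing form; this is my reading of it, not necessarily the exact code in the repo):

```python
import math

def probability(e_old, e_new, k, k_max):
    """Acceptance probability with the linear schedule t = 1 - k/k_max.
    Only meant to be consulted for worse neighbors (e_new > e_old when
    minimizing); then it is below 1 and shrinks as t cools."""
    t = 1.0 - float(k) / k_max
    if t <= 0.0:
        return 0.0          # fully cooled: never accept a worse neighbor
    return math.exp(-(e_new - e_old) / t)

print(probability(0.4, 0.5, 100, 1000))   # early: fairly likely to jump
print(probability(0.4, 0.5, 990, 1000))   # late: almost never jumps
```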

meneal commented 9 years ago

I'll take a look at this later on tonight.

meneal commented 9 years ago

Honestly, after poking at this for a good long while today, I'm really confused about how best to implement this. I've been messing around in the expirimental branch. Here are a few things that I'm not sure about at this point:

If possible it would be great to meet tomorrow to talk a little about where we're at with this, or even after class just to try to figure out some answers to all of that.

meneal commented 9 years ago

Check out this link under: Test functions for multi-objective optimization problems. You'll see Schaffer there, with minimization as the optimization goal.

meneal commented 9 years ago

So I worked on this a bunch tonight. Since I misspelled expirimental, I created another branch called matt and worked in there. I rewrote the baseline study according to what Joseph and I talked about after class.

Let me know if that's confusing since I used a bunch of weird closure stuff to make it work. I also rewrote neighbor, probability, and sim_anneal. The main thing with neighbor is that we really only have one independent variable and we're adding to it or subtracting from it randomly.
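
Something like this minimal sketch, I'm assuming (the ±0.01 step size is taken from the discussion further down):

```python
import random

def neighbor(x, step=0.01):
    """Perturb the single independent variable by a small random step,
    either up or down."""
    return x + random.choice([-step, step])

print(neighbor(0.5))   # -> 0.49 or 0.51
```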

With probability I decided to start k at 1.0 instead of zero to avoid division by zero. I also cleaned up any areas where we were making calls with more than one independent variable.

The problem I see with it at this point is that it seems to not be actually cooling for some reason. You'll see that the output just continues having question marks all the way up until it stops. Obviously that's not exactly what we're looking for. It seems better now though. Let me know what questions you have.

ghost commented 9 years ago

I see your code, but I'm not sure how to sync it to my local copy. I'm using the GitHub client for Windows. I can only select the master branch; I can't see the other two. Maybe you can merge your branch back into master, since it's most likely better than what's there now?

aisobran commented 9 years ago

The first thing I noticed is that the prob function is returning values > 1.0, so every time the final else if is evaluated it's always true. I think there is something else wrong with the implementation though.

aisobran commented 9 years ago

And if k is the temperature, why are we increasing it rather than decreasing it? k/kmax increases as the program progresses. Shouldn't it decrease?

aisobran commented 9 years ago

The code is also normalizing to [0, 1] in the baseline, but in the annealing the value can go negative. That also seems to be throwing off the measurements.

meneal commented 9 years ago

Sasha, are you looking at the Matt branch? I can't merge it in until sometime tomorrow. I'm just getting to CA now, so it'll likely be afternoon Eastern time before I can merge.

aisobran commented 9 years ago

Yeah, Matt branch.

meneal commented 9 years ago

So a few thoughts on why that might be happening. Running the study 10000 times I get these values:

These values are generated using a randomly generated x value in [0, 1].

The difference is that when we run simulated annealing we run the algorithm around 10000 times, so there is the potential to add or subtract .01 as many as 10000 times from the base x value of zero.

So what happens is that x drifts outside the [0, 1] range we used in our baseline study, and we end up with f1 and f2 return values larger than anything we saw in the baseline. If either f1 or f2 goes beyond the max and min we've set in the normalization function, we'll end up with normalized f1 or f2 values outside the range [0, 1].
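
In other words, a min-max normalization calibrated from the baseline only maps values back into [0, 1] when the raw objective stays inside the range the baseline actually observed; a quick sketch of the failure mode (names and numbers are illustrative assumptions):

```python
def normalize(value, lo, hi):
    """Min-max normalize against the baseline's observed range. If `value`
    falls outside [lo, hi], the result falls outside [0, 1] as well."""
    return (value - lo) / (hi - lo)

# Suppose the baseline only ever saw f1 in [0.0, 1.0] because x stayed in
# [0, 1]. Once x drifts to 1.3, f1 = x**2 = 1.69, and normalization breaks:
print(normalize(1.69, 0.0, 1.0))   # -> 1.69, outside [0, 1]
```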

Now the big thing at this point is how to deal with that. There are a few options:

As far as the probability function goes, I agree that there's something wrong. I only followed Menzies' code for the probability function and the implementation of k/kmax. I also really didn't know what value to use for kmax or for epsilon, so I just used values based on what I thought would intuitively work.

Change away on the code and I'll be happy to respond on any issues. My code is in the master branch now.

aisobran commented 9 years ago

Yeah, I think the best way to deal with going outside the normalization range is not to clip but to wrap (mod). Menzies specifically stated that clipping was a bad mistake that cost him a few years in research. I also don't think running [-1, 1] is useful, as theoretically you could still get outside those bounds.

I think the best way to deal with k/kmax is actually to do (kmax - k)/kmax which would give a decreasing temperature.
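
A sketch of both suggestions, wrapping back into [0, 1] with mod rather than clipping, and a temperature that actually decreases over the run (assumed forms):

```python
def wrap(x):
    """Wrap a candidate back into [0, 1] instead of clipping at the edges."""
    return x % 1.0

def temperature(k, k_max):
    """(k_max - k) / k_max: starts at 1.0 and falls to 0.0 as k -> k_max."""
    return float(k_max - k) / k_max

print(wrap(1.07))              # -> roughly 0.07
print(temperature(250, 1000))  # -> 0.75
```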

I've tried to implement these solutions already, but something is still wrong with the cooling even though I'm getting the correct answer for x. I keep getting 1.0 after these changes, which is the minimum of the sum of the Schaffer functions, but it's not rejecting correctly as it cools. Something is up with either the prob function (removing the -1.0 gives < 1.0 probabilities, but they're all still high), the distance function, the initial parameters, or the whole distance-from-hell thing.

Did you print out the probs? I keep getting > 1.0 for every time the elif is executed.

meneal commented 9 years ago

I'm getting greater than 1 too. It's really

meneal commented 9 years ago

Sorry about that. Having to email from my phone. I did print them to an out file for the whole run; they are all around 1, never less than .99 that I can see. And that's printing them for all iterations, not just the elif branch.

I honestly think it's just the probability function that's broken. We're actually finding solutions that have that sqrt(2) distance at the end of the run or close to it. It's just not cooling for some reason. I looked at the energy values we're getting and they are all in a reasonable range.

aisobran commented 9 years ago

So the probability function shouldn't be multiplied by a factor of -1.0, because we are actually maximizing instead of minimizing. Menzies references the algorithm from the linked paper in the notes, and it's based on minimizing, while in our implementation we're maximizing since we're measuring distance from hell.
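
Under that reading, the acceptance probability for a maximizing, distance-from-hell energy would have the sign flipped relative to the minimizing form; roughly like this sketch (not the repo's exact code):

```python
import math

def probability_max(e_old, e_new, t):
    """Maximizing form: a worse neighbor has e_new < e_old, so
    exp((e_new - e_old) / t) is below 1 and shrinks as t cools.
    Only meant to be consulted when the neighbor is worse."""
    return math.exp((e_new - e_old) / t)

print(probability_max(1.30, 1.25, 0.9))    # early in the run: often jumps
print(probability_max(1.30, 1.25, 0.01))   # late in the run: almost never
```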

ghost commented 9 years ago

I think I figured out what the problem was. In Menzies' code, he says P(e, en, k/kmax) > rand(), but further down the page he says "That is, we jump to a worse solution when "random() > P" in two cases." The second statement is correct, that the random number has to be greater than the probability function in order to jump to a worse solution. I changed the direction of the sign in the code, and now there are fewer question marks appearing later on.

Now with that seemingly fixed, it looks like we need to adjust our cooling schedule, since we keep iterating even though we have not found any better solutions. Maybe lower kmax or emax by a bit?
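
If it helps to see the whole decision in one place, the inner step after that change would read roughly like the sketch below, with '?' marking a random jump to a worse solution (matching the question marks in the output); the other names and symbols here are assumptions, not the repo's exact code:

```python
import math
import random

def step(x, e, x_new, e_new, t, log):
    """One acceptance decision, maximizing distance from hell: keep
    improvements outright, otherwise jump to the worse neighbor only
    when random() > P, the corrected direction discussed above."""
    if e_new > e:                              # neighbor is better: take it
        log.append('+')
        return x_new, e_new
    p = math.exp((e_new - e) / t)              # assumed maximizing form
    if random.random() > p:                    # worse, but jump anyway
        log.append('?')
        return x_new, e_new
    log.append('.')                            # worse and rejected: stay put
    return x, e

log = []
print(step(0.5, 1.2, 0.51, 1.18, 0.5, log), log)
```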

aisobran commented 9 years ago

I think this is good to go. We should submit

ghost commented 9 years ago

Looks fine to me. I'm not sure if we want to do any optimizations though. Any thoughts @meneal?

meneal commented 9 years ago

I'll go ahead and submit this now. It's interesting that now that the probability function is working correctly we're actually not finding the optimal solution, whereas before, when it was bouncing around randomly, it was hitting really close to sqrt(2) every time and often stopping before it ever reached kmax. Now it runs to kmax every time and never reaches sqrt(2) in the runs I'm looking at.

meneal commented 9 years ago

I just thought of this too. We should probably have something in there for a readme. I'll add in a screenshot of the last part of a run too.

aisobran commented 9 years ago

We are finding the best solution. I keep getting 0.9999, which is right on the dot, as the best solution is 1. We were previously getting very wrong solutions that were negative. I think you may be confusing the energy of a solution with the value of the variable.

aisobran commented 9 years ago

I really think the normalization function was right before, because we normalized for values in [0, 1] while the neighbor function, which was changed, didn't account for going < 0 or > 1, so the normalization was being incorrectly applied outside of its bounds [0, 1].

meneal commented 9 years ago

The best solution value is the x value that produces the energy closest to optimal. But we have set the closest-to-optimal energy value to be emax, or sqrt(2). I totally get what you're saying about the negative values. I suppose it may not be possible to reach an energy of sqrt(2) given the restrictions. To reach that energy value, both f1 and f2 would need to return zero.
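
A quick check of that, assuming the usual Schaffer objectives f1 = x^2 and f2 = (x - 2)^2: f1 is zero only at x = 0 and f2 only at x = 2, so no single x can make both zero at once.

```python
def f1(x): return x ** 2          # zero only at x = 0
def f2(x): return (x - 2) ** 2    # zero only at x = 2

# At the actual optimum x = 1, both objectives are 1 rather than 0:
print(f1(1.0), f2(1.0))           # -> 1.0 1.0
```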

aisobran commented 9 years ago

Exactly

meneal commented 9 years ago

I guess using impossible values makes optimizing easier. :)

aisobran commented 9 years ago

We find the Pareto frontier with our energy value.

aisobran commented 9 years ago

lol

meneal commented 9 years ago

submitted