LambdaConglomerate / x9115lam


Final Project #17

Closed aisobran closed 8 years ago

aisobran commented 8 years ago

We should have a proposal by Tuesday. During class we talked about adding novel heuristics to PSO. I think it's a great idea. We should also consider what models we want to choose; I suggest everything in the MOEA problems. We could also use it to tune machine learning algorithms. This would be pretty straightforward, as we could use scikit-learn and a data set from Kaggle.

Another alternative would be to code up GALE and apply these models.

ghost commented 8 years ago

The reason I updated the objective max/min on line 137 is that we might have had to generate new positions if the candidate is either out of bounds or out of constraints. I don't think the objective values are valid if either of these conditions isn't met. Matt, why do you suggest moving the update to line 117? I'm just curious.

And Sasha, really good work on the graphs! They make it a lot easier to visualize the candidates and the decision space, and would be really helpful when writing our report.

I do have a question though. In tanaka_energiest.png, it seems there are quite a few candidates with negative energy. These are actually the objective values before normalization, correct? I just want to clarify because "energies" is mentioned.

aisobran commented 8 years ago

Your reasoning about the location of the objective max/min update is correct.

The energies cannot be negative. What you're seeing are negative objective values, which shouldn't be negative since they're normalized.

aisobran commented 8 years ago

Just pushed a troubleshooting commit for this. All the graphing is commented out for right now.

The objective maxes/mins are not updating correctly. You can see the output at the bottom of console.txt.

Still diving deeper into this.

meneal commented 8 years ago

I think I may have fixed this. I'm not going to push it since I don't want to mess up what you've done over the past hour. I commented out a bunch of print lines just to get rid of the debug shit I had in there (lines 143-145, 179, and 182-183). Then I added an additional update on line 117: model.updateObjectiveMaxMin(can.pos)

I'm getting no output of lesser or greater for objectives at that point.

I think what's happening is that if you don't update on line 117, you lose whatever objective values you may have gotten from the movement on line 116. Obviously some of that movement is going to be out of bounds, since the decisions will take on values that violate both bounds and constraints. The bad thing is that I'd think that would artificially inflate the energy values we get out of this rig. That said, running this way I got a really similar energy value with or without that check. Also, it works.
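For reference, a minimal sketch of what I assume updateObjectiveMaxMin has to do (objectives, objMins, and objMaxs are hypothetical attribute names, not necessarily what's in model.py):

```python
def updateObjectiveMaxMin(self, pos):
    # evaluate every raw objective at this position and widen the running
    # per-objective min/max that normalization relies on later
    for i, f in enumerate(self.objectives):  # assumed attribute names
        val = f(pos)
        self.objMins[i] = min(self.objMins[i], val)
        self.objMaxs[i] = max(self.objMaxs[i], val)
```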

meneal commented 8 years ago

A quick test with Fonseca and Tanaka has values less than the min for a few objectives, but they're all within about 0.05 of the minimum value. Tanaka has severely reduced performance though, or at least it appears so, since we're adding in those values that are out of bounds. I'm getting 0.64 as a global best for it.

meneal commented 8 years ago

One thing I just thought of, though, is that it's pointless to do separate checks for constraints and bounds. It may be possible for the separate checks to generate two new vectors, though that's not very probable. Just deleting the `if not model.checkConstraints(can.pos):` line and changing the first if to `if not model.checkBounds(can.pos) or not model.checkConstraints(can.pos):` is at least more succinct, if not potentially better.

Just figured it out. You're adding invalid vectors to the list that you're graphing! Instead of adding the vector on line 116, you should be adding it after the check. It's no wonder we get working energy when the update is made before bounds are checked, and then get fucked up energy when the update is made after bounds and constraints are checked. Let me know if I can push the fix. A sketch of what I mean is below.
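Roughly this, with the merged bounds/constraints check from above folded in (generatePosition and the graph list name are made up for illustration):

```python
can.move()  # hypothetical stand-in for the position update on line 116
while not model.checkBounds(can.pos) or not model.checkConstraints(can.pos):
    can.pos = model.generatePosition()  # hypothetical regeneration helper
# only now is the position valid, so only now do we record it
model.updateObjectiveMaxMin(can.pos)
graphed_positions.append(can.pos)  # hypothetical list fed to the grapher
```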

There are still little fuck ups in Fonseca, but they're all within 0.2 below the min.

aisobran commented 8 years ago

I came to the same conclusion. I'll push the fix up with updated energy graphs. I'll also rename the graph to objectives because I think it's causing confusion.

aisobran commented 8 years ago

Ok, pushed with new graphs. We can clearly see the Pareto frontiers now. I'm going to take a break for a little bit and get back to this in the afternoon.

meneal commented 8 years ago

Agreed that the change to objectives is better. I know that they are objectives, but every time I open the file I think energy. It's really interesting to see them hit that wall that is the frontier. Seems like we pretty much have a working implementation at this point.

aisobran commented 8 years ago

So I looked this over and compared it to the lit. Everything looks good, except I don't see an inertia constant. But that makes sense, as explained below.

It looks like you're implementing the adaptive PSO by Clerc and Kennedy (2002) with a constriction factor (here called k; chi in the lit), and not the classical PSO by Kennedy and Eberhart, which contains an inertia constant and no constriction factor. Correctly, the constriction factor is implemented with no inertia constant. But I think our baseline should be classical PSO, with adaptive PSO (already implemented) as another comparator.

The main difference is that classical PSO has no constriction factor in the update, and the previous velocity is instead scaled by some non-negative inertia factor.
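To make the difference concrete, here's a sketch of the two per-dimension updates as I read them in the lit (the parameter values are just the usual defaults from the papers, not what's in our code):

```python
import random

def classical_velocity(v, x, pbest, gbest, w=0.9, c1=2.0, c2=2.0):
    # Kennedy & Eberhart / Shi & Eberhart: previous velocity scaled by inertia w
    r1, r2 = random.random(), random.random()
    return w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)

def constricted_velocity(v, x, pbest, gbest, c1=2.05, c2=2.05):
    # Clerc & Kennedy: chi scales the whole update; no inertia term, no Vmax
    phi = c1 + c2  # must exceed 4 for chi to be real
    chi = 2.0 / abs(2.0 - phi - (phi ** 2 - 4 * phi) ** 0.5)  # ~0.7298 here
    r1, r2 = random.random(), random.random()
    return chi * (v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x))
```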

I can implement this if you guys agree.

meneal commented 8 years ago

Yep, that's the main difference. I only implemented it this way because of the suggestions from the off-the-shelf PSO paper. It suggests that this is better than the older inertia methodology. I'm totally fine with using inertia as a baseline if you'd like. Constriction is referred to as K in the literature as well as chi; K is what they call it in the off-the-shelf paper. I read the initial Kennedy paper, but have totally forgotten what they used for an inertia calculation.

One thing I would say, though, is that we could make a case for just using the off-the-shelf implementation as a baseline and going from there. They've already made the case that classical PSO is inferior to adaptive, so I'm not sure we need to do that work again. I guess if you think it won't take you much time and you really feel it's worth it, go right ahead; I'm just not sure it adds that much. To some degree, going with the older, weaker version will inflate our relative success!

We would likely need to implement some hard bounds for Xmax/Xmin and Vmax/Vmin to make it work. Theoretically, constriction is helping to keep our velocity values lower than they would be otherwise, but I don't think the inertia value will do anything to help that.
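Something like this per dimension, presumably (the xmin/xmax/vmax names are placeholders):

```python
def clamp(val, lo, hi):
    return max(lo, min(hi, val))

# hard bounds the inertia variant would need: cap the velocity first,
# then keep the new position inside the decision bounds
v = clamp(v, -vmax, vmax)
x = clamp(x + v, xmin, xmax)
```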


aisobran commented 8 years ago

You're right that the constriction factor removes the need for Vmax and the inertia factor, helps reach convergence, and is widely agreed to be a good way to avoid tuning Vmax and inertia. But there is also research showing that the quick convergence inherent to global adaptive PSO may not yield diverse results (http://users.cecs.anu.edu.au/~ejmontgomery/publications/2011-12_convctrl_pso.pdf). I think we're actually seeing this if you look at all the clumping occurring in the graphs.

I like keeping adaptive PSO as a comparator and as a base PSO for adding our heuristics. But I'm not so sure we should take the "adaptive PSO is the best" result from the off-the-shelf paper as gospel. They test on 5 models, each with a single variant of the objectives, and the no free lunch theorem leads me to believe there will be models where adaptive PSO performs worse (it doesn't seem like we share any models with OTS-PSO). Additionally, implementing classical PSO will give us two more variables to tinker with when applying heuristics. Other PSO implementations have had success with this, including adaptive fuzzy PSO (Eberhart and Shi).

Also, we should consider how the constriction factor will affect the implementation of our heuristics. Do we want the prey's convergence to be a function of the constriction factor when being chased away, or do we want the ability to be chased away at greater speeds if the predator gets close? Also, for the second heuristic, we want greater repulsion as each particle gets bigger. This will be hard to achieve solely through C1 and C2, as they will be scaled back down by chi. I guess another option is to add a third term to the equation that sits outside chi's scaling, but it's difficult to foresee whether chi's utility is degraded by that. A sketch of that option is below.
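Roughly what I mean by a third term outside chi's scaling (c3 and the precomputed repulsion component are made up here, per dimension):

```python
import random

def constricted_with_repulsion(v, x, pbest, gbest, rep,
                               chi=0.7298, c1=2.05, c2=2.05, c3=1.0):
    # rep: a precomputed repulsion component for this dimension (hypothetical);
    # it sits outside chi so constriction never scales it back down
    r1, r2 = random.random(), random.random()
    return chi * (v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)) + c3 * rep
```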

We should also consider how the neighborhood topology will affect convergence. Kennedy has shown that a local neighborhood converges more slowly (requiring more steps) but has a higher probability of finding the desired solution than a global neighborhood for complex problems (https://svn-d1.mpi-inf.mpg.de/AG1/MultiCoreLab/papers/KennedyMendes02%20-%20Population%20Structure%20and%20PSO.pdf). This actually makes sense, as complex problems will most likely have many local optima that a global neighborhood implementation will converge to. Once we implement one of Menzies' nasties, I think this will pop up.

For right now we should probably split pso.py into adaptiveGlobalPSO.py and classicalGlobalPSO.py.

I can alter adaptive PSO into classical. It should be easier to split these up than to have an option in pso.py, at least at first.

From here we should implement our heuristics giving us 6 different PSOs.

Then, time permitting, we should try implementing a local neighborhood variation; a rough sketch of what that could look like is below.
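If we do get to it, a ring-topology local best is pretty small (the particle list and best_energy are assumed names, and whether we min or max depends on how the rig scores energy):

```python
def lbest(swarm, idx, k=1):
    # each particle only consults its k neighbors on either side of it in
    # the particle list, instead of the whole swarm's global best
    n = len(swarm)
    neighbors = [swarm[(idx + d) % n] for d in range(-k, k + 1)]
    return max(neighbors, key=lambda p: p.best_energy)  # assumed attribute
```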

ghost commented 8 years ago

It would be nice to experiment with these different heuristics, but I would recommend talking to Menzies tomorrow to see what he thinks and whether this is in scope for a masters-level project, especially since we will not only have to implement these heuristics but also test them on some nasty models.

aisobran commented 8 years ago

Classical global PSO was added; you can see the comparison of the results in out.txt. I'm putting the graphs together now.

aisobran commented 8 years ago

Pics are up. The less convergent behavior looks pretty evident vs. adaptive.

meneal commented 8 years ago

I would've responded earlier today, but I've been poking around at some other stuff. I completely agree with you that we shouldn't consider OTS PSO to be gospel. For whatever reason that didn't really jump out at me initially. It's also awesome of you to post those papers that give some background on why classical may be of use. I was planning on reading them earlier today, but just didn't get a chance.

I think the plan you've put forward makes sense: 2 heuristics, with one adaptive and one classical version of each. I'm not too worried about scope with that. We'll see about the local neighborhood. I doubt we'll get to it, just because I think it could take a long time to implement depending on how we go about it.

As for the two heuristic applications and whether they'll be affected by choosing classical vs. adaptive, I might take a look back at the two papers I posted towards the beginning of this thread. I've forgotten whether they used the constriction factor or inertia. I'll look at that tomorrow and post. It might at least give us a signpost on where to start implementing heuristics with this setup.

Looking at the images, it's interesting that there's a huge difference between the objectives outputs for some models (Srinivas, Viennet3, Viennet2), but barely any difference for others (Viennet4, Constr_ex). For Viennet 2 and 3, you can see in the graphs a much more delineated frontier, with features that would be missed totally by adaptive. Good work on that shit man.

aisobran commented 8 years ago

Just pushed an initial implementation of heuristic no. 2 in classical global PSO. The output looks much different than before. I'm still tinkering with it, and there are some issues with deleting candidates as they get combined. You can see some of the pics.

I noticed a bug in the code. I'm not sure if this was in there initially or if I accidentally uncommented the line, but at line 191 of adaptive it looks like we're generating a whole new set of candidates at the end of every loop. I didn't see this in any implementation of PSO. I didn't remove it yet; just wanted to make sure it's not supposed to be there.

meneal commented 8 years ago

Line 191 is there to regenerate a clean slate at the beginning of a new retry. It's hard to see, though, because of the indentation. You can see above it on line 184 that st.k is decremented, and then we indent out to the outer loop. We haven't really been using retries so it hasn't mattered, but I put it there for once we start running with retries.

As for the setup for lines 185 through 223, is there any particular reason you didn't include this stuff within the block above it? Is your intention to move the particles first and then do the repulsion? The only reason I ask is that, at that point, we essentially move the particles twice in each timestep, at least the ones that are within range of another particle. I guess that could be OK; I'm just not totally sure.

Perhaps it would be better to aggregate all of the forces on each particle first and then change the particle's position?

meneal commented 8 years ago

The nice thing about aggregating, too, is that we could potentially weight each component that feeds into the velocity of the particle and tweak that to get different results.
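Sketching what I mean (the component list and weights are made up names):

```python
def aggregate_velocity(components, weights):
    # components: same-length vectors, one per force acting on the particle
    # (inertia, cognitive, social, repulsion, ...); sum them once per
    # timestep, each scaled by its own tunable weight, so the particle's
    # position changes a single time per step
    dims = len(components[0])
    return [sum(w * c[d] for w, c in zip(weights, components))
            for d in range(dims)]
```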

aisobran commented 8 years ago

That makes sense about the retries.

We definitely should compute everything together, but I wanted to separate it out for now to be able to troubleshoot. There are a lot of assumptions in the calculation, and right off the bat I can tell it's probably undershooting. In a couple of iterations it will be combined.

ghost commented 8 years ago

More models: http://jmetal.sourceforge.net/problems.html DTLZ: http://www.tik.ee.ethz.ch/~sop/publicationListFiles/dtlz2001a.pdf

meneal commented 8 years ago

Here is the repo for the metrics. From what I can see, we'll need to create two files from a run and place them in specific directories, then run the scripts to pull out hypervolume/spread metrics, if we use the current setup for the scripts.

We could rip the scripts apart and add them into our current setup, but they're complicated enough that I'm not sure it would actually be worth it. I'll try to add them as a subdirectory to the project and then set up a shell script to automate running our rig and then running the metrics. I'll poke around at that this morning for a while. I'll work in the original adaptiveGlobal to start and set up a frontier in there. Some of this may well necessitate moving the can class into state; not the worst thing for sure, but not exactly what I wanted to do either.

One other thing is the double loop we talked about a bit yesterday for cdom. For these small models, we aren't getting any zero-diff loss amounts, but that could well change once we add in these fatter models. There's a block in model.py (between lines 113 and 121) for at least creating output if two cans have similar loss within some epsilon. It doesn't do anything differently if they're within the epsilon; it just spits out some output. Whenever we upgrade to the fatter models, we should remember to turn that on just to check whether something is happening there.

I'll post more when I actually get some shit done.

meneal commented 8 years ago

I just realized that the calculation of objectives, as I'm using them in our case, relies on having the updates for the objective max/min. I'm using the function on line 109 of model.py, which uses the normalized objective output. For that to work, we obviously need to continuously update the objective maxes/mins, or else that value won't be valid. So we'll need to make sure to keep that stuff in, regardless of whether we actually jettison energy or not.
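That is, something along these lines, which is why the max/min updates can't be skipped (the attribute names are guesses, not necessarily what's in model.py):

```python
def normalized_objectives(model, pos):
    # raw objective values scaled into [0, 1] by the running max/min;
    # stale maxes/mins make these ratios meaningless
    raw = [f(pos) for f in model.objectives]  # assumed attributes
    return [(o - lo) / (hi - lo) if hi > lo else 0.0
            for o, lo, hi in zip(raw, model.objMins, model.objMaxs)]
```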

All of the metrics algorithms, as they're set up, take only one frontier set and calculate hypervolume and spread. So for now I'll try to take a one-retry frontier and get the metrics working; after that I'll use the median of multiple runs as the value to calculate the metrics against.

ghost commented 8 years ago

I noticed that the Pareto frontiers in the metrics were in the same format as the reference Pareto fronts on the jMetal site, so it looks like they're in a standard format. I don't think it'll be difficult to create these output files for our candidates.

Also, I'm going to use the wiki to keep links to our project papers like you suggested with some notes as to what each paper is about. We could also use the wiki to store links to our readings since they'll be useful to have later on.

meneal commented 8 years ago

Median was a dumb idea. Instead, I'm going to run cdom against each individual particle's best for each run and then take the dominator out of each run for each particle. Then we sort of have a best-of-the-best frontier.
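For anyone skimming later, the cdom predicate looks roughly like this; this is one common formulation of continuous domination, assuming minimization and normalized objectives, and our code may differ in details:

```python
import math

def cdom_loss(x, y):
    # expected loss of preferring x over y; x and y are objective vectors,
    # minimized and assumed normalized to [0, 1]
    n = len(x)
    return sum(-math.exp((b - a) / n) for a, b in zip(x, y)) / n

def cdom(x, y):
    # x dominates y when preferring x loses less than preferring y would
    return cdom_loss(x, y) < cdom_loss(y, x)
```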

ghost commented 8 years ago

Just pushed an implementation of DTLZ1. DTLZ has a customizable number of objectives, and I didn't want to write out each objective as a lambda by hand, so I tried to generate them dynamically. I probably made some mistakes, so have a look over it, and feel free to condense the objectives if it's possible and still easy to understand.

I did an initial run of DTLZ1, and it seems we're getting really low energy values. I'm a bit confused, though, since the paper says that cans with all zeros are optimal, but we aren't getting such values in our global best or in our frontiers. I'm assuming something is wrong with the implementation. Let me know what you guys think.

meneal commented 8 years ago

I haven't had a chance to look at DTLZ yet, but I did want to post on what I've done so far with hypervolume and spread. I have working output at this point. Basically, I pull in and run cdom through all of the particles over all of the runs, pull out a final frontier, write that frontier out to two txt files in the metrics directory, and then run the script for spread and the script for hypervolume.

If you want to give it a shot, just run sh run.sh and you'll get the output. I only have the optimal frontiers in there for Fonseca and Tanaka at this point. As they come out of jMetal, they're messed up: basically they have tabs where they should have spaces. I have a script called wrangler.py in the spread folder to deal with them, but I can knock those out as we need them. Take a look and let me know what you think.
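The whole transformation is basically a tab-to-space pass; if anyone needs to redo one by hand, it amounts to something like this (filenames are hypothetical):

```python
# rough equivalent of what wrangler.py does: swap jMetal's tabs for spaces
with open('Fonseca.pf') as f:              # hypothetical input filename
    cleaned = f.read().replace('\t', ' ')
with open('Fonseca_clean.pf', 'w') as f:   # hypothetical output filename
    f.write(cleaned)
```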

I was thinking of adding back in some of the grapher stuff so I could run grapher against the final frontier I'm getting out of the optimizer. Just haven't quite gotten that far yet.

aisobran commented 8 years ago

I just realized there's a bug in the probability implementation: we can end up with negative probs and probs > 1. I'll implement a fix soon; just thought I'd let you guys know. I have a better way to adjust the probabilities by scaling.
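The scaling fix I have in mind is roughly this (a sketch, not the final code):

```python
def rescale(probs):
    # shift so the smallest value is zero, then divide by the total so
    # every probability lands in [0, 1] and the set sums to 1
    lo = min(probs)
    shifted = [p - lo for p in probs]
    total = sum(shifted)
    if total == 0:  # all values equal: fall back to a uniform distribution
        return [1.0 / len(probs)] * len(probs)
    return [p / total for p in shifted]
```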

meneal commented 8 years ago

There was actually a huge, evil bug in cdom that took me most of this morning to fix. I think it's right now. I'll need to push before 11:30 whether it's fixed or not. Hopefully it's OK at that point.


meneal commented 8 years ago

So I fixed my bug, but for the moment the only optimizer that will run is adaptive. I'll fix the others later; this is just because cdom has been changed to return values other than 0/1 in the case that two candidates are within some epsilon of loss of each other. I'm hoping I can talk to Rahul after Data Sci today to get a bit of help making sure I'm doing some of this right.

ghost commented 8 years ago

I pushed my attempts at DTLZ2 and 3, but I noticed a weird bug. If I set m for DTLZ1 to 2 and set m for DTLZ3 to 10, I get a "list index out of range" error on line 218 in Models.py. It turns out that len(x) is correct at 2, but i is 9, which is one less than DTLZ3's m value. So it seems that DTLZ3 is somehow changing the variables of DTLZ1.

My intuition tells me this has to do with the lambdas I'm trying to generate, for example on line 209. They add just fine with the correct i values, but when they're called later, i has another value. So I'm thinking we need another way of doing this. Do you guys have any suggestions? A minimal repro of what I suspect is below.
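This is standard Python closure behavior, not specific to our code:

```python
# Python closures capture the variable i, not its value at creation time:
fs = [lambda x: x[i] for i in range(3)]
print(fs[0]([10, 20, 30]))  # 30, not 10 -- i is 2 once the loop finishes

# binding i as a default argument freezes its current value per lambda:
fs = [lambda x, i=i: x[i] for i in range(3)]
print(fs[0]([10, 20, 30]))  # 10, as intended
```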

aisobran commented 8 years ago

Is passing in i from the for loop causing side effects?

ghost commented 8 years ago

I think so, but I thought all the instances of i were local and shouldn't cause side effects?

aisobran commented 8 years ago

I'll take a deeper dive into it.

All the models were derived from http://www.tik.ee.ethz.ch/~sop/publicationListFiles/dtlz2001a.pdf, right?

ghost commented 8 years ago

Yeah. I tried a DTLZ1 m of 5 and it didn't crash, even though DTLZ3's m is 10. So I'm sure there's some side effect happening, but not sure exactly where.

ghost commented 8 years ago

Just pushed the fix to DTLZ from class and added DTLZ4. The only addition in DTLZ4 is an alpha exponent. The rest of the problems are more complicated, so I didn't want to get started on them right now. I'll definitely get around to them by next weekend when I get back from vacation; otherwise, if any of you want to implement them, feel free.

If I get time, I'll try to start working on a really rough draft of the intro/related work, but I can really get started on that once I get back next weekend.

I'll have my laptop with me so I can do some small stuff if necessary though.

meneal commented 8 years ago

I've added all of the true frontiers now to the True_PF folder in spread. They should all be in the correct format. I added all of the DTLZ three-dimensional frontiers instead of using the 2D ones. We could certainly add the 2D ones as well if necessary.

I'm going to try to do a bit more testing to see how things are working.

meneal commented 8 years ago

There's an issue with a few things right now.

ghost commented 8 years ago

I just pushed a short beginning to our report. It has Introduction and Related Work sections. I'll flesh these sections out over the course of the weekend.

ghost commented 8 years ago

I'm trying to find a reference for the adaptive PSO implementation. The off-the-shelf paper cites "The swarm and the queen: towards a deterministic and adaptive particle swarm optimization" as the one that introduces the constriction factor. Sasha, is this the paper you were referencing when you said "It looks like you are implementing the adaptive PSO by Clerc and Kennedy (2002)..."? The paper I found only has Clerc as an author.

aisobran commented 8 years ago

This is the paper I was referencing: http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=985692&url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F4235%2F21241%2F00985692

You have to get it through ncsu.lib

ghost commented 8 years ago

Thanks, Sasha. adaptiveGlobalPSO.py looks similar to the paper you referenced, so I'm thinking we can use your paper as the reference for our adaptive implementation.

Just a note: it looks like the inertia weight factor was introduced in this paper by Shi and Eberhart. I've added a reference to our report and pushed it. Right now I've divided the related work section into subsections just for clarity, but we can get rid of them later. I'm going to incorporate Sasha's paper into the report, and also a paper about the probability implementations. Is there a reference for those that you have handy?

aisobran commented 8 years ago

The probability implementation is our own heuristic. It basically determines which particle the current particle has had the most success following and sets that as gbest. I can write up the equations used; a bare-bones sketch is below.
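Until the equations are written up, roughly this bookkeeping (all names are placeholders, not what's in the code):

```python
# each particle keeps a success score per peer; a peer's score goes up
# whenever following it improved this particle's personal best
def record_success(scores, leader_id, improved):
    if improved:
        scores[leader_id] = scores.get(leader_id, 0) + 1

def pick_leader(scores, default_id):
    # follow the peer we've had the most success with so far as "gbest"
    return max(scores, key=scores.get) if scores else default_id
```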

meneal commented 8 years ago

So, a few things on this. I think we've been pretty disorganized about how we're going about this, and I feel like, if we actually want to get it done in some sensible way, we're going to have to do a bit better for the next little while.

My thought is that we should stop posting in this one thread and instead come up with a list of specific features and issues we want to complete, with a separate GitHub issue for each. Otherwise, like I said above, I think we're going to have problems getting done.

I have a set of things I think we need. I'll post each into a separate issue so we can discuss each of them individually, assign someone to take care of it, and not be working in a bunch of different directions at once.

I think a lot of this is my fault, but I also think we were in a position before this where each homework was so short that it wasn't easy to split up in a sensible way. We didn't have to get organized, because one person could do a bunch of work, then another person could step in and do a bunch of work, and we could just post what we were doing up here. But I think the rig is big enough at this point that we're beyond that.