ppowerset containing redundant regions?

jmct commented 8 years ago

If you run the latest build of prob with the following invocation:

./prob --domain box --samples 1 ../coalition-examples/demo-1d-single.prob

You get the following revised belief:

PPowerset
PStateset (p=[1/1010045120210252210120045010001,250/1008028056070056028008001],s=[0,1010045120210252210120045010001],m=[0,2],a/r=[0,0]): Stateset of 1010045120210252210120045010001 states, 10 d
imensions
                        -1*(ship1_berths)>=-1000        1*(ship1_berths)>=0     -1*(ship1_loc)>=-1000   1*(ship1_loc)>=0        -1*(ship4_berths)>=-1000        1*(ship4_berths)>=0     -1*(ship2
_berths)>=-1000 1*(ship2_berths)>=0     -1*(ship5_loc)>=-1000   1*(ship5_loc)>=0        -1*(ship3_loc)>=-1000   1*(ship3_loc)>=0        -1*(ship4_loc)>=-1000   1*(ship4_loc)>=0        -1*(ship5
_berths)>=-1000 1*(ship5_berths)>=0     -1*(ship2_loc)>=-1000   1*(ship2_loc)>=0        -1*(ship3_berths)>=-1000        1*(ship3_berths)>=0

PStateset (p=[1/1010045120210252210120045010001,751/1008028056070056028008001],s=[0,1010045120210252210120045010001],m=[0,2],a/r=[0,0]): Stateset of 1010045120210252210120045010001 states, 10 d
imensions
                        -1*(ship1_berths)>=-1000        1*(ship1_berths)>=0     -1*(ship1_loc)>=-1000   1*(ship1_loc)>=0        -1*(ship4_berths)>=-1000        1*(ship4_berths)>=0     -1*(ship2
_berths)>=-1000 1*(ship2_berths)>=0     -1*(ship5_loc)>=-1000   1*(ship5_loc)>=0        -1*(ship3_loc)>=-1000   1*(ship3_loc)>=0        -1*(ship4_loc)>=-1000   1*(ship4_loc)>=0        -1*(ship5
_berths)>=-1000 1*(ship5_berths)>=0     -1*(ship2_loc)>=-1000   1*(ship2_loc)>=0        -1*(ship3_berths)>=-1000        1*(ship3_berths)>=0

Notice that these are identical except for pmax, where the numerator is 250 in one and 751 in the other. What's going on here? Shouldn't these be merged into one?

If you increase precision to 3 you get the following:

revised belief
PPowerset
PStateset (p=[1/1010045120210252210120045010001,250/1008028056070056028008001],s=[0,1010045120210252210120045010001],m=[0,2],a/r=[0,0]): Stateset of 1010045120210252210120045010001 states, 10 d
imensions
                        -1*(ship1_berths)>=-1000        1*(ship1_berths)>=0     -1*(ship1_loc)>=-1000   1*(ship1_loc)>=0        -1*(ship4_berths)>=-1000        1*(ship4_berths)>=0     -1*(ship2
_berths)>=-1000 1*(ship2_berths)>=0     -1*(ship5_loc)>=-1000   1*(ship5_loc)>=0        -1*(ship3_loc)>=-1000   1*(ship3_loc)>=0        -1*(ship4_loc)>=-1000   1*(ship4_loc)>=0        -1*(ship5
_berths)>=-1000 1*(ship5_berths)>=0     -1*(ship2_loc)>=-1000   1*(ship2_loc)>=0        -1*(ship3_berths)>=-1000        1*(ship3_berths)>=0

PStateset (p=[1/1010045120210252210120045010001,375500/1009036084126126084036009001],s=[0,1010045120210252210120045010001],m=[0,1],a/r=[0,0]): Stateset of 1010045120210252210120045010001 states
, 10 dimensions
                        -1*(ship1_berths)>=-1000        1*(ship1_berths)>=0     -1*(ship1_loc)>=-1000   1*(ship1_loc)>=0        -1*(ship4_berths)>=-1000        1*(ship4_berths)>=0     -1*(ship2
_berths)>=-1000 1*(ship2_berths)>=0     -1*(ship5_loc)>=-1000   1*(ship5_loc)>=0        -1*(ship3_loc)>=-1000   1*(ship3_loc)>=0        -1*(ship4_loc)>=-1000   1*(ship4_loc)>=0        -1*(ship$
_berths)>=-1000 1*(ship5_berths)>=0     -1*(ship2_loc)>=-1000   1*(ship2_loc)>=0        -1*(ship3_berths)>=-1000        1*(ship3_berths)>=0

PStateset (p=[1/1010045120210252210120045010001,25050/91730553102375098548728091],s=[0,1010045120210252210120045010001],m=[0,1],a/r=[0,0]): Stateset of 1010045120210252210120045010001 states, $
0 dimensions
                        -1*(ship1_berths)>=-1000        1*(ship1_berths)>=0     -1*(ship1_loc)>=-1000   1*(ship1_loc)>=0        -1*(ship4_berths)>=-1000        1*(ship4_berths)>=0     -1*(ship$
_berths)>=-1000 1*(ship2_berths)>=0     -1*(ship5_loc)>=-1000   1*(ship5_loc)>=0        -1*(ship3_loc)>=-1000   1*(ship3_loc)>=0        -1*(ship4_loc)>=-1000   1*(ship4_loc)>=0        -1*(ship$
_berths)>=-1000 1*(ship5_berths)>=0     -1*(ship2_loc)>=-1000   1*(ship2_loc)>=0        -1*(ship3_berths)>=-1000        1*(ship3_berths)>=0

Which appears to be the same problem. This pattern continues as you increase precision but maxes out at 4 (so unbounded precision just has 4 of these redundant regions).

Any thoughts?

piotrm0 commented 8 years ago

It would take some computation to realize when regions can be merged. Along with this, there are other heuristics to use when deciding which things to merge when you really have to.

Is this causing a problem in the conclusions you are getting?

piotrm0 commented 8 years ago

Maybe a first realization is that the powerset abstraction does not keep polyhedra disjoint. This is for computational and precision reasons. If the issue you posted happens a lot to you, perhaps you can code up a method to merge identical regions to keep their number lower?

jmct commented 8 years ago

So, in general I don't think it's an issue, it's that we're using the 'size' of a belief to get some statistical bounds on our confidence from sampling. So when this happens it's looking like our beliefs are 'bigger' and our confidence about our sampling goes down, even though the size in this case is just the size of one of these regions (I hope that makes sense).

jmct commented 8 years ago

So maybe right before sampling I'll write a merge_regions function that finds these and merges them safely. I'm guessing the safe thing will be to take the lower of the two p-min values and the greater of the p-max values.

Mostly I'm glad this is expected bahaviour and not an unseen bug.

piotrm0 commented 8 years ago

The second point is that only when the vulnerability calculation is done, are disjoint regions computed. See ppowersetmaker.ml max_belief .

piotrm0 commented 8 years ago

That is not sound. You have to take the lower of the two p-mins and the sum of the two p-maxes . You can do better in some edge cases.

piotrm0 commented 8 years ago

Or you can just use the already implemented abstract plus which takes care of these edge cases for you. See pstateset.mli abstract_plus .

piotrm0 commented 8 years ago

Perhaps the confusion stems from misunderstanding of what the abstraction is. The powerset domain here should be interpreted as a sum (as in sum of distributions) instead of a disjunction like in typical abstract interpretation.

jmct commented 8 years ago

Yeah, I think that was my issue (better late than never on me figuring that out).

So, if I'm understanding you correctly I can call max_belief right before sampling and it will ensure that the regions are disjoint?

piotrm0 commented 8 years ago

No, that will just give you a real value.

Looks like the computation in max_belief isn't as trivially adaptable to your problem as I thought. It would require some though. Perhaps for now you can do a heuristic that you suggested that does abstract plus on pieces that cover identical regions.

jmct commented 8 years ago

Hmm, I don't even think I need to worry about combining them really (not for now, at least) since the sampling only happens at the end and it's the size we care about. i.e. We just don't want identical regions to unnecessarily add to the size.

jmct commented 8 years ago

I'm wrong here. We do have to combine them. So I'll try implementing the heuristic that uses abstract_plus after I go eat some dinner.

I'll attach a patch to this issue so that you can check that I haven't done anything too silly if you have the time later.

Thanks for your help!

jmct commented 8 years ago

Piotr, does this look reasonable?

I tried to leverage your regions_are_disjoint function to do most of the work.

piotrm0 commented 8 years ago

It looks like you are reporting regions that have some overlap as equal. Is this ok for your application?

jmct commented 8 years ago

nope. I actually thought partial overlap wasn't possible from a conversation I had with Stephen this last week. Your comment implies that it is, so I'll have to figure out another way.

Do you know of a quick way to see if all the bounds are the same? I tried looking through the APIs

piotrm0 commented 8 years ago

If you use statesets_exact_intersections of pplstatesetmaker you can get a set of non-overlapping regions.

jmct commented 8 years ago

awesome, thank you so much Piotr! I owe you a beverage of some kind :)

jmct commented 8 years ago

so should I instantiate that type at (stateset * stateset) list -> (stateset list) list

piotrm0 commented 8 years ago

No, the inner list will be a set of input statesets. One element of the outer list represents one disjoint region though that region is not directly given. Looking into how to recover that now. Also added some documentation to master branch.

piotrm0 commented 8 years ago

Try statesets_get_exact_intersections (note the "get") of pplstatesetmaker that I just committed to master .

piotrm0 commented 8 years ago

Ops, I was committing to the plum-umd repo. Just a second.

piotrm0 commented 8 years ago

Ok, I added it to this repo's dev branch.

jmct commented 8 years ago

Hey Piotr, I'm starting to look at this again and now that I'm not in a rush for a demo I want to make sure I'm doing it correctly.

The function you added stateset_get_exact_intersections takes a list of statesets and returns a list of pairs (s, [ss]), where the first element of the pair is a stateset s and the second element is a list of statesets that overlap with s.

If I map pair_first over the resulting list I get a list of disjoint statesets. But because we are dealing with statesets and not pstatesets, we lose all of the probabilistic information about a region, which would have been okay for the demo but now we're looking to sample multiple times while processing a query, so we'd need to be able to update the estimator part of a pstateset.

So now that I'm looking at it with fresh eyes, I better understand what I'd need, but am unsure how to accomplish it. Any chance you're free sometime soon for a quick google hangout so that we can discuss it?

piotrm0 commented 8 years ago

I think this might work:

Given a list of pstatesets lpss, use stateset_get_exact_intersections on the respective statesets.

For each stateset s in the return list {

Get the intersection of it with each of the given pstatesets lpss. You can use pstateset.mli/intersect for this. This updates estimator.
Prune the ones that are empty.
Use pstateset.mli/abstract_plus to combine them all into one pstateset.

}

Do the same for each of the returned ss's, giving you a collection of pstatesets, each disjoint from another. Then you can either work with them individually or collect them all into a ppowerset.

There is a slight inefficiency here when doing all the intersections. Really you would only need to check the ones that are holding the statesets given in the other half of the returned tuples, but presently this is annoying as you'd have to make an association of them to the enclosing pstateset yourself. Alternatively, create a version of stateset_get_exact_intersections that lets you specify an extra element to associate with each input stateset like statesets_exact_intersections does. So the type would be

(stateset * 'a) list -> ((stateset * ('a list)) list)

jmct commented 8 years ago

Thanks Piotr, I'll try to code this up this weekend and see what I get.

Also, will you be at UMD for the reading group on Thursday?

GaloisInc / TAMBA

ppowerset containing redundant regions? #4