endjin / Z3.Linq

LINQ bindings for the Z3 theorem prover from Microsoft Research.
MIT License
29 stars 3 forks source link

Approach advice requested #10

Open WhitWaldo opened 1 year ago

WhitWaldo commented 1 year ago

Do you happen to know of any examples you can point me towards that illustrate an optimization problem given 5 "recipes" that each consume and produce resources? I'm trying to solve for the most "optimal" number of buildings utilizing each recipe to maximize the output of only one of the resources.

I'm trying to work off the price-optimized oil purchasing example, but I'm having a hard time visualizing how to express the constraints.

Here's my whimsical example I'm trying to solve for. I'd appreciate any advice you can give on how to conceptualize this using Z3 and LINQ.

Goal:

Constants:

Variables:

- - Recipe A Recipe B Recipe C Recipe D Recipe E
Input Water 50 50 10 30 30
- Shale Oil 100 100 100 0 0
- Heavy Crude Oil 0 0 0 40 0
- Light Crude Oil 0 0 0 0 30
Output Heavy Crude Oil 0 20 70 0 0
- Light Crude Oil 0 70 30 30 0
- Natural Gas 90 30 20 0 20

The idea is that each recipe takes some amount of input and produces some amount of output. If each refinery can produce only one recipe and I know the amount of water and oil going into the system, what is the most efficient number of refineries of reach recipe to use?

Based on the oil processing example, it yields an output of the number of barrels of either country to purchase, so I'm pretty confident it means I start with this:

using (var ctx = new Z3Context())
{
  var theorem = ctx.NewTheorem<(int refineryA, int refineryB, int refineryC, int refineryD, int refinery E)>();
  //Constraints here - the hard bit
  var result = theorem.Solve();
  //Log to console
}

All that I think I know, so pardon the meandering at this point.

Your example constrains the number of barrels. I have a max of oil and water, so I can put those in here. Let's say 10k oil and 50k water.. but I've got a variable mismatch now as the outputs are supposed to be number of refineries, not number of raw materials. Does this mean I should instead change the output type to something more like the following?

              var theorem = ctx.NewTheorem<(
                  //Recipe A
                  int aWater, int aOil, int aIHeavy, int aILight, int aHeavy, int aLight, int aGas,
                  //Recipe B
                  int bWater, int bOil, int bIHeavy, int bILight, int bHeavy, int bLight, int bGas,
                  //Recipe C
                  int cWater, int cOil, int cIHeavy, int cILight, int cHeavy, int cLight, int cGas,
                  //Recipe D
                  int dWater, int dOil, int dIHeavy, int dILight, int dHeavy, int dLight, int dGas,
                  //Recipe E
                  int eWater, int eOil, int eIHeavy, int eILight, int eHeavy, int eLight, int eGas)>();

Here, I've got the variables split out by recipe letter as the prefix and the name of resource (with IHeavy and ILight for Input Heavy and Input Light, respectively) - but now I don't have anything that returns the number of factories for each recipe.. maybe I need those in the output too? But now we're in the world of being a bit unwieldy.

So I turned towards the Missionaries and Cannibals example - let's store the state for each recipe in a class:

    public class Recipe
    {
        public int FactoryCount { get; set; }

        public int Water { get; set; }

        public int Shale { get; set; }

        public int HeavyIn { get; set; }

        public int LightIn { get; set; }

        public int HeavyOut { get; set; }

        public int LightOut { get; set; }

        public int NaturalGas { get; set; }
    }

And I think the cannibals example makes sense through the end. Starting with the initial state, all the missionaries and cannibals are on the starting beach. Given a length constraint, you're setting where statements on each increment to reflect the valid states of cannibals or missionaries on either side, and then you've got a final goal statement where there's no one on the starting beach any longer and some number of turns have been taken.

That makes sense, but I wonder again how to apply it to my scenario. That one optimizes for lowest length of the MIssionaries array, but I'm optimizing for the number of factories for each recipe. I could have each recipe maintain a state of the total amount of water/shale oil remaining, but as the solver is supposed to tell me how many refineries there are, I don't know how many resources to remove from each set of resources given to each refinery..

Anyway, I have a world of problems here figuring how how to apply the solver to my problem. Any direction you can give here would be greatly appreciated (and happy to write it up and contribute the solution back to the examples once it's working). Thank you!

WhitWaldo commented 1 year ago

Even just trying to solve this for a single recipe (starting with A), I'm running into a wall of how to solve it using Z3. With pencil and paper, I know the following:

50 Water + 100 Oil = 90 Gas I have 10,000 Oil which means I can run the recipe 100 times (offhand, it occurs to me that maybe I need a refinery count constraint as well) 100 operations * 90 Gas = 9,000 Gas using 10k Oil and 5k Water.

But putting that into code, I get:

using (var ctx = new Z3Context())
{
    var theorem = from t in ctx.NewTheorem<(int waterA, int oilA, int gasA)>()
        where t.oilA > 0 && t.oilA <= 10_000 // Only 10k oil available
        where t.waterA > 0 && t.oilA <= 50_000 // Only 50k water available
        where t.gasA > 0 && t.gasA == 90 * (t.waterA / 50 + t.oilA / 100) // Each use of 50 water and 100 oil yields 90 gas
        orderby t.oilA descending
        select t;

    var result = theorem.Solve();
    Console.WriteLine($"Water: {result.waterA}, Oil: {result.oilA}, Gas: {result.gasA}");
}

//Water : 1, Oil: 10000, Gas: 9000

The gas is right, but the water isn't. Further, I realized that I had the orderby wrong- I won't want to optimize by using the most oil, but by producing the most gas:

using (var ctx = new Z3Context())
{
    var theorem = from t in ctx.NewTheorem<(int waterA, int oilA, int gasA)>()
        where t.oilA > 0 && t.oilA <= 10_000 // Only 10k oil available
        where t.waterA > 0 && t.oilA <= 50_000 // Only 50k water available
        where t.gasA > 0 && t.gasA == 90 * (t.waterA / 50 + t.oilA / 100) // Each use of 50 water and 100 oil yields 90 gas
        orderby t.gasA descending
        select t;

    var result = theorem.Solve();
    Console.WriteLine($"Water: {result.waterA}, Oil: {result.oilA}, Gas: {result.gasA}");
}

// Water: 1, Oil: 100, Gas: 90

And that's simply not right at all. Again, I appreciate any help you can shed on this.

HowardvanRooijen commented 1 year ago

Don't worry, it is quick tricking getting into the "zone" of defining these theorems. You're certainly not alone! I'll ask @ElisendaGascon if she has any time to take a look. One gotcha is that Z3 will "explore" the entire problem space, and if you don't have the correct constraints in place, it will come up with completely valid, but in reality, unpractical results.

ElisendaGascon commented 1 year ago

Hi! This is an interesting problem, but I think the constraints might need to be better defined.

First, what are you trying to optimise? It isn't clear in your example if your ultimate goal is to maximise the output of natural gas, or to minimise the number of factories for each recipe (although I don't think this is the case, as you said the number of refineries using each recipe was a variable).

Assuming the goal is to maximise the output of natural gas, it might simplify things to get rid of the "refinery" concept. At the end of the day, we want to find the number of times we need to use each recipe to maximise the amount of natural gas created.

To simplify things for now, say the only resources needed to produce natural gas are water and shale oil. Say we have two recipes only, recipe A and recipe C. Number of times we use recipe A = xA Number of times we use recipe C = xC

Say we have the following resources available: Water = 2 000 Sale oil = 9 000

Then, we need to make sure that the amount of water and shale oil used by both recipes doesn't exceed the amount available. So, our constraints are: Water: 50xA + 10xC <= 2000 Shale oil: 100xA + 100xC <= 9000

Finally, we want to maximise the output of natural gas, which is the following: 90t.xA + 20t.xC Put that into code:

using (var ctx = new Z3Context())
{
    var theorem = from t in ctx.NewTheorem<(double xA, double xC)>()
        where 50*t.xA + 10*t.xC <= 2000
        where 100*t.xA + 100*t.xC <= 9000
        orderby (90*t.xA + 20*t.xC) descending
        select t;

    var result = theorem.Solve();
    Console.WriteLine($"Number of recipes A: {result.xA}, Number of recipes B: {result.xC}");
}
// Number of recipes A: 27.5, Number of recipes B: 62.5

I hope I understood the problem correctly, otherwise let me know!

WhitWaldo commented 1 year ago

Thank you very much @HowardvanRooijen and @ElisendaGascon for the assist! I really appreciate the help in narrowing down how to use this tool.

In thinking about this over the last couple of days, it occurs to me that I really have two problems I'm trying to solve for. On one hand, given the larger "recipe" list, I'm looking for a solution that essentially "weights" each of the recipes and tells me what the optimal combination is of each to match any given constraint on inputs (water and shale oil) and output goals (heavy crude, light crude, and natural gas).

I went and worked out the easier of the problems by hand to get an understanding of what Z3 would yield. When looking to simply yield the maximum amount of petroleum gas possible assuming unlimited water and 10k shale oil, Recipe A is the best as it'll give me 9000 natural gas. Using Recipe B, D, then E yields 8660 natural gas and C, D, then E yields only 7500 natural gas.

I hadn't thought about it as in your solution - number of times the recipe is used. I think that's great because I can separately multiply the inputs against the rounded-up version of those outputs and meet the output goals or round down to meet rigorous constraint bounds, as applicable. Then for each recipe that takes the same input resources, I really need it only to calculate our these usage totals to work out for myself which is the larger and smaller outputs.

But that throws a wrench in the next step. Solving just for the natural gas alone, we can solve for natural gas directly in A, B, C, and E, but it doesn't tell us how much of the other resources we wind up with without going back and working out the math ourselves. If one uses recipe B or C, they also need to use D and E to maximize the natural gas output (since either produces heavy and light oil and that in turn needs to be refined to natural gas).

Going off your example, I'd start with this:

using (var ctx = new Z3Context())
{
    var theorem = from t in ctx.NewTheorem<(double xA, double xB, double xC, double xD, double xE)>()
        where 50 * t.xA + 50 * t.xB + 10 * t.xC + 30 * t.xD + 30 * t.xE <= 50_000 //Water
        where 100 * t.xA + 100 * t.xC + 100 * t.xD + 0 * t.xD + 0 * t.xE <= 10_000 //Shale oil
        where 40 * t.xD <= ???  //Heavy Crude Oil - depends on the output value from recipes B and C
        where 30 * t.xE <= ???  //Light Crude Oil - depends on the output value from recipes B and C
        orderby (90*t.xA + 30*t.xB + 20*t.xC + 0*t.xD + 20*t.xE) descending
        select t;

        var result = theorem.Solve();
} 

But where I'd enter the input constraints for the heavy and light crude oils, each depends on the resulting values from recipes B and C - namely whatever the value is times their respective natural gas outputs.

Can Z3 easily solve these "chained" constraint sort of problems or would I need to realize the dependency ahead of time and solve it piecemeal? For example...

using (var ctx = new Z3Context())
{
    //Get the base recipe counts that yield each of the maximum natural gas values
    var baseTheorem = from t in ctx.NewTheorem<(double xA, double xB, double xC)>()
        where 50 * t.xA + 50 * t.xB + 10 * t.xC <= 50_000 //Water
        where 100 * t.xA + 100 * t.xB + 100 * t.xC <= 10_000 //Shale oil
        where t.xA > 0 && t.xB > 0 && t.xC > 0 //Had to add this as I kept getting 0 as outputs otherwise
        orderby (90 * t.xA + 30 * t.xB + 20 * t.xC) descending
        select t;

    var result = baseTheorem.Solve();
    Console.WriteLine($"A: {result.xA}, B: {result.xB}, C: {result.xC}");

    //We know that result.xB will yield 20x for heavy crude and 70x for light crude
    var recipeBRuns = result.xB;
    var b_heavyCrudeYield = recipeBRuns *20;
    var b_lightCrudeYield = recipeBRuns * 70;

    var recipeCRuns = result.xC;
    var c_heavyCrudeYield = recipeCRuns * 70;
    var c_lightCrudeYield = recipeCRuns * 30;

    //We only have one recipe that accepts heavy crude, so no theorem to solve here as to which is better
    var b_heavyToLightYield = b_heavyCrudeYield / 40;
    var c_heavyToLightYield = c_heavyCrudeYield / 40;

    //And we only have one recipe that accepts light crude, so no theorem needed here either - sum the original light crude values here
    var b_lightToGas = (b_lightCrudeYield + b_heavyToLightYield) / 30;
    var c_lightToGas = (c_lightCrudeYield + c_heavyToLightYield) / 30;

    Console.WriteLine($"Recipes (Natural Gas output) - A: {Math.Floor(result.xA * 90)}, B: {b_lightToGas + Math.Floor(result.xB * 30)}, C: {c_lightToGas + Math.Floor(result.xC * 20)}");
} 
//A: 99, B: 0.5, C: 0.5
//Recipes (Natural Gas output) - A: 8910, B: 16.175, C: 10.529166666666667

That's right for recipe A, but because the initial theorem weights A over B and C, they aren't optimized. I'd instead be better off if I went and tweaked the last where statement for A, then B, then C.

A:

where t.xA > 0 && t.xB >= 0 && t.xC >= 0
//...
//A: 100, B: 0, C: 0
//Recipes (Natural Gas output) - A: 9000, B: 0, C: 0

B:

where t.xA >= 0 && t.xB > 0 && t.xC >= 0
//...
//A: 99.5, B: 0.5, C: 0
//Recipes (Natural Gas output) - A: 8955, B: 16.175, C: 0

C:

where t.xA >= 0 && t.xB >= 0 && t.xC > 0
//...
//A: 99.5, B: 0, C: 0.5
//Recipes (Natural Gas output) - A: 8955, B: 0, C: 10.529166666666667

Problem #1

But neither B nor C are right. They should ideally be 8660 and 7500 respectively. Rather, the theorem is maximizing recipeA's value and it's not clear to me how to get it to favor recipe B or C.

Problem #2

Solving the whole thing out manually like this eliminates the possibility of solving for the water constraints in the subsequent steps (recipes D and E) where ideally the water is constrained through the whole of the problem, not just the first step.

Problem #3

I'm only looking to initially maximize a single output (natural gas). How would I possibly structure this to optimize on two outputs (e.g. at least 2000 heavy crude oil + maximum natural gas)?

I appreciate the continued help!