OrderedSet86 / gtnh-flow

Factory Optimization Flowcharts for Gregtech: New Horizons
MIT License

Add Linear Programming Solver #47

Closed Hermanoid closed 3 months ago

Hermanoid commented 5 months ago

Here's a very ambitious addition!

TL;DR: this PR adds a matrix-like solver to make problems with loops way easier to solve, and I have some questions about implementing the front-end.

This PR aims to add a Linear Programming solver alongside the existing Sympy solver, comparable to how the Factorio Factory Planner offers both a traditional and a matrix solver. Just like in Factorio, it's not practical to have a database of the value of every item in the game (beyond the base game, anyway), which makes solving these problems a challenge. However, I don't love the way Factory Planner handles this lack of information (by having users choose Free Variables); I wanted a solver that needs only a minimum of information about what's an output and what's an input. I also wanted something that can choose the most optimal of multiple processing routes - hence, a Linear Programming solver.

As this is a big hunk of stuff to support, the first question is whether you (@OrderedSet86) are willing to incorporate and maintain this solver in your repo. If not, I can totally support it in my fork. If so, there is still work to be done with this code. But, since the backend solver is pretty much nailed down and reasonably well-tested (see my dev log on it), I wanted to open this up and ask for input on a few points on the front-end.

First, a high-level overview of my backend implementation: This solver is largely inspired by Kirk McDonald's Factorio Calculator write-up, but I had to make a number of modifications to handle the increased complexity of GTNH problems:

  1. Kirk's write-up uses powers of the ratio between the largest and smallest values in the recipe matrix to prioritize items in the objective/cost function. However, GTNH tends to have both small and large numbers flying around (from amortized byproducts in the 0.2 range to fluid amounts in the 10k+ range), resulting in very large ratios that cause numeric precision issues. To handle this, a quadratic solver simultaneously scales both items (e.g. 1000L oil might become 1kL) and recipes (e.g. 5->1 becomes 100->20) to minimize this "priority ratio".
  2. We don't know which items should be inputs to a problem. Instead of making the user enter every one, the solver finds a minimal set of Extra Inputs to add. It does so by adding terms to the objective function at a higher cost than explicit inputs, such that user-specified inputs are used before additional items are added. In practice, the solver does well even when the user specifies only one or two "main" ingredients (like ilmenite ore in the titanium line or PMP in the platline).
  3. Kirk's recipe tax to minimize byproduct production doesn't apply well to GTNH problems, where it's often cheaper (in terms of # of recipes and cost of inputs) to bypass reprocessing loops we would much rather see used. I replace this with explicit byproduct variables that are either minimized at highest cost (in most cases) or maximized at lowest priority (when a byproduct is "desired" as the output of recipes that are only included to deal with a useful byproduct, like processing radioactive sludge dust).

I chose CVXPY as the LP solver for this problem because it can also handle the quadratic scaling problem, all in one shiny dependency.
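To illustrate the scaling idea from point 1, here's a minimal sketch in the log domain. This is not the PR's actual quadratic program - it's a least-squares stand-in with a closed form (two-way mean centering), and the matrix values are made up - but it shows how per-item and per-recipe scale factors shrink the "priority ratio":

```python
import numpy as np

# Toy recipe matrix: rows are items, columns are recipes. Entries span
# several orders of magnitude, like GTNH amounts (a 0.2 amortized
# byproduct vs. 1000L of fluid).
A = np.array([[1000.0, -5.0],
              [-0.2,    1.0]])
log_mag = np.log10(np.abs(A))

# Least-squares log-domain scaling: pick a log10 scale per item (row)
# and per recipe (column) that pushes every entry's magnitude toward 1.
row_mean = log_mag.mean(axis=1, keepdims=True)
col_mean = log_mag.mean(axis=0, keepdims=True)
grand = log_mag.mean()
scaled_log = log_mag - row_mean - col_mean + grand

# The priority ratio (largest / smallest magnitude) shrinks from
# 1000 / 0.2 = 5000 down to about 32.
before = 10 ** np.ptp(log_mag)
after = 10 ** np.ptp(scaled_log)
```

After solving, the recipe-amount variables are multiplied by the inverse scales to recover the unscaled solution, as described above.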

Here are the design questions I'd like to discuss:

  1. How to handle new required parameters in the project file.
    1. New parameters include:
      1. Byproducts (when the simple "is this only ever an output" heuristic fails: is the byproduct desired or minimized?)
      2. Explicit Inputs (with priority level for "expensiveness")
    2. Theories include:
      1. Adding even more dictionary-type tags to each recipe, like target does
      2. Modifying project format (either with a pseudo-machine on the same list or a higher-level dictionary with solver parameters and recipe sections)
    3. Corollary: what are better/shorter terms for "input expensiveness" and "byproduct handling type"?
    4. Corollary: how important is it that we maintain compatibility with projects for traditional/sympy solver? I like the idea of being able to switch between the two easily, but how often would the user actually do that?
    5. Corollary: how should user select between solvers? It could be a command line argument or some aspect of the project yaml file, like presence of a solver parameters pseudo-machine or the schema of the whole file.
  2. How to handle the bracketed tags in the project file that differentiate "flows" of the same item.
    1. Currently implemented: always ignore them. Allows easier compatibility with sympy solver, but there are some scenarios where you still want to separate ingredients with the same name to show perfect loops:
      1. naqline
      2. P-507 is consumed by three machines in the Naquadria and Naquadah Stage One sections, and three other machines (in the same sections) produce exactly the amount needed by each consumer. You might want separate visual lines to show that a loopback pipe could easily handle each of these perfect-loops, rather than having all inputs and outputs go through the same node.
    2. Theories include:
      1. Add a different tag with curly brackets or something that is treated as "stronger" than square bracket (I don't like this as it's complicated for the user)
      2. Ignore these scenarios because they're rare (probably best option)
      3. If it's a common enough issue, automatically detect perfect loops and separate lines without user input.
  3. Is it worth the effort of supporting number-of-machines tags in addition to target items?
    1. A solution that allows for "number:" tags like the sympy solver will (in my head) involve a sizeable modification of the LP problem. The relevant recipe_var would be set/constrained and the targets (which the user must still specify) would be maximized rather than a constraint.
    2. In short: it's hard to do, and would require support of a distinct solver "mode"
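Back on question 1, to make the "solver parameters pseudo-machine" theory concrete, here is one entirely hypothetical sketch of what the project YAML could look like (every key name here is invented; the real schema is exactly what's up for discussion):

```yaml
# Hypothetical project-file addition -- none of these keys exist yet.
- m: solver parameters        # pseudo-machine that also selects the solver
  solver: lp                  # vs. "sympy" for the traditional solver
  inputs:
    - item: ilmenite ore
      priority: 1             # "expensiveness" of this explicit input
  byproducts:
    - item: radioactive sludge dust
      handling: maximize      # or "minimize" (the default)
```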

Let me know your thoughts!

OrderedSet86 commented 5 months ago

I would have appreciated it if you had consulted me first, since I am already working on a new solver here: https://github.com/OrderedSet86/flow2 I think we duplicated a lot of work. (You can message me at ".order" on Discord.) But I think we ultimately agree on three major points, which is really good:

  1. An item value database is impractical for a lot of reasons
  2. The goal should be minimizing user input as much as possible. My broader vision is to have it "just work" from selecting recipes in a NEI-like web interface (https://github.com/OrderedSet86/webnei-solid) even for users who are not that familiar with the quirks of the factory solver.
  3. The most obvious/practical user-input-minimizing solution is to only specify external inputs/outputs. From my testing on flow v2 it reduces the amount of user intervention by 8x on platline.

I had some issues earlier with trying to include loops in an LP problem solver, which I detail here: https://github.com/OrderedSet86/flow2/blob/main/math.md The current solution I have been exploring is an explicit equation solver, using sympy again, mostly because it gives a clearer closed-form solution using rationals. But a linear programming solution should be roughly equivalent.
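For flavor, the exact-rationals approach can be sketched with sympy on a made-up two-recipe line with a partial loopback (recipe 1 consumes 3 A and returns 1 A as a byproduct alongside 1 B; recipe 2 turns 1 B into 1 C). This is only an illustration of the equation-solver style, not the flow2 code:

```python
import sympy as sp

r1, r2, a_in = sp.symbols("r1 r2 a_in", nonnegative=True)
eqs = [
    # Item A: external input plus r1's 1-A loopback covers r1's 3-A cost.
    sp.Eq(a_in + r1 - 3 * r1, 0),
    # Item B: produced once per r1 run, consumed once per r2 run.
    sp.Eq(r1 - r2, 0),
    # Target: 5 C per cycle (r2 yields 1 C per run).
    sp.Eq(r2, 5),
]
sol = sp.solve(eqs, [r1, r2, a_in])
# Exact rational solution: {r1: 5, r2: 5, a_in: 10}
```

No numerical tolerances are involved, which is the whole appeal: loops resolve to exact rational flow rates.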

For (1), I just added a new entry like in the image. Then, while processing the yaml file, I watched for v2_node_type or m to determine the type. This is probably best for backwards compatibility. I think there will be additional tooling required for these inputs because, although there are fewer inputs, it is harder to come up with them a priori, and the tool should guide you towards suspected required inputs. I have some ideas for this, like binary searching over subgraphs. Yes, we should continue using the project YAML file.

For (2) I think it can be ignored for now, but a more fundamental rework of readability is needed soon. The current edges overlap way too much, and unreadable stuff can happen often. In particular, something that I think hurts readability a lot is same-rank horizontal dependencies, which you can see an example of in the above chart between Naq Stage 1 and Naq Stage 2. Rank can be manually specified when generating a graphviz chart, so I can write an algorithm that improves the layout for this (roughly speaking, by distance from the source node). I am also partway through adding the quantities directly to the nodes instead of floating by the edges (see the light_fuel chart).

For (3), I am not seeing how this is difficult, but perhaps you are constructing your LP problem in a way that makes this difficult. I will look at the code now.

Although I am still maintaining flow v1, I would ultimately like to redo a large portion of the codebase. I think networkx is a significantly better choice for graph problems, as it makes searching for ancestors/children and implementing common graph algorithms much easier than the ad hoc solution I am using currently. You can see this if you look at the v2 code; it is significantly cleaner and easier to maintain.
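As a small sketch of why networkx helps here (the graph and machine names are invented for illustration):

```python
import networkx as nx

# Toy recipe graph: edges point from producing machine to consuming machine.
G = nx.DiGraph()
G.add_edges_from([
    ("sifter", "centrifuge"),
    ("centrifuge", "chem_reactor"),
    ("chem_reactor", "centrifuge"),   # a reprocessing loop
    ("chem_reactor", "mixer"),
])

# Everything upstream of the mixer, in one call -- no hand-rolled traversal.
upstream = nx.ancestors(G, "mixer")
# Reprocessing loops fall out of a stock cycle-detection algorithm.
loops = list(nx.simple_cycles(G))
```

Ancestor queries, cycle detection, and rank-style layering all come for free instead of being reimplemented per feature.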

OrderedSet86 commented 5 months ago

Ah I see, your solver uses priority ratios: https://github.com/OrderedSet86/gtnh-flow/pull/47/files#diff-7ec0af41c2feb068c65f4ee2a9e4775a5e3e145d08656288b1d1cd13efabd5bbR106-R136

I don't think this will work for all problems due to the issues I describe here: https://github.com/OrderedSet86/flow2/blob/main/math.md#linear-programming-and-selecting-an-objective-function Furthermore, the failure modes are not that obvious and it will randomly break for the end user, which is a situation we want to avoid.

Hermanoid commented 5 months ago

Ah, phooey. I looked all over at solvers for other games, but didn't think to check your other repo.

Fortunately, I do think this solution has a thing or two to add to the conversation. Firstly, from reading that math write-up:

  1. You mention the "sensitivity" issue with priority ratios in the objective function. That is, the constant you multiply by to prioritize one thing over another must be very large to handle situations where item quantities blow up (like bio lab stuff). The solution I tracked down for this was matrix scaling, where item and recipe quantities are scaled to minimize these big differences in number scales. While I haven't tested it with bio problems specifically, I found that it was able to resolve large-number situations (like liquid air distilling, etc.) such that this priority ratio only needs to be <10 - well within a reasonable range, and always predictable. After the solver completes, the amount-of-each-recipe variables can be unscaled to recover the solution. Check that out here. As an unexpected side effect, the minimal solution this scaler finds tends to combat what you called the "2<1000" problem, where items on a lower scale, which thus have smaller amounts and lower costs, can tend to be selected. I found that items were often scaled to the same level, so the "2" item would be scaled by 1/2 and the "1000" item by 1/1000. Note that this is still prone to some issues, mostly with decimal precision (see the very bottom of my very-hard-to-read-devlog).
  2. It's always fun when this happens - I arrived at the same idea for finding a minimal set of inputs to a given problem. You talk about it in beyond linear programming. I experimented with the quadratic approach you mention (where input_amount = binary_switch * continuous_amount) using CVXPY, but it didn't play well with its DCP ruleset (I struggle to understand why), and the other rulesets have other gotchas like strictly positive variables that make them not applicable. Perhaps someone smarter than I could figure it out or find a different solver. What I did figure out (while on top of a mountain in Italy, absent-minded thinking about this problem, talking to my sister and making fun of myself for struggling with this problem... it was a hilarious moment) is a sorta-hacky way to linearize the problem. Instead of input_amount = binary_switch * continuous_amount, I use input_amount = (binary_switch - continuous_subtractor) * BIG_NUMBER, where the big number is practically the priority ratio (because that is the largest amount of any ingredient we expect to use in this problem). See it in action here. I don't love it and there are situations where it might break down (if BIG_NUMBER isn't big enough). Check out this comment for why this might fail, and why I think it probably won't happen in not-contrived problems. Further, in my tests where I intentionally made BIG_NUMBER too small, I usually saw either a failure-to-solve or a "subtractor" of zero, meaning the solution used a maximal amount of the input. Both of these cases are detectable, which means we can report them to the user. I think this makes this solution "plenty good enough" for most cases.
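The on/off gating idea in point 2 can be illustrated with a standard big-M formulation (not the exact (binary_switch - continuous_subtractor) form described above, and not the PR's CVXPY code). This toy uses one made-up recipe, 2 A -> 1 B, and SciPy >= 1.9's MILP-capable linprog:

```python
import numpy as np
from scipy.optimize import linprog

BIG_NUMBER = 1000.0  # must exceed any plausible input amount
# Variables: [runs, a_in, s_a] -- recipe runs, amount of extra input A,
# and a binary switch that "pays" a large cost if A is used at all.
c = [0.0, 0.01, 100.0]            # minimize: cheap per-unit A, costly switch
A_eq = [[1, 0, 0],                # target: 10 B, i.e. 10 runs
        [-2, 1, 0]]               # A balance: a_in covers 2 A per run
b_eq = [10, 0]
A_ub = [[0, 1, -BIG_NUMBER]]      # a_in <= BIG_NUMBER * s_a (the gating)
b_ub = [0]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None), (0, None), (0, 1)],
              integrality=[0, 0, 1], method="highs")
# Optimal: runs=10, a_in=20, s_a=1 -- A is admitted as an extra input.
```

If BIG_NUMBER were smaller than the needed a_in, the model would be infeasible or pin a_in to the cap - the detectable failure modes mentioned above.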

I'm certainly open to more ideas on this, especially better options than the "minimal set of inputs" approach. For example, running the solver on the platline with PMP as an input and platinum dust as an output, it found that it could use zero PMP and shortcut to a reprecipitated platinum dust input. Of course, this approach also displayed some unexpected "intelligence" by finding that it required fewer inputs to reprocess the byproduct calcium chloride and use the extra chlorine (along with some extra inputs) to produce some nice byproducts. It's not a bad approach, but it still requires the user to specify stuff we would like to be automatic, like simple acids and other inputs.

I do like the "maximize flow as a low priority" approach you mention in your writeup, and I think it might be a way to implement the "use as many of the provided recipes as possible" heuristic I was considering. I don't love that it will conflict with using the solver to choose the most optimal of multiple possible processing paths. But, honestly, I think it's very rare that you would want to use this solver for that. Perhaps I'm wrong.

You say "From my testing on flowv2 it reduces amount of user interventions by 8x on platline" - does this mean you have a minimal-set-of-inputs prototype out there?

Regardless, if you think this implementation holds some merit, I'm definitely up to contact you on Discord and talk about integrating this with flow2 instead. I'm glad I stopped here to ask for input before getting too deep into integration with flow1.

Hermanoid commented 5 months ago

Two other odd points:

  1. I've also been dreaming about hooking the solver up to a website with exported recipes. One odd thought I had: essentially retrofit your WebNEI into https://www.satisfactorytools.com/. It offers some very nice features, like saving projects for later and displaying everything on an interactive, draggable map. It would be a hunk of work, but the main details would be A) swapping out the recipe list (which is currently downloaded in its entirety) for SQL database calls and B) swapping out the existing solver for whatever souped-up version we come up with (either ported to javascript or enabled via API calls, neither of which I love).
  2. It's not a drop-in replacement, but I think another instance of me-not-finding-something-that-already-exists happened with the recipe exporter your WebNEI uses. I wrote this one after researching like crazy (because I so hate java) and never finding the one you used. I'm honestly blown away I didn't find it in literally hours of searching. How did you happen across this one?

OrderedSet86 commented 5 months ago

Yes - please reach out on Discord. I am actively developing it and we would benefit from coordinating on development.

Yep, flow v2 has an MVP working. Here is platline. Unfortunately loops make the readability problem worse (look at the mess halfway down), and deciding the "minimal set of I/O" is counterintuitive for large machine lines. It needs more time to cook before it is ready to replace v1 (the code will ultimately live here once it is done, since this repo is much more popular / widely known). But it works.

For the broader design philosophy of the tool - I am not intending to pick the objectively optimal route between two methods. There are other tools that can solve that. Instead, I leave the burden of machine line and resource selection up to the user, and show something that is (A) a useful and readable overview of a machine line, which can be used to build a line in the actual game, (B) an accurate predictor of overclock and machine count values, and (C) as a consequence of A and B, something useful for developer pack balancing. This is a complex enough problem, and I am happy to leave e.g. HOG fraction solving as an out-of-scope problem. I only mention this up front because you mentioned it here:

I don't love that it will conflict with using the solver to choose the most optimal of multiple possible processing paths. But, honestly, I think it's very rare that you would want to use this solver for that. Perhaps I'm wrong.

1 & 2) I am skeptical of any solution that requires numerical constraints on recipe I/O values. There is no guarantee that a machine line will not exist in the future that breaks this. For example, naqline has large input quantities and outputs tiny 4mb/s quantities. I think a "good enough" objective function can be found, but I am worried about it randomly breaking in a completely opaque way for the end user, and us not having a better answer than "tough luck." I would like to use a solver with no numerical sensitivities / heuristics if possible, and my v2 sympy solver does that. I appreciate the work you have done (it is impressive you made something this big in such a short time in an unfamiliar repo), but you are at the same spot I was at when initially developing it - I looked into the pulp solver (same library even!) and ultimately discarded it due to the many annoying heuristics and patches needed to make it work.

I am theoretically open to basing the web UI on another tool, but I personally have some dislike of most JS frameworks (particularly React). It seems Satisfactory Tools is using Angular, which I do like more than React... We can likely steal whichever libraries they are using and integrate them into the final implementation. Really it comes down to whether it is worth it for dev time saving, as I would be happy with writing the web interface myself (assuming we choose a non-terrible framework). The existing one uses SolidJS which I like quite a lot, but I think is not in a state where it is ready for community projects - the UI library already got deprecated since I last worked on it. API calls are fine to me, the DB will be colocated on the webserver running WebNEI.

For the recipe exporter, nesql-exporter is the "official" one, written by one of the dev team, so it is probably best if we coalesce around that one. I believe there is a fork owned by the GTNH developers (https://github.com/GTNewHorizons/nesql-exporter), so we can get permissions to manage commits to that one. The primary author is nice to work with but often busy, so we will have to do the legwork on getting it going. My understanding is the two main needs are more NEI handlers covered and faster recipe exporting.