fsdonks / m4

Objective Cyclelength #24

fs-tom closed this issue 5 years ago

fs-tom commented 5 years ago

We ran into a situation where we need to transition from a finite policy to another policy that provides a custom cyclelength for use in policy transfers, but that may end up in an absorbing state (e.g. the end state has an infinite transition time to the start state).

The immediate patch for this is to allow users to define a cyclelength to use when computing transfers, via {:cyclelength ...} in the deltas map supplied for a policy. This is flexible and generally works.
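As a rough illustration of the override described above, here is a minimal Python sketch (not the project's actual code). The dictionary layout, the `transfer_cyclelength` helper, and the value 730 are all hypothetical; only the idea of a `cyclelength` entry in the deltas map comes from the description.

```python
import math

def transfer_cyclelength(policy):
    """Pick the cyclelength to use when computing policy transfers.

    Hypothetical helper: prefers a user-supplied 'cyclelength' in the
    policy's deltas map and falls back to the computed cyclelength.
    """
    deltas = policy.get("deltas", {})
    return deltas.get("cyclelength", policy["cyclelength"])

# A finite policy with no override (hypothetical record layout).
finite_policy = {"cyclelength": 1095, "deltas": {}}

# A policy whose computed cyclelength is infinite (absorbing end state),
# with a user-defined cyclelength of 730 supplied via the deltas map.
pseudo_finite_policy = {"cyclelength": math.inf, "deltas": {"cyclelength": 730}}

print(transfer_cyclelength(finite_policy))
print(transfer_cyclelength(pseudo_finite_policy))
```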

Testing shows complications with this method when projecting the unit's current policy (the pseudo-finite one) onto another policy. Legacy mechanisms for computing a normalized proportion detect infinite policies only by their cyclelength, a check we now preclude. This means a pseudo-finite policy can have a proportion (ct/clength) > 1.0, which should toss an error. We need to determine what to do in this case. The simplest arithmetic fix is to clamp the proportion to the range [0.0 1.0]. The implication is that if we define a policy that is "allowed" to exceed its objective cycle length indefinitely, then that unit is effectively permanently at the end of its cycle, e.g. 1.0 (or thereabouts). So, to provide a compatible means of changing policies, we ensure the unit is clamped according to its objective cycle length.
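The clamping idea can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the project's code; the function name `clamped_proportion` and the sample numbers are made up.

```python
def clamped_proportion(cycle_time, cyclelength):
    """Return cycle_time / cyclelength clamped to the range [0.0, 1.0].

    A unit that has exceeded its objective cycle length is treated as
    sitting at the end of its cycle (proportion 1.0).
    """
    if cyclelength <= 0:
        raise ValueError("cyclelength must be positive")
    proportion = cycle_time / cyclelength
    return min(1.0, max(0.0, proportion))

# A unit halfway through a 730-day objective cycle.
print(clamped_proportion(365, 730))
# A unit that has stayed in the absorbing end state well past 730 days.
print(clamped_proportion(1460, 730))
```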

Note: this could also affect normalized dwell computations, since we had to determine a finite-cycle-length goal in order to compare infinite policies with finite ones more accurately (e.g. giving infinite policies the ability to compete better and not be penalized by a massive denominator).

So, we need to examine the implications (and patches) for applying an objective cycle length to policies that may never transition. An alternative could be to remove the transition from end-state to start-state, or to imply that the end-state leads to a waiting state (which wouldn't age the unit).

fs-tom commented 5 years ago

This actually maps very closely with the concept from e5467fe83acc7caf47f95beb18072d0a07b9c98f and bbe385e52078ae9f196d35e2135f4988e2244aa3 .

We actually have a very simple mechanism for this that doesn't require much munging. The two concepts of concern are: 1) ensuring we handle policy changes appropriately (e.g. via normalized cycle times, etc.), and 2) ensuring our normalized dwell computations (e.g. via the finite cycle length goal when dealing with infinite policies) are similarly captured.

Currently, we assume a default finite cycle length, from marathon.ces.entityfactory/+default-cycle-length+, which is set at 1095 for initial-conditions purposes. What we're doing here is saying "this policy wants to define its own FCLG (finite cycle length goal)" and allowing the user to convey that via the deltas map in the policyrecord. If we use this new cyclelength (really the FCLG) as our actual cyclelength, e.g. by just jamming it into the policy's cyclelength key instead of using the computed cyclelength (which will be infinite), we break concern 1 (introducing the potential for cycle times exceeding the policy's cyclelength, which necessitates corner-case logic) and, to a lesser extent, concern 2 (less so because we already allow proportions > 1, which is desired for sorting purposes).

It looks like we can just wire in some logic in marathon.ces.unit/cycle-stats, which funnels the information for +default-cycle-length+ into the cycle record (used for computing proportions and the like), and we should be golden. That is, we can let the default machinery for cycle transfers leverage the infinite cyclelength that emerges from the policy naturally (e.g. there's a state transition somewhere that's >= 9999999), while using the user-defined FCLG for normalized dwell purposes to enable fair scoring. So, this ends up being a fairly minor additive modification.
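A minimal Python sketch of that split, under stated assumptions: this is not the actual marathon.ces.unit/cycle-stats implementation, the record layout and the helper names are hypothetical, normalized dwell is assumed to be dwell divided by a cycle-length denominator, and a cyclelength at or above 9999999 is assumed to count as infinite.

```python
import math

DEFAULT_CYCLE_LENGTH = 1095   # default finite cycle length cited in the thread
INFINITE_THRESHOLD = 9999999  # assumption: at or above this counts as infinite

def cycle_stats(policy):
    """Build a cycle record that keeps both cycle lengths side by side.

    'cyclelength' stays the naturally computed (possibly infinite) value,
    so transfer logic is untouched; 'fclg' carries the user-defined goal
    from the deltas map, or the default when none is supplied.
    """
    natural = policy["cyclelength"]
    fclg = policy.get("deltas", {}).get("cyclelength", DEFAULT_CYCLE_LENGTH)
    return {
        "cyclelength": natural,
        "fclg": fclg,
        "infinite": natural >= INFINITE_THRESHOLD,
    }

def normalized_dwell(dwell, stats):
    """Score dwell against the FCLG for infinite policies.

    Values above 1.0 are deliberately left unclamped, since the thread
    notes that proportions > 1 are useful for sorting.
    """
    denominator = stats["fclg"] if stats["infinite"] else stats["cyclelength"]
    return dwell / denominator

stats = cycle_stats({"cyclelength": math.inf, "deltas": {"cyclelength": 730}})
print(stats["cyclelength"])          # transfers still see the infinite length
print(normalized_dwell(365, stats))  # dwell is scored against the FCLG
```

Keeping the two values in separate fields means the transfer path never sees the FCLG, so no clamping or corner-case logic is needed there.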