dt-rush / sameriver

game engine written in go

Utility evaluator #84

Open dt-rush opened 1 year ago

dt-rush commented 1 year ago

https://mcguirev10.com/2019/01/03/ai-decision-making-with-utility-scores-part-1.html

good article on utility scoring

dt-rush commented 1 year ago

without having read the above, i randomly decided to try to sketch something to chatGPT

me

let's say i had a bunch of "curve functions" defined in my game engine, where a curve is a func (x float64) float64 defined for x in [0,1] and outputting y in [0,1].

I would use these curves to define response curves for a utility based AI. Various perception modules would monitor the world state and provide input values to these curves, producing results for each curve in [0,1].

For example, the hunger response curve could be a sigmoid function centered at 0.5, so a hunger of 0/100 would be 0, 30/100 would be between 0 and 0.5, 50/100 would have a value of 0.5, 60/100 would have a value > 0.5, and depending on the steepness of the sigmoid, probably around 85/100 we'd be approaching 1.0.

This curve describes how important hunger is at a given level.
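A minimal Go sketch of such a hunger response curve, assuming a logistic sigmoid. The `Curve` type, the `Sigmoid` constructor and the `steepness` parameter are illustrative names, not taken from curves.go. Note that a logistic function only approaches 0 and 1 at the ends of [0,1]; it never reaches them exactly.

```go
package main

import (
	"fmt"
	"math"
)

// Curve maps a normalized input in [0,1] to a response in [0,1].
// (Hypothetical type name, used here for illustration only.)
type Curve func(x float64) float64

// Sigmoid returns a logistic curve centered at x = 0.5.
// steepness is an assumed tuning parameter: larger values give a sharper step.
func Sigmoid(steepness float64) Curve {
	return func(x float64) float64 {
		return 1 / (1 + math.Exp(-steepness*(x-0.5)))
	}
}

func main() {
	hunger := Sigmoid(10)
	// Hunger levels are given out of 100 and normalized to [0,1] first.
	for _, h := range []float64{0, 30, 50, 60, 85} {
		fmt.Printf("hunger %3.0f/100 -> %.3f\n", h, hunger(h/100))
	}
}
```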

Then the output of these individual perception scores can be used in an arbitrary mathematical expression to compute a boolean activation of a given behaviour.

Let's say this logic is encapsulated in a BehaviourResponse object. When BehaviourResponse.Respond() is called, it looks at game state and calculates, using curves, whether a behaviour is activated (true or false) and which behaviour that is.

The thing is, once these various scores are calculated, how can we impose a hierarchy on them? for example a "fight/flight" behaviour that is activated should always trump a "get food" behaviour that is activated.

I suppose we could store a priority int (0 is max prio) on the behaviour-response objects

then, we look in a given frame at all the behaviour-responses that are in activated state, and choose the sorted-highest prio one. if there are multiple which have equal prio, then we have to handle some edge cases involving random selection that is consistent between frames.
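A rough Go sketch of the selection step described above. The `BehaviourResponse` fields and the `Select` function are assumptions made for illustration. For equal priorities it swaps in a deterministic tie-break by name rather than the frame-consistent random selection mentioned above, since that is the simplest way to get the same answer every frame.

```go
package main

import (
	"fmt"
	"sort"
)

// BehaviourResponse is a hypothetical shape for the object described above.
type BehaviourResponse struct {
	Name     string
	Priority int  // 0 is the highest priority
	Active   bool // result of the most recent Respond() call
}

// Select returns the activated response with the lowest priority number.
// Ties are broken by name so the choice is stable between frames.
// The second return value is false when nothing is activated.
func Select(responses []BehaviourResponse) (BehaviourResponse, bool) {
	active := make([]BehaviourResponse, 0, len(responses))
	for _, r := range responses {
		if r.Active {
			active = append(active, r)
		}
	}
	if len(active) == 0 {
		return BehaviourResponse{}, false
	}
	sort.Slice(active, func(i, j int) bool {
		if active[i].Priority != active[j].Priority {
			return active[i].Priority < active[j].Priority
		}
		return active[i].Name < active[j].Name
	})
	return active[0], true
}

func main() {
	responses := []BehaviourResponse{
		{Name: "get food", Priority: 2, Active: true},
		{Name: "fight/flight", Priority: 0, Active: true},
		{Name: "sleep", Priority: 1, Active: false},
	}
	if r, ok := Select(responses); ok {
		fmt.Println("selected:", r.Name)
	}
}
```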

does this scheme seem to make sense? are there any notes you have that i should be aware of?

ChatGPT (edited to include only useful recommendations)

Yes, your scheme does make sense, and using a priority integer is a common way to impose hierarchy on the AI's behavior. Here are a few notes and suggestions for improvement:

Instead of using an integer priority, consider using a float for more granular control over the priority. This will allow you to have a smoother distribution of priorities between behaviors and provide you with more flexibility when designing the AI.

In cases where multiple behaviors have equal priority, consider implementing a weighted random selection based on the individual curve scores. This can make the AI's behavior appear more dynamic and less deterministic, which may be desirable in a game context.

You might want to implement a cooldown or hysteresis mechanism for certain behaviors to avoid rapid toggling between high-priority behaviors. This can improve the AI's stability and make it appear more natural in its decision-making process.

To make the AI more adaptable, consider allowing the priority values themselves to be dynamic and influenced by the game state or other factors. For example, if the AI's health is low, the fight/flight behavior's priority could increase to reflect the increased importance of survival.

Be mindful of performance implications when calculating and sorting the behavior priorities. Depending on the number of behaviors and the complexity of the calculations, this process could become computationally intensive. Implement caching or other optimization techniques if necessary.

Overall, your proposed scheme provides a solid foundation for a utility-based AI with a hierarchy of behaviors. Be prepared to iterate on this design as you test and refine your AI in the context of your game.
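One way to sketch the hysteresis suggestion in Go: keep the current behaviour unless a rival beats it by a margin. The `Selector` type, its `Margin` field and the use of a plain name-to-score map are all assumptions for illustration, not part of the engine.

```go
package main

import "fmt"

// Selector remembers the current behaviour and only switches when a rival's
// score exceeds the current behaviour's score by more than Margin.
// (Hypothetical type; Margin is an assumed tuning parameter.)
type Selector struct {
	Current string
	Margin  float64
}

// Pick takes the scores for this frame, keyed by behaviour name, and returns
// the behaviour to run. Scores are assumed to lie in [0,1].
func (s *Selector) Pick(scores map[string]float64) string {
	best, bestScore := "", -1.0
	for name, sc := range scores {
		// Equal scores are resolved by name so map iteration order is irrelevant.
		if sc > bestScore || (sc == bestScore && name < best) {
			best, bestScore = name, sc
		}
	}
	cur, ok := scores[s.Current]
	// Switch if the current behaviour is gone, or the best rival clears the margin.
	if !ok || bestScore > cur+s.Margin {
		s.Current = best
	}
	return s.Current
}

func main() {
	s := &Selector{Margin: 0.1}
	fmt.Println(s.Pick(map[string]float64{"eat": 0.5, "sleep": 0.4}))
	fmt.Println(s.Pick(map[string]float64{"eat": 0.5, "sleep": 0.55}))
	fmt.Println(s.Pick(map[string]float64{"eat": 0.5, "sleep": 0.7}))
}
```

With a margin of 0.1, a rival that is only slightly ahead does not cause a switch, which is what damps the frame-to-frame toggling.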

dt-rush commented 1 year ago

as of f5bc235, curves.go contains the curves needed for this AND THEN SOME.

dt-rush commented 1 year ago

me

now that i think about it, the idea of having a curve used to define an activation boolean sounds wrong. the whole value of a curve is its continuity. adding a threshold amounts to throwing away a bunch of useful information.

if we select based not on activation boolean but score float, is there even a way to select by both score and priority?

chatGPT

Yes, there is a way to select behaviors based on both score and priority. You can combine the two criteria into a single metric for selection. One possible approach is to use a weighted sum:

selection_value = score * (1 - priority_weight) + priority * priority_weight

In this formula, score is the curve score, and priority is the priority value (normalized to the range [0,1]). priority_weight is a tunable parameter between 0 and 1 that controls the balance between score and priority in the selection process.

When priority_weight is 0, the selection is based purely on the score, and when priority_weight is 1, the selection is based purely on priority. By adjusting priority_weight, you can find the desired balance between the two criteria.

This approach maintains the continuity of the curve scores while still allowing you to impose a hierarchy of priorities on the behaviors. However, it's important to note that this method assumes that the priorities are normalized to the same range as the curve scores, so you may need to adjust the priorities accordingly.

Alternatively, you can multiply the score by a priority factor:

selection_value = score * priority_factor(priority)

In this case, priority_factor is a function that maps priority values to a weight factor. The higher the priority, the larger the weight factor should be. This way, higher priority behaviors will have a higher selection value even if their score is lower than that of other behaviors.

Experiment with different formulas and weighting schemes to find the one that works best for your particular game and AI system.
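Both formulas are small enough to compare side by side in Go. This is only a sketch: the function names are made up for illustration, both inputs are assumed to be normalized to [0,1], and the priority factor in the second formula is assumed to be the priority itself.

```go
package main

import "fmt"

// WeightedSum blends score and priority.
// priorityWeight in [0,1] controls the balance between the two.
func WeightedSum(score, priority, priorityWeight float64) float64 {
	return score*(1-priorityWeight) + priority*priorityWeight
}

// Scaled multiplies the score by a priority factor.
// Here the factor is assumed to be the priority value itself.
func Scaled(score, priority float64) float64 {
	return score * priority
}

func main() {
	// priorityWeight 0 leaves only the score; 1 leaves only the priority.
	fmt.Println(WeightedSum(0.8, 0.2, 0), WeightedSum(0.8, 0.2, 1))
	// A high score at low priority versus a low score at high priority.
	fmt.Println(Scaled(0.9, 0.5), Scaled(0.5, 1.0))
}
```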

me

the latter idea seems better and simpler, and on the right track.

selection_value = score * prio

where prio is already [0,1]

prio 1 means your 1 will trump all 1's of any lesser prio

prio 0.86 means your 1 will trump all 1's of any lesser prio,

what about other scores than 1...

at prio 0.75, your 0.5 will trump what?

0.5*0.75 = 0.375

anything with selection_value < 0.375. So that could mean we beat a prio 1 with score < 0.375, or beat a prio 0.8 with score < [exercise to the reader].
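The general rule behind this arithmetic: a rival at some prio ties when its score equals selection_value / prio, and loses below that. A small Go sketch, with `SelectionValue` and `BreakEvenScore` as made-up helper names:

```go
package main

import "fmt"

// SelectionValue is the score scaled by prio, both assumed to be in [0,1].
func SelectionValue(score, prio float64) float64 {
	return score * prio
}

// BreakEvenScore returns the score a rival at rivalPrio would need in order
// to tie the given selection value. A result above 1 means the rival can
// never tie, since scores are capped at 1. rivalPrio is assumed to be > 0.
func BreakEvenScore(value, rivalPrio float64) float64 {
	return value / rivalPrio
}

func main() {
	v := SelectionValue(0.5, 0.75)
	for _, p := range []float64{1.0, 0.8, 0.5, 0.3} {
		fmt.Printf("rival prio %.2f ties at score %.4f\n", p, BreakEvenScore(v, p))
	}
}
```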

dt-rush commented 1 year ago

Decided to go with scaling of the score by a const factor, and desmos proved to be an amazing way to explore and calibrate the model:

https://www.desmos.com/calculator/tw0bj0ws3u

(screenshot: 2023-03-30_18-53-30)

dt-rush commented 1 year ago

note: to do a work day, start at hour:6 (t0). Mentally select a task (green: plant, harvest, craft) to work toward. drag along the entity's effort line (just use max effort for exploration) until leisure level surpasses where the chosen task level was at start, or until 1 hr after afternoon (workdamp). adjust the mentally-selected task input according to how much work would happen in the amount of time that passed.

dt-rush commented 1 year ago

add religious holidays and daily schedule https://www.desmos.com/calculator/ogdpywer9c

dt-rush commented 1 year ago

now you can move along the zero-work line by dragging on the day timeline slider

dt-rush commented 1 year ago

experimenting with getting latex/tikz to generate the plot given curves:

https://www.overleaf.com/project/64330fa5790118c5c16b3eaa

you know, probably not worth running a latex renderer when you can just do a simple line-graphing algo onto a texture and render it on a quad...
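A rough sketch of what that simple line-graphing algo could look like in Go, drawing into an `image.RGBA` from the standard library. `PlotCurve` is a hypothetical helper, it assumes a width and height of at least 2, and uploading the pixels to a texture and drawing the quad is engine-specific and left out.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
)

// PlotCurve rasterizes f over x in [0,1] into a w-by-h RGBA image.
// y = 0 maps to the bottom row and y = 1 to the top row.
// Assumes w >= 2 and h >= 2; outputs of f are clamped to [0,1].
func PlotCurve(f func(float64) float64, w, h int) *image.RGBA {
	img := image.NewRGBA(image.Rect(0, 0, w, h))
	ink := color.RGBA{R: 255, G: 255, B: 255, A: 255}
	prevY := -1
	for px := 0; px < w; px++ {
		x := float64(px) / float64(w-1)
		y := f(x)
		if y < 0 {
			y = 0
		}
		if y > 1 {
			y = 1
		}
		// Flip vertically: image rows grow downward.
		py := (h - 1) - int(y*float64(h-1)+0.5)
		// Fill the vertical gap back to the previous column's row
		// so steep sections still form a connected line.
		lo, hi := py, py
		if prevY >= 0 {
			if prevY < lo {
				lo = prevY
			}
			if prevY > hi {
				hi = prevY
			}
		}
		for yy := lo; yy <= hi; yy++ {
			img.SetRGBA(px, yy, ink)
		}
		prevY = py
	}
	return img
}

func main() {
	img := PlotCurve(func(x float64) float64 { return x * x }, 64, 64)
	fmt.Println("plotted into", img.Bounds())
}
```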