Closed: drallensmith closed this 6 years ago
Thanks for the many suggested changes! Due to the magnitude of the changes, I feel like this PR should be broken into at least 3 separate PRs:
1) Editing the documentation
2) Reorient action
3) Action Feedback
I'm fine with 1). However, I think 2) and 3) need some thoughtful discussion:
2) It's not clear to me what functions doPreprocess() actually performs that aren't already executed in Agent::actionImpl(). Is there enough actual substance to doPreprocess to deserve its own action?
3) I sympathize with the general lack of feedback as to whether high-level actions have succeeded or not. For simplicity, I'd favor just returning the boolean that is returned by the action (with the semantics of True = maybe succeeded, False = failed). What is the case for returning an action_status_t instead of a pure bool?
I understand about breaking it up; it is together mainly because I needed the third part and, to a lesser degree, the second part in order to work out the action chart (and some other documentation fixes). I will work on splitting it up, although I am not quite sure how to do so cleanly with git given that the commits are intermingled.
I'm not quite sure what you mean with regard to actionImpl. While it does set the view and neck action (previously after every action, which overrode any changes made by the actions themselves; I have corrected this), it does not do anything specific like calling Bhv_Emergency or Bhv_NeckBodyToBall, which result in actions such as scanning the field for the ball.
Actually, that's what I was originally doing: the ACTION_STATUS_UNKNOWN return (requiring a non-boolean return type for only the "communicating" functions) was used only if the user asked about an action that was not the last one stored (plus, what does one otherwise return if the last action was NOOP?). However, while creating the action chart I discovered that actions would sometimes return a boolean false yet still be useful, such as by getting out of a collision situation. I therefore expanded the use of the ACTION_STATUS_UNKNOWN return.
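For illustration, here is a minimal sketch of the tri-state status being discussed. The name action_status_t comes from this thread, but the specific enumerator names, values, and the conversion helper below are my assumptions, not the PR's actual definitions:

```cpp
#include <cassert>

// Hypothetical sketch of a tri-state action status (names/values assumed).
enum action_status_t {
    ACTION_STATUS_UNKNOWN = -1, // e.g. NOOP, or querying a non-latest action
    ACTION_STATUS_BAD     = 0,  // the action reported failure
    ACTION_STATUS_MAYBE   = 1   // the action reported possible success
};

// A plain bool cannot express "unknown": collapsing UNKNOWN into false
// would mislabel actions that returned false but were still useful
// (such as escaping a collision situation).
inline action_status_t from_bool(bool ok) {
    return ok ? ACTION_STATUS_MAYBE : ACTION_STATUS_BAD;
}
```

The third state is what a two-valued bool cannot carry: "I genuinely don't know whether this helped."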
Quite welcome,
-Allen
I have separated out the documentation updates (and a slight .travis.yml improvement) into a separate pull request, although this one still includes everything.
I'm trying to understand the situations in which Reorient would be a good action to call. By default actionImpl() calls Neck_TurnToBallOrScan which calls either Neck_ScanField or Neck_TurnToBall.
Now, let's consider the situations in which Reorient is different:
1) If an agent is frozen or has lost its own position/velocity, then Reorient calls Bhv_Emergency, which calls Bhv_ScanField.
2) If the ball is not found, it calls Bhv_NeckBodyToBall, which calls Bhv_ScanField.
So the basic difference is Neck_ScanField vs. Bhv_ScanField: actionImpl() calls the former by default, while Reorient uses the latter.
It seems these two methods are indeed doing different things. Unfortunately, I don't have enough background in librcsc to tell which should be preferred (e.g. should actionImpl() be calling Bhv_ScanField?). Do you have any insight into this?
I am not by any means an expert on librcsc... but from my reading of the code: Neck_ScanField::calcAngleDefault attempts to produce a set of view cones that will cover the playing field, with the next one chosen by what is currently most uncertain. (It may also be using Neck_ScanPlayers to keep track of each player, but this is unclear due to a check for allPlayers.size() being the normal 22 - I do not know how this will interact with HFO. I also note that Neck_ScanField::calcAngleDefault contracts the area it scans for keepaway mode, but there is no check for an HFO mode, so it may well be scanning outside the HFO playable area - a TODO for the future would be creating an actual HFO mode and updating the code accordingly.) In other words, it looks like Bhv_ScanField is more for dealing with uncertain situations, involves doTurn actions that may conflict with other actions, and usually uses a wide viewing angle (which is also set by View_Tactical if the ball location is unknown). Neck_ScanField is better for bringing in information a bit at a time, albeit possibly depending on already-known information, and does not change the viewing angle set by View_Tactical (which depends primarily on the distance to the ball).
This PR should now contain only changes for adding feedback, possibly plus some documentation updates.
I spotted a bug in doPreprocess (and doReorient): it was searching for the ball if there were more than a threshold number of position observations, rather than fewer. (Incidentally, I note that Bhv_NeckBodyToPoint could actually have a version that would take a relative position only, in which case Bhv_NeckBodyToBall would require only ball().rposValid()...)
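The inverted-threshold bug described above can be sketched abstractly. All names here (shouldSearchForBall, ball_obs_count, kObsThreshold) are hypothetical and do not come from the actual doPreprocess code; the point is only the direction of the comparison:

```cpp
#include <cassert>

// Hypothetical threshold on how well the ball's position is known.
const int kObsThreshold = 3;

// Buggy version: searched when the ball had been observed MORE than
// the threshold number of times, i.e. exactly when a search was least needed.
bool shouldSearchForBallBuggy(int ball_obs_count) {
    return ball_obs_count > kObsThreshold;
}

// Fixed version: search when position observations are scarce.
bool shouldSearchForBall(int ball_obs_count) {
    return ball_obs_count < kObsThreshold;
}
```

With the comparison flipped, an agent with good ball knowledge would waste cycles scanning while a blind agent never would.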
Documentation updates
First, I have put in some general fixes including clarifications.
Second, as per #31, I have included a chart of applicable actions. As mentioned in #36 earlier, one thought behind this is that even the high-level random player effectively gets some of this information; if comparing to the random player, a learning agent will ideally also have such information available (such as via an extra set of inputs after processing). It should also be helpful in figuring out whether an action should even be possible for a given side/player.
Reorient action
Currently, Dribble - but no other actions, as far as I can tell - calls doPreprocess, which does a number of things that are applicable beyond dribbling. Most notably, it triggers various means of self-location and ball-location that (without view/neck control) are not otherwise available. I have extracted the applicable portions of doPreprocess (removing those only useful if the player has the ball) and put them into a new action, Reorient. (Another reason for creating this is to give high-level offensive players without the ball an alternative to constantly issuing Move commands.)
Feedback enabled
A number of actions return (or now return) some information as to whether there is any possibility they were successful at accomplishing the action and/or helpful in some other way (such as getting out of a collision). I have added a mechanism for gathering this information (about the most recently processed action), to be (optionally) used for helping learning agents with more-frequent feedback than "game lost" or "game won".
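As a minimal sketch of such a mechanism, the struct below records the status of the most recently processed action so a learning agent can query it as an extra input. Every name here (LastActionFeedback, record, query, the enum values) is an assumption for illustration, not the PR's actual API:

```cpp
#include <cassert>

// Hypothetical tri-state status (names/values assumed).
enum action_status_t {
    ACTION_STATUS_UNKNOWN = -1,
    ACTION_STATUS_BAD     = 0,
    ACTION_STATUS_MAYBE   = 1
};

// Stores feedback only for the most recently processed action.
struct LastActionFeedback {
    int last_action = -1;                           // e.g. an action code
    action_status_t status = ACTION_STATUS_UNKNOWN;

    void record(int action, action_status_t s) {
        last_action = action;
        status = s;
    }

    // Asking about anything other than the latest action yields UNKNOWN.
    action_status_t query(int action) const {
        return action == last_action ? status : ACTION_STATUS_UNKNOWN;
    }
};
```

A learning agent could feed query()'s result in as an extra observation, giving more frequent feedback than "game lost" or "game won".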