Qervas opened this issue 11 months ago
In `QLearningController.java`, in the `tick()` function:

```java
String new_state = StateAndReward.getStateHover(angle.getValue(), x.getValue(), y.getValue(), vy.getValue());
double previous_reward = StateAndReward.getRewardHover(previous_angle, previous_x, previous_y, previous_vy);
```

(Note: `getStateHover` is declared as `(angle, vx, vy, y)` but is called here with `(angle, x, y, vy)`, so the arguments appear to be out of order; that mismatch may be part of the problem.)
In `StateAndReward.java`:

```java
/* State discretization function for the full hover controller */
public static String getStateHover(double angle, double vx, double vy, double y) {
    int discreteAngle = discretize(angle, 20, -Math.PI, Math.PI);
    int discreteVx = discretize(vx, 5, -1, 1);
    int discreteVy = discretize(vy, 5, -1, 1);
    int discreteY = discretize(y, 10, -2115.89, 1197.83); // Based on the provided minY and maxY
    return "HoverState_" + discreteAngle + "_" + discreteVx + "_" + discreteVy + "_" + discreteY;
}

/* Reward function for the full hover controller */
public static double getRewardHover(double angle, double vx, double vy, double y) {
    double anglePenalty = (Math.abs(angle) > 2.8) ? 1 : Math.abs(angle) / Math.PI;
    double vyPenalty = Math.abs(vy);
    double vxPenalty = Math.abs(vx);
    // Reward is higher when closer to -2115 (but not below it)
    double proximityToGroundReward = (y < -2000 && y > -2115) ? (1 - Math.abs(y + 2115) / 115) : 0;
    // Heavy penalty if the rocket is upside down and near the ground (potential crash scenario)
    double flipOverPenalty = ((angle > 2.8 || angle < -2.8) && y < -2000) ? -5 : 0;
    double reward = 1 + proximityToGroundReward - anglePenalty - vyPenalty - vxPenalty + flipOverPenalty;
    return reward;
}
```
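For context on why convergence is slow: the bucket counts above multiply out to a fairly large Q-table, and every state-action pair needs repeated visits before its Q-value settles. A quick sketch of the table size (the action count is a hypothetical guess; it depends on the controller):

```java
public class StateSpaceSize {
    // Bucket counts copied from getStateHover above: angle, vx, vy, y
    static int stateCount() {
        return 20 * 5 * 5 * 10;
    }

    public static void main(String[] args) {
        System.out.println("discrete states: " + stateCount()); // 5000
        // Assuming 3 discrete actions (hypothetical), the Q-table has 15000 entries.
        // Coarser buckets over less-important dimensions shrink this dramatically.
        System.out.println("Q entries (3 actions): " + stateCount() * 3);
    }
}
```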
We need to improve the strategy.
When the rocket is facing down, is there a way to reward it for turning right side up?
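One common trick for exactly this is potential-based reward shaping: add a bonus F = γ·Φ(s') − Φ(s) with a potential Φ that is highest when upright, so the agent is rewarded every step the tilt shrinks, and this provably does not change the optimal policy. A minimal sketch (the discount factor and method names are assumptions, not from the assignment code):

```java
public class UprightShaping {
    static final double GAMMA = 0.95; // assumed discount factor

    // Potential: 0 when upright, approaches -1 when upside down
    static double phi(double angle) {
        return -Math.abs(angle) / Math.PI;
    }

    // Shaping bonus F = gamma * phi(s') - phi(s):
    // positive whenever the rocket rotates toward upright
    static double shapingBonus(double prevAngle, double newAngle) {
        return GAMMA * phi(newAngle) - phi(prevAngle);
    }

    public static void main(String[] args) {
        // Rotating from nearly upside down (3.0 rad) toward upright (2.0 rad)
        System.out.println(shapingBonus(3.0, 2.0)); // positive bonus
        // Rotating further over (2.0 -> 3.0 rad)
        System.out.println(shapingBonus(2.0, 3.0)); // negative bonus
    }
}
```

Adding this term to `getRewardHover`'s output would give the agent a dense signal for righting itself instead of only the one-shot `flipOverPenalty`.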
After adding the location penalty (for approaching the lowest border), the Q-values look terrible and struggle to converge.
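One likely reason the Q-values behave badly is that the current reward mixes unbounded terms (`vyPenalty` and `vxPenalty` are raw speeds) with a sudden −5 spike, so the reward scale varies wildly between states. A sketch of a version where every term is squashed into [0, 1] so the total stays in roughly [−2, 2] (the squashing function and the reuse of the −2115..−2000 altitude band are my assumptions, to be tuned):

```java
public class BoundedHoverReward {
    // Squash any non-negative magnitude into [0, 1)
    static double squash(double magnitude) {
        return magnitude / (magnitude + 1.0);
    }

    static double reward(double angle, double vx, double vy, double y) {
        double anglePenalty = Math.abs(angle) / Math.PI; // already in [0, 1]
        double vxPenalty = squash(Math.abs(vx));          // bounded even for huge vx
        double vyPenalty = squash(Math.abs(vy));
        // Altitude band reused from the original proximityToGroundReward
        double altitudeBonus = (y < -2000 && y > -2115)
                ? 1.0 - Math.abs(y + 2115) / 115.0 : 0.0;
        // Total is bounded, so Q-values stay on a comparable scale everywhere
        return 1.0 + altitudeBonus - anglePenalty - vxPenalty - vyPenalty;
    }

    public static void main(String[] args) {
        // Hovering upright, slow, just inside the band -> near-max reward
        System.out.println(reward(0.0, 0.0, 0.0, -2114.0));
        // Upside down and fast -> low, but still bounded, reward
        System.out.println(reward(Math.PI, 50.0, 50.0, 0.0));
    }
}
```

With a bounded reward, a single bad state can no longer swamp the Q-update, which usually makes the table converge much more smoothly.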