lab5 poor convergence after adding rocket location

Qervas commented 11 months ago

The previous strategy only considered angle and velocity, and the Q-values looked reasonable.

After adding the location penalty (for being close to the lowest border), the Q-values look much worse and training struggles to converge.

Qervas commented 11 months ago

Location Penalty

In QLearningController.java, in the tick() function:

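// Argument order must match the signatures in StateAndReward: (angle, vx, vy, y).
// Assumes vx and previous_vx are tracked in the controller like the other features.
String new_state = StateAndReward.getStateHover(angle.getValue(), vx.getValue(), vy.getValue(), y.getValue());
double previous_reward = StateAndReward.getRewardHover(previous_angle, previous_vx, previous_vy, previous_y);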

In StateAndReward.java:

  /* State discretization function for the full hover controller */
  public static String getStateHover(double angle, double vx, double vy, double y) {
      int discreteAngle = discretize(angle, 20, -Math.PI, Math.PI);
      int discreteVx = discretize(vx, 5, -1, 1);
      int discreteVy = discretize(vy, 5, -1, 1);
      int discreteY = discretize(y, 10, -2115.89, 1197.83); // Based on the provided minY and maxY

      return "HoverState_" + discreteAngle + "_" + discreteVx + "_" + discreteVy + "_" + discreteY;
  }

  /* Reward function for the full hover controller */
  public static double getRewardHover(double angle, double vx, double vy, double y) {
      double anglePenalty = (Math.abs(angle) > 2.8) ? 1 : Math.abs(angle) / Math.PI;
      double vyPenalty = Math.abs(vy);
      double vxPenalty = Math.abs(vx);

      double proximityToGroundReward = (y < -2000 && y > -2115) ? (1 - Math.abs(y + 2115) / 115) : 0; // Reward is higher when closer to -2115 (but not below it)

      // Heavy penalty if the rocket is upside down and near the ground (indicating a potential crash scenario)
      double flipOverPenalty = (angle > 2.8 || angle < -2.8) && y < -2000 ? -5 : 0;

      double reward = 1 + proximityToGroundReward - anglePenalty - vyPenalty - vxPenalty + flipOverPenalty;

      return reward;
  }

We need to improve the strategy.
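
One likely contributor to the slower convergence is simply the size of the state table. The sketch below multiplies the bin counts used in getStateHover (20, 5, 5, 10) to show how many distinct states the Q-table has to cover. The class name HoverStateSpace and the numActions parameter are hypothetical and not part of the lab code.

```java
final class HoverStateSpace {
    private HoverStateSpace() {}

    /** Number of distinct discrete states: the product of the per-feature bin counts. */
    static long countStates(int... binsPerFeature) {
        long total = 1;
        for (int bins : binsPerFeature) {
            total *= bins;
        }
        return total;
    }

    /** Q-table entries = states * actions. numActions is an assumed parameter. */
    static long countQEntries(long states, int numActions) {
        return states * numActions;
    }

    public static void main(String[] args) {
        // Bin counts copied from getStateHover: angle, vx, vy, y.
        long withY = countStates(20, 5, 5, 10);
        // Same discretization without the y feature, for comparison.
        long withoutY = countStates(20, 5, 5);
        System.out.println("states with y:    " + withY);
        System.out.println("states without y: " + withoutY);
    }
}
```

With these bin counts the y feature alone multiplies the number of states by 10, so each state is visited far less often for the same number of ticks. Using fewer y bins, or training for longer, may help.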

Qervas commented 11 months ago

Also, when the rocket is facing down, is there a way to reward it for turning back upright?
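
One possible approach, as a rough sketch only: add a small shaping bonus that is positive whenever the rocket rotates toward upright during a tick. This assumes angle 0 means upright (as the anglePenalty term above suggests) and needs both previous_angle and the current angle, which tick() already has. The class UprightShaping, the method uprightBonus, and the weight parameter are hypothetical names and are not part of the lab code.

```java
final class UprightShaping {
    private UprightShaping() {}

    /**
     * Bonus proportional to how much closer to upright (angle 0) the rocket
     * got in one tick. Positive when |angle| shrinks, negative when it grows.
     * Using |angle| keeps the value continuous across the +/-PI wrap-around.
     */
    static double uprightBonus(double previousAngle, double angle, double weight) {
        return weight * (Math.abs(previousAngle) - Math.abs(angle));
    }

    public static void main(String[] args) {
        // Rotating from 3.0 rad toward 2.5 rad is progress toward upright.
        System.out.println(uprightBonus(3.0, 2.5, 1.0));
        // Rotating from 0.5 rad away to 1.0 rad is penalized.
        System.out.println(uprightBonus(0.5, 1.0, 1.0));
    }
}
```

The bonus could be added to the value returned by getRewardHover. The weight would need tuning so that it does not dominate the other terms.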
