Custom simplified environment for agent training

pengmun commented 11 months ago

Here is some code to get started on creating a custom environment using poliastro:

```python
from astropy import units as u
from astropy.time import Time
from poliastro.bodies import Earth
from poliastro.twobody import Orbit

epoch_time = Time('2013-03-18 12:30:00.000')

# Evader: initial position [km] and velocity [km/s] in Earth-centered inertial coordinates
r_evader = [859.07256, -4137.2037, 5295.5687] << u.km
v_evader = [7.372892, 2.0822357, 0.43999979] << u.km / u.s
evader = Orbit.from_vectors(Earth, r_evader, v_evader, epoch=epoch_time)
evader_30m = evader.propagate(30 << u.min)  # evader state 30 minutes later

# Pursuer: starts 10 km from the evader along x, with the same velocity
r_pursuer = [869.07256, -4137.2037, 5295.5687] << u.km
v_pursuer = [7.372892, 2.0822357, 0.43999979] << u.km / u.s

# Action: instantaneous delta-v of 1 km/s along x, applied at the epoch
v_pursuer += [1, 0, 0] << u.km / u.s
pursuer = Orbit.from_vectors(Earth, r_pursuer, v_pursuer, epoch=epoch_time)
pursuer_30m = pursuer.propagate(30 << u.min)  # pursuer state 30 minutes later

# Positions of both spacecraft after 30 minutes
print(evader_30m.r)
print(pursuer_30m.r)
```
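The script relies on poliastro's `propagate` for the orbital motion. If a dependency-free cross-check is ever useful (for example, to sanity-check results or to run in a minimal container), the same point-mass two-body motion can be integrated numerically. This is a sketch under stated assumptions and a swapped-in technique, not poliastro's method: it uses fixed-step RK4 on the unperturbed two-body equations with Earth's standard gravitational parameter μ = 398600.4418 km³/s², and the helper name `propagate_rk4` is hypothetical.

```python
import math

MU_EARTH = 398600.4418  # km^3/s^2, Earth's standard gravitational parameter (assumed value)


def _accel(r):
    """Point-mass two-body acceleration [km/s^2] at position r [km]."""
    norm = math.sqrt(r[0] ** 2 + r[1] ** 2 + r[2] ** 2)
    k = -MU_EARTH / norm ** 3
    return [k * r[0], k * r[1], k * r[2]]


def propagate_rk4(r, v, tof_s, step_s=10.0):
    """Hypothetical helper: propagate (r [km], v [km/s]) for tof_s seconds with fixed-step RK4.

    No perturbations (J2, drag, third bodies) are modeled.
    """
    r, v = list(r), list(v)
    n_steps = max(1, math.ceil(tof_s / step_s))
    h = tof_s / n_steps  # step size that divides tof_s exactly
    for _ in range(n_steps):
        # State derivative: dr/dt = v, dv/dt = a(r)
        k1r, k1v = v, _accel(r)
        r2 = [r[i] + 0.5 * h * k1r[i] for i in range(3)]
        v2 = [v[i] + 0.5 * h * k1v[i] for i in range(3)]
        k2r, k2v = v2, _accel(r2)
        r3 = [r[i] + 0.5 * h * k2r[i] for i in range(3)]
        v3 = [v[i] + 0.5 * h * k2v[i] for i in range(3)]
        k3r, k3v = v3, _accel(r3)
        r4 = [r[i] + h * k3r[i] for i in range(3)]
        v4 = [v[i] + h * k3v[i] for i in range(3)]
        k4r, k4v = v4, _accel(r4)
        # Weighted RK4 update of position and velocity
        r = [r[i] + h / 6 * (k1r[i] + 2 * k2r[i] + 2 * k3r[i] + k4r[i]) for i in range(3)]
        v = [v[i] + h / 6 * (k1v[i] + 2 * k2v[i] + 2 * k3v[i] + k4v[i]) for i in range(3)]
    return r, v


# Same evader initial state as the script above, propagated for 30 minutes
r_evader = [859.07256, -4137.2037, 5295.5687]
v_evader = [7.372892, 2.0822357, 0.43999979]
r_30m, v_30m = propagate_rk4(r_evader, v_evader, 30 * 60)
print(r_30m)
```

For short propagations like this the result should be close to poliastro's `evader_30m.r`, but the two are not guaranteed to match to the last digit, since poliastro uses its own propagation algorithms.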

vrodriguezf commented 11 months ago

Thank you, Peng! Can anyone wrap it up as a KSPDG agent and evaluate it to get a baseline?

pengmun commented 11 months ago

I was thinking of using this code to create a custom environment for training. This simplified environment could be run headless, so we could do many more training runs.

vrodriguezf commented 11 months ago

What would be the benefit of that compared with using the code directly as the agent? I understand that this agent can work for whatever initial conditions it is fed, right?

pengmun commented 11 months ago

It is not an action policy, so we can't use the code directly as the agent.

The code here only propagates and simulates the positions of the pursuer and evader under different actions. Instead of having to interface the agent with KSP, we can interface it with this custom environment for faster scenario evaluation.
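One way to expose such a custom environment to an agent is a reset/step loop that never touches KSP. The sketch below is hypothetical throughout: the class `PursuitEvasionEnv`, its reset/step signatures (loosely modeled on Gym-style interfaces, not the kspdg or Gymnasium API), the relative-state observation, the negative-distance reward, and the `dt_s`/`max_steps` defaults are all assumptions. The propagator is injected as a callable; in practice it would wrap the poliastro calls above (build an `Orbit.from_vectors(...)`, call `.propagate(...)`, and return `.r` and `.v` in km and km/s).

```python
import math
from typing import Callable, List, Tuple

Vec = List[float]
# Hypothetical propagator signature: (r [km], v [km/s], dt [s]) -> (r, v) after dt seconds
Propagator = Callable[[Vec, Vec, float], Tuple[Vec, Vec]]


class PursuitEvasionEnv:
    """Hypothetical headless pursuit-evasion environment (sketch only).

    Action: impulsive delta-v [km/s] added to the pursuer's velocity, mirroring
    `v_pursuer += [1, 0, 0] << u.km / u.s` from the thread.
    Observation: evader position and velocity relative to the pursuer.
    Reward (assumed): negative pursuer-evader distance [km] after each step.
    """

    def __init__(self, propagate: Propagator, r_p0: Vec, v_p0: Vec,
                 r_e0: Vec, v_e0: Vec, dt_s: float = 60.0, max_steps: int = 30):
        self.propagate = propagate
        self._initial = (list(r_p0), list(v_p0), list(r_e0), list(v_e0))
        self.dt_s = dt_s
        self.max_steps = max_steps
        self.reset()

    def _observe(self) -> Vec:
        rel_r = [e - p for e, p in zip(self.r_e, self.r_p)]
        rel_v = [e - p for e, p in zip(self.v_e, self.v_p)]
        return rel_r + rel_v

    def reset(self) -> Vec:
        """Restore the initial states and return the first observation."""
        r_p, v_p, r_e, v_e = self._initial
        self.r_p, self.v_p = list(r_p), list(v_p)
        self.r_e, self.v_e = list(r_e), list(v_e)
        self.steps = 0
        return self._observe()

    def step(self, delta_v: Vec) -> Tuple[Vec, float, bool]:
        """Apply an impulsive burn to the pursuer, then propagate both craft by dt_s."""
        self.v_p = [v + dv for v, dv in zip(self.v_p, delta_v)]
        self.r_p, self.v_p = self.propagate(self.r_p, self.v_p, self.dt_s)
        self.r_e, self.v_e = self.propagate(self.r_e, self.v_e, self.dt_s)
        self.steps += 1
        reward = -math.dist(self.r_p, self.r_e)
        done = self.steps >= self.max_steps
        return self._observe(), reward, done


# Usage with a gravity-free stub propagator (for testing only, NOT orbital motion)
def drift(r, v, dt):
    return [ri + vi * dt for ri, vi in zip(r, v)], list(v)


env = PursuitEvasionEnv(drift, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0],
                        [10.0, 0.0, 0.0], [0.0, 0.0, 0.0], dt_s=1.0)
obs, reward, done = env.step([1.0, 0.0, 0.0])
print(obs, reward, done)  # [9.0, 0.0, 0.0, -1.0, 0.0, 0.0] -9.0 False
```

Injecting the propagator keeps the environment logic testable without poliastro and makes it easy to swap propagation backends later.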

vrodriguezf commented 11 months ago

Where is the action in that code, understood as throttle + time?

pengmun commented 11 months ago

The action is defined by this line: v_pursuer += [1, 0, 0] << u.km / u.s

Here, we add the delta-V directly to the pursuer's current velocity, assuming an instantaneous (impulsive) burn. For a constant thrust, delta-V = thrust × burn time / mass (neglecting the mass change), so this is akin to fixing the burn time at 1 s, which makes delta-V = thrust/mass numerically, and using the throttle (from -1 to 1) to control the thrust. Throttle maps to thrust as throttle*PARAMS.PURSUER.RCS.N_THRUSTERS_FORWARD*PARAMS.PURSUER.RCS.VACUUM_MAX_THRUST_PER_NOZZLE.
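The throttle-to-delta-V mapping described above can be written as a small helper. This is a sketch under assumptions: constant thrust over the burn, mass change neglected, thrust in newtons and mass in kilograms. The function and parameter names are hypothetical; real values would come from PARAMS.PURSUER.RCS.N_THRUSTERS_FORWARD, PARAMS.PURSUER.RCS.VACUUM_MAX_THRUST_PER_NOZZLE, and the pursuer's mass, and the numbers in the usage line are illustrative only.

```python
def throttle_to_delta_v(throttle: float, n_thrusters: int,
                        max_thrust_per_nozzle_n: float, mass_kg: float,
                        burn_s: float = 1.0) -> float:
    """Delta-v [m/s] from a constant-throttle burn (hypothetical helper).

    thrust  = throttle * n_thrusters * max_thrust_per_nozzle   [N]
    delta_v = thrust * burn_s / mass                           [m/s]
    The mass change during the burn is neglected.
    """
    if not -1.0 <= throttle <= 1.0:
        raise ValueError("throttle must be in [-1, 1]")
    thrust_n = throttle * n_thrusters * max_thrust_per_nozzle_n
    return thrust_n * burn_s / mass_kg


# Illustrative numbers only (not kspdg's actual parameters)
dv_m_s = throttle_to_delta_v(0.5, n_thrusters=4, max_thrust_per_nozzle_n=1000.0, mass_kg=2000.0)
dv_km_s = dv_m_s / 1000.0  # the poliastro script works in km/s
print(dv_m_s, dv_km_s)  # 1.0 0.001
```

To apply it as a vector action, scale a unit burn direction by this magnitude before adding it to `v_pursuer`. For scale, the `[1, 0, 0] << u.km / u.s` impulse in the script corresponds to 1000 m/s.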

ARCLab-MIT / kspdg #7