Tsinghua-Space-Robot-Learning-Group / SpaceRobotEnv

A gym environment designed for free-floating space robot control based on the MuJoCo platform.
Apache License 2.0
71 stars 9 forks

The trained policy does not make the space robot move #10

Open Knight-xiao opened 4 hours ago

Knight-xiao commented 4 hours ago

Hello, I used the PPO implementation in your repository to train the space robot, but I have run into a problem. I trained with PPO/Discrete/PPO/main.py, using the spacerobotstate XML file, and the training output looks just like the train_log.txt in your repository. However, when I use the trained policy to control the space robot, it just stays still and does not move, as in this screenshot:

(Screenshot from 2024-11-05 20-16-27)

My evaluation script, eva.py, is:

import gym
import torch as T
import numpy as np
from agent import Agent
import SpaceRobotEnv
if __name__ == '__main__':
    env = gym.make("SatelliteEnv-v0")
    n_eval_episodes = 20
    action_space = env.action_space.shape[0]
    obs_shape = env.observation_space['observation'].shape

    agent = Agent(n_actions=action_space,
                  batch_size=16,
                  alpha=0.0003,
                  n_epoch=3,
                  input_dims=obs_shape,
                  model_name_actor="space_robot_actor.pt",
                  model_name_critic="space_robot_critic.pt")

    agent.load_model()  
    score_history = []

    for episode in range(n_eval_episodes):
        obs = env.reset()
        observation = obs["observation"]
        done = False
        score = 0

        while not done:
            env.render()
            action, _, _ = agent.choose_action(observation)
            a = action.reshape(14,)
            a = a.clip(env.action_space.low, env.action_space.high)

            observation_, reward, done, info = env.step(a)
            score += reward
            observation = observation_["observation"]

        score_history.append(score)
        print(f"Episode {episode + 1} Score: {score:.2f}")

    avg_score = np.mean(score_history)
    print(f"\nAverage Score over {n_eval_episodes} episodes: {avg_score:.2f}")
    env.close()

Can you help me solve this problem? Thank you very much!

Knight-xiao commented 4 hours ago

Besides, I have another question. In SpaceRobotEnv.py, the condition under which the variable done should become True is never defined.
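
To make the second question concrete: if done never becomes True, the `while not done` loop in eva.py has no exit condition of its own. The sketch below shows one possible way to bound an evaluation episode with a step cap. `NeverDoneEnv`, `run_episode`, and `max_steps` are hypothetical names used only for illustration; the stub merely imitates the 4-tuple `step()` return used in eva.py and is not the actual SpaceRobotEnv code.

```python
class NeverDoneEnv:
    """Hypothetical stand-in for an env whose step() never sets done=True."""

    def reset(self):
        self.t = 0
        return {"observation": [0.0]}

    def step(self, action):
        self.t += 1
        obs = {"observation": [float(self.t)]}
        reward = -1.0
        done = False  # assumption: the env never signals termination itself
        return obs, reward, done, {}


def run_episode(env, policy, max_steps=200):
    """Roll out one episode, stopping at done or after max_steps steps."""
    obs = env.reset()
    score, steps, done = 0.0, 0, False
    while not done and steps < max_steps:
        action = policy(obs["observation"])
        obs, reward, done, info = env.step(action)
        score += reward
        steps += 1
    return score, steps


if __name__ == "__main__":
    # A constant dummy policy is enough to show that the cap ends the episode.
    score, steps = run_episode(NeverDoneEnv(), policy=lambda o: 0.0, max_steps=50)
    print(f"steps={steps} score={score:.1f}")
```

In eva.py the same idea would amount to adding a step counter to the loop condition. gym also provides a `TimeLimit` wrapper that serves the same purpose, though whether "SatelliteEnv-v0" is already registered with an episode limit is not something this sketch assumes.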