Suggestions for general architecture improvements:

- Create a separate class that handles the environment-related functions (reward calculation, robot state transitions, etc.). The training loop could then look like this:
```python
def train_ppo_agent(agent, env, episodes=1000, steps_per_episode=200):
    for episode in range(episodes):
        state = env.reset()
        episode_reward = 0

        for step in range(steps_per_episode):
            action = agent.get_action(state)
            next_state, reward, done, _ = env.step(action)

            # Store experience (state, action, reward, next_state, done)
            # ...

            episode_reward += reward
            state = next_state

            if done:
                break
```
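A rough sketch of what such an environment class could look like, assuming the Gym-style `reset`/`step` interface used in the loop above (class and method names are hypothetical placeholders, not existing code):

```python
class LineFollowerEnvironment:
    """Owns reward calculation and robot state transitions (hypothetical sketch)."""

    def __init__(self, robot_interface):
        # robot_interface is whatever component actually talks to the robot;
        # the environment only consumes observations and issues commands.
        self._robot = robot_interface

    def reset(self):
        """Put the robot back into its start state and return the first observation."""
        self._robot.request_reset()
        return self._robot.read_observation()

    def step(self, action):
        """Apply an action, then compute the reward and the done flag."""
        self._robot.apply_action(action)
        next_state = self._robot.read_observation()
        reward = self._compute_reward(next_state)
        done = self._is_episode_over(next_state)
        return next_state, reward, done, {}

    def _compute_reward(self, state):
        # Reward shaping lives here, not in the agent or the sensor callbacks.
        raise NotImplementedError

    def _is_episode_over(self, state):
        # The reset/termination condition also lives here.
        raise NotImplementedError
```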
- Remove the dependency from the Agent to the SerialMuxProt, for example as sketched below.
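One possible way to break that dependency is to inject a narrow interface into the Agent instead of the SerialMuxProt object itself; the `ActionSink` protocol and its method below are hypothetical placeholders, the concrete SerialMuxProt wrapper would live outside the agent:

```python
from typing import Protocol

import numpy as np


class ActionSink(Protocol):
    """What the agent needs from the outside world, and nothing more (hypothetical)."""

    def send_action(self, action: np.ndarray) -> None:
        ...


class Agent:
    def __init__(self, policy_network, action_sink: ActionSink):
        # The agent only knows the small ActionSink interface; the concrete
        # implementation wrapping SerialMuxProt is injected from outside.
        self._policy = policy_network
        self._sink = action_sink

    def act(self, state):
        action = self._policy(state)
        self._sink.send_action(action)
        return action
```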
- Separate the raw data collection from the environment reset trigger: `callback_line_sensors` should not evaluate the environment reset condition. This will become hard to maintain once the input data is assembled from more than just the line sensors. Consider separating raw sensor data acquisition from processing, e.g. as in the sketch below.
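A possible split, where the callback only stores raw readings and a separate processing layer builds the observation and evaluates termination (all names below are hypothetical):

```python
class SensorDataBuffer:
    """Collects raw sensor readings only; no interpretation happens here."""

    def __init__(self):
        self.line_sensors = None

    def callback_line_sensors(self, payload):
        # Raw acquisition: just store the latest reading.
        self.line_sensors = payload


class ObservationProcessor:
    """Turns raw readings into an observation and evaluates termination separately."""

    def __init__(self, buffer):
        self._buffer = buffer

    def build_observation(self):
        # Later this can merge line sensors with other inputs (odometry, speed, ...).
        return self._buffer.line_sensors

    def reset_condition_met(self):
        # The reset check is evaluated here, not inside the callback.
        # Placeholder condition: all sensors report zero (off the line).
        readings = self._buffer.line_sensors
        return readings is not None and max(readings) == 0
```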
Possible performance improvements:
- Build a `tf.data` input pipeline that shuffles, batches, and prefetches the training data:

```python
def prepare_dataset(dataset, batch_size, buffer_size):
    dataset = dataset.map(preprocess_data)
    dataset = dataset.shuffle(buffer_size)
    dataset = dataset.batch(batch_size)
    dataset = dataset.prefetch(1)
    return dataset
```
- Convert the rollout buffer into a `tf.data.Dataset` once per update instead of feeding Python lists directly:

```python
def create_dataset_from_buffer(buffer):
    states = tf.convert_to_tensor(buffer.states, dtype=tf.float32)
    actions = tf.convert_to_tensor(buffer.actions, dtype=tf.float32)
    rewards = tf.convert_to_tensor(buffer.rewards, dtype=tf.float32)
    next_states = tf.convert_to_tensor(buffer.next_states, dtype=tf.float32)
    dones = tf.convert_to_tensor(buffer.dones, dtype=tf.float32)
    advantages = tf.convert_to_tensor(buffer.advantages, dtype=tf.float32)

    dataset = create_dataset(states, actions, rewards, next_states, dones, advantages)
    return dataset
```
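The snippet above relies on a `create_dataset` helper that is not shown here; a minimal version based on `tf.data.Dataset.from_tensor_slices`, plus how the two functions could be chained during the update, might look like this (sketch, not the existing implementation):

```python
import tensorflow as tf


def create_dataset(states, actions, rewards, next_states, dones, advantages):
    # Pack the tensors into one element-wise tf.data.Dataset.
    return tf.data.Dataset.from_tensor_slices(
        (states, actions, rewards, next_states, dones, advantages)
    )


# Usage sketch inside the PPO update (helper names assumed from above):
# dataset = create_dataset_from_buffer(buffer)
# dataset = prepare_dataset(dataset, batch_size=64, buffer_size=buffer.size)
# for batch in dataset:
#     agent.learn(*batch)
```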
- Apply the `@tf.function` decorator to functions such as `predict_action` and `learn` (see the sketch below).
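For illustration, a minimal sketch of how the decorators could be applied; the network, optimizer, loss, and `state_dim=5` are placeholder assumptions, not the existing agent:

```python
import tensorflow as tf


class PPOAgent:
    """Sketch only: a tiny policy net to demonstrate the @tf.function usage."""

    def __init__(self, state_dim=5, action_dim=2):
        self.policy_network = tf.keras.Sequential([
            tf.keras.Input(shape=(state_dim,)),
            tf.keras.layers.Dense(32, activation="relu"),
            tf.keras.layers.Dense(action_dim, activation="tanh"),
        ])
        self.optimizer = tf.keras.optimizers.Adam(3e-4)

    @tf.function
    def predict_action(self, states):
        # Runs as a traced graph instead of eagerly, reducing per-call overhead.
        return self.policy_network(states)

    @tf.function
    def learn(self, states, actions, advantages):
        # The whole gradient step is compiled into one graph.
        with tf.GradientTape() as tape:
            predicted = self.policy_network(states)
            # Placeholder loss; the real PPO clipped surrogate objective goes here.
            loss = tf.reduce_mean(
                tf.square(predicted - actions) * tf.expand_dims(advantages, axis=-1))
        grads = tape.gradient(loss, self.policy_network.trainable_variables)
        self.optimizer.apply_gradients(
            zip(grads, self.policy_network.trainable_variables))
        return loss
```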
In summary: optimization of the architecture, with special focus on the separation of the agent and the environment.