Open Fornerio opened 2 months ago
Hello, are you using images as data input? May I ask how much data you are using?
Hi, I am using the Kinematics observation as state, but since my results are bad I would like to try using images. Currently, my offline RL dataset consists of 15500 episodes (190 MB). Despite all the effort, my DT agent reaches a lower mean return than a random agent in a 50-episode evaluation, and the eval loss is still increasing after 500 epochs (almost 24 h of training).
Thanks for your reply. I also think the image data works better (I ran mlp_decision_transformer-expert-mcts.py), but I encountered a problem when saving the model: there is no save_models folder, so the checkpoint can't be saved. How did you deal with the missing checkpoint folder at the start of training?
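If the script writes checkpoints into a hard-coded save_models folder (the folder name here is taken from the error you describe, and checkpoint.pt is just an illustrative filename, not from the repo), creating the directory before training starts should avoid the failure:

```python
import os

# Create the checkpoint folder up front so the first save does not fail.
# 'save_models' is assumed from the error described above; point it at
# whatever path pipeline.train_dt actually writes to.
checkpoint_dir = 'save_models'
os.makedirs(checkpoint_dir, exist_ok=True)  # no-op if the folder already exists
checkpoint_path = os.path.join(checkpoint_dir, 'checkpoint.pt')  # illustrative name
```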
Hello, may I ask which experimental script you are using? From your description, you seem to be using the PPO algorithm as the expert data provider. I use the MCTS method to provide expert data, and the amount of data is small. So I wonder how much data the authors used to train the Transformer; as far as I know, Transformers need a huge dataset, and I don't know whether the poor training results are related to the small dataset.
Hello, I actually created my dataset with a DQN agent that I trained, so collisions may happen. I am aware of the amount of data that transformers need; indeed, I am curious how much data they gathered to reach those results. But I have two questions:
By the way, I have no problem saving the last checkpoint. Have you checked the code in pipeline.train_dt? Are you able to save the intermediate checkpoints?
Hi, I think both the number of expert training steps and the amount of saved data should be increased for better results; the PPO algorithm may also give better results with image input.
The code only saves 5 episodes’ data, which is not nearly enough.
```python
import numpy as np

all_data = []
for episode in range(5):  # only 5 episodes are collected
    obs, info = env.reset()
    done = truncated = False
    total_reward = 0
    episode_data = []
    while not (done or truncated):
        action, _ = model.predict(obs)
        obs, reward, done, truncated, info = env.step(action)
        total_reward += reward
        episode_data.append([obs, action, reward])
        env.render()
    all_data.append(episode_data)
    print('Episode:', episode, ', Crashed?:', info['crashed'],
          ', Total Reward:', total_reward)

np.save('ppo_transformer_pc.npy', np.array(all_data, dtype=object), allow_pickle=True)
env.close()
```
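To make the point about dataset size concrete, here is a minimal self-contained sketch of the same collection loop with the episode count raised. `StubEnv` and the fixed action are stand-ins for the real highway-env environment and trained model (so the sketch runs on its own); the episode count is illustrative:

```python
import numpy as np

NUM_EPISODES = 200  # illustrative; far more than the 5 in the original loop

class StubEnv:
    """Stand-in for the real environment: 10 steps per episode, reward 1 each."""
    def reset(self):
        self.t = 0
        return np.zeros(5), {}
    def step(self, action):
        self.t += 1
        done = self.t >= 10
        return np.zeros(5), 1.0, done, False, {'crashed': False}

env = StubEnv()
all_data = []
for episode in range(NUM_EPISODES):
    obs, info = env.reset()
    done = truncated = False
    episode_data = []
    while not (done or truncated):
        action = 0  # model.predict(obs) in the real script
        obs, reward, done, truncated, info = env.step(action)
        episode_data.append([obs, action, reward])
    all_data.append(episode_data)

print(len(all_data), sum(len(ep) for ep in all_data))  # → 200 2000
```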
Hi, I would like to ask: how do you determine the number of training rounds? Do you use the parameters directly from the source code?
Hello, I also experimented with kinematic data, but the results were equally unsatisfactory: the first time I achieved a return of 41.3 in only 1000 rounds, but the second time the return was only 28.1. Also, what is the purpose of 'env_targets': [0.8, 1.0, 1.3]? Why should return_to_go be set to a constant value when testing?
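For context on env_targets: a Decision Transformer is conditioned on the return-to-go, so at test time you pick a desired return (one of the constant targets) and decrement it by each observed reward. A minimal sketch, assuming undiscounted returns; the function name is mine, not from the repo:

```python
import numpy as np

def returns_to_go(rewards):
    """Undiscounted return-to-go: rtg[t] = rewards[t] + rewards[t+1] + ..."""
    rtg = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        rtg[t] = running
    return rtg

# Training: each timestep is conditioned on the trajectory's true return-to-go.
print(returns_to_go([1.0, 1.0, 0.5]))  # → [2.5 1.5 0.5]

# Evaluation: start from a fixed target (e.g. one value from env_targets)
# and subtract each reward as it is observed.
target = 1.3
for reward in [0.4, 0.4]:
    target -= reward
print(round(target, 2))  # → 0.5
```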
Hello, I have had some improvement using a larger kinematic observation. Now the loss is decreasing instead of increasing, but at inference the ego vehicle just goes straight. I noticed that it often tries to steer left even when it is already in the leftmost lane.
I also found this paper that gives some more technical insights about DT parameters: https://openreview.net/forum?id=vpV7fOFQy4
Regarding training, I have launched a 100k-step run, with 5000 steps per iteration. I am still not sure about the env targets; I would like the authors to answer about that!
Hello, thank you for your reply. May I ask how you changed the 5×5 observation? I've been using grayscale image inputs lately, but I found that the trained agents generally go straight and don't avoid obstacles; I'm guessing that maybe it's because your expert data can't avoid obstacles.
Hello,
I was checking the train_dt function since my results are not good.
What's the meaning of the normalization from 0 to 20 with a step of 5?
Thank you