caixunshiren / Highway-Decision-Transformer

Decision Transformer for offline single-agent autonomous highway driving
Apache License 2.0

Normalization in train_dt #3

Open Fornerio opened 2 months ago

Fornerio commented 2 months ago

Hello,

I was checking the train_dt function since my results are not good.

```python
input_type = config.get('input_type', 'coord')
states = np.concatenate(states, axis=0)
if input_type == 'grayscale':
    # no normalization needed for cnn
    state_mean, state_std = np.array(50), np.array(100)
else:
    state_mean, state_std = np.mean(states, axis=0), np.std(states, axis=0) + 1e-6
    state_mean[[0, 5, 10, 15, 20]] = 0
    state_std[[0, 5, 10, 15, 20]] = 1
num_timesteps = sum(traj_lens)
```

What is the meaning of the normalization at indices 0 to 20 with a step of 5?
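
For context, my reading of what those lines do mechanically (a minimal sketch, not the repo's code): assuming the states are later standardized as (state - state_mean) / state_std, as in the original Decision Transformer code, setting the mean to 0 and the std to 1 at those indices leaves those entries unchanged, i.e. they are excluded from normalization:

```python
import numpy as np

# Minimal sketch (not the repo's code): standardize states as (s - mean) / std,
# but keep mean=0 and std=1 at indices [0, 5, 10, 15, 20] so those columns
# pass through unchanged.
rng = np.random.default_rng(0)
states = rng.random((100, 25))        # e.g. flattened 5x5 kinematic observations
state_mean = states.mean(axis=0)
state_std = states.std(axis=0) + 1e-6
state_mean[[0, 5, 10, 15, 20]] = 0
state_std[[0, 5, 10, 15, 20]] = 1

normalized = (states - state_mean) / state_std
# these columns are identical to the raw states, i.e. effectively unnormalized
assert np.allclose(normalized[:, [0, 5, 10, 15, 20]], states[:, [0, 5, 10, 15, 20]])
```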

Thank you

Peterzhoujr commented 2 months ago

Hello, are you using images as data input? May I ask how much data you are using?

Fornerio commented 2 months ago

Hi, I am using the Kinematics observation as the state, but since my results are bad I would like to try using images. Currently, my offline RL dataset consists of 15500 episodes (190 MB). Despite all the effort, my DT agent reaches a lower mean return than a random agent in a 50-episode evaluation, and the eval loss starts increasing after 500 epochs (almost 24h of training).

Peterzhoujr commented 2 months ago

Thanks for your reply. I also think the image data works better (I run mlp_decision_transformer-expert-mcts.py), but I encountered a problem when saving the model: there is no save_models folder, so the checkpoint can't be saved. How did you deal with the missing checkpoint directory at the start of training?
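
For now I am considering simply creating the folder before training starts, along these lines (a sketch; the directory name and save call are my assumptions, not the repo's exact code):

```python
import os

# Sketch of a workaround: create the checkpoint directory before the training
# loop tries to save into it.
save_dir = "save_models"                # assumed folder name from the error
os.makedirs(save_dir, exist_ok=True)    # no-op if the folder already exists

# then, inside the training loop (placeholder names):
# torch.save(model.state_dict(), os.path.join(save_dir, f"dt_iter_{iteration}.pt"))
```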

Peterzhoujr commented 2 months ago

Hello, may I ask which experimental script you are using? From your description, you should be using the PPO algorithm as the expert data provider; I use the MCTS method to provide expert data, and the amount of that data is small. So I wonder how much data the authors used to train the Transformer. As far as I know, a Transformer needs a huge dataset, and I don't know whether the poor training results are related to the small dataset.

Fornerio commented 2 months ago

Hello, I actually created my dataset with a DQN agent that I trained, so collisions may happen. I am aware of the amount of data that Transformers need, and indeed I am curious how much data they gathered to reach those results. But I have two questions:

  1. Which machine did they use to train on such a large amount of data? It took me 22h to train on 15500 episodes, with bad results.
  2. How many days of training were necessary?

Fornerio commented 2 months ago

By the way, I have no problem saving the last checkpoint. Have you checked the code in pipeline.train_dt? Are you able to save the intermediate checkpoints?

Peterzhoujr commented 2 months ago

Hi, I think the number of expert training steps as well as the amount of saved data should be increased for better results; the PPO algorithm may also give better results with image input.

Peterzhoujr commented 2 months ago

The code only saves data from 5 episodes, which is nowhere near enough:

```python
import numpy as np

# model, env and all_data are defined earlier in the script
for episode in range(5):
    obs, info = env.reset()
    done = truncated = False
    total_reward = 0
    episode_data = []

    while not (done or truncated):
        action, _ = model.predict(obs)
        obs, reward, done, truncated, info = env.step(action)
        total_reward += reward
        episode_data.append([obs, action, reward])
        env.render()

    all_data.append(episode_data)
    print('Episode: ', episode, ', Crashed?: ',
          info['crashed'], 'Total Reward:', total_reward)

    np.save('ppo_transformer_pc.npy', np.array(all_data, dtype=object), allow_pickle=True)

env.close()
```
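
For reference, this is roughly how I read the saved file back (assuming the [obs, action, reward] per-step layout above; not the repo's loader):

```python
import numpy as np

# Sketch: load the saved episodes and compute per-episode lengths and returns.
all_data = np.load('ppo_transformer_pc.npy', allow_pickle=True)

for i, episode in enumerate(all_data):
    rewards = [step[2] for step in episode]
    print(f"episode {i}: length={len(episode)}, return={sum(rewards):.2f}")
```
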
Peterzhoujr commented 1 month ago

Hi, I would like to ask you a question: how do you determine the number of rounds you use in training? Do you use the parameters directly from the source code?

Peterzhoujr commented 1 month ago

Hello, I also experimented with kinematic data but the results were equally unsatisfactory: the first time I achieved a return of 41.3 in only 1000 rounds, but the second time the return was only 28.1. I would also like to ask what the purpose of 'env_targets': [0.8, 1.0, 1.3] is, and why return_to_go should be set to a constant value when testing.
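
If I understand correctly (a sketch of my current reading, with placeholder names rather than the repo's exact code), env_targets are the returns the model is conditioned on at test time: the Decision Transformer takes a desired return-to-go as input, so evaluation starts from a fixed target and decreases it by each observed reward. Is that right?

```python
# Sketch of return-to-go conditioning at evaluation time (placeholder names):
# the model is asked to "achieve" target_return, and the conditioning value is
# reduced by every reward actually received.
def evaluate_episode(env, model, target_return, state_mean, state_std):
    state, _ = env.reset()
    return_to_go = target_return
    done = truncated = False
    total_reward = 0.0
    while not (done or truncated):
        # model.get_action is assumed to take the (normalized) state history and
        # the remaining return-to-go, as in the Decision Transformer setup
        action = model.get_action((state - state_mean) / state_std, return_to_go)
        state, reward, done, truncated, _ = env.step(action)
        total_reward += reward
        return_to_go -= reward   # condition the next step on what is still "owed"
    return total_reward
```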

Fornerio commented 1 month ago

Hello, I have had some improvement using a larger kinematic observation. Now the loss is decreasing instead of increasing, but at inference time the ego vehicle just goes straight. I noticed that it often tries to go left even when it is already in the leftmost lane.

I also found this paper that gives some more technical insights about DT parameters: https://openreview.net/forum?id=vpV7fOFQy4

Regarding training, I have launched a 100k-step run, with 5000 steps per iteration. I am still not sure about the env targets; I would like the authors to answer that!
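
Concretely, the knobs I mean are these (names follow the original Decision Transformer experiment script; they may differ in this repo):

```python
# Assumed training-loop settings, in the style of the original Decision Transformer
# experiment script: max_iters * num_steps_per_iter gradient steps in total.
config = {
    'max_iters': 20,              # 20 * 5000 = 100k gradient steps
    'num_steps_per_iter': 5000,   # evaluation runs after each iteration
}
```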

Peterzhoujr commented 1 month ago

Hello, thank you for your reply. I would like to ask how you changed the 5x5 observation. I have been using grayscale image inputs lately, but I have found that the trained agents generally go straight and do not avoid obstacles; I am guessing that maybe it is because the expert data cannot avoid obstacles.
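
For comparison, this is roughly the grayscale observation setup I am using in highway-env (the shape, stack size and RGB-to-gray weights here are illustrative; the repo may configure it differently):

```python
import gymnasium as gym
import highway_env  # noqa: F401  (registers the highway environments)

# Assumed grayscale observation configuration for highway-env.
env = gym.make("highway-v0")
env.unwrapped.configure({
    "observation": {
        "type": "GrayscaleObservation",
        "observation_shape": (128, 64),
        "stack_size": 4,
        "weights": [0.2989, 0.5870, 0.1140],  # standard luminance weights
    }
})
obs, info = env.reset()
print(obs.shape)  # e.g. (4, 128, 64)
```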