RGring / drl_local_planner_ros_stable_baselines

BSD 3-Clause "New" or "Revised" License

Issue loading pre-trained models #9

Closed danieldugas closed 3 years ago

danieldugas commented 3 years ago

Hi,

I've been trying to load the PPO2 models

PPO2.load("example_agents/ppo2_1_raw_data_cont_0/ppo2_1_raw_data_cont_0.pkl")

And get the following error: ValueError: Cannot feed value of shape (1344, 256) for Tensor 'Placeholder_4:0', which has shape '(1600, 256)'

I did some digging. On the one hand, the weights saved in your pickle file are for an fc1 layer of size (1344, 256). On the other hand, the conv1d operations defined in your custom stable-baselines fork lead to an fc1 layer of size (1600, 256).

In order for the weights to be the correct size (1344, 256), the output of the second conv1d should be of size (?, 21, 64), but instead we obtain (?, 25, 64).
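To make the shape arithmetic concrete, here is a rough sketch of which input scan lengths would produce each fc1 size. It assumes the two conv1d layers use VALID (no) padding, which is not stated in this thread, and the helper names (`conv1d_out_len`, `flattened_size`, `scan_lens_for`) are made up for illustration. Only the filter sizes, strides, and the 64 output filters come from the snippet below.

```python
def conv1d_out_len(n, filter_size, stride):
    """Output length of a 1D convolution, ASSUMING VALID (no) padding."""
    return (n - filter_size) // stride + 1


def flattened_size(scan_len, n_filters_last=64):
    """Input width of fc1 after the two conv1d layers and conv_to_fc."""
    len_1 = conv1d_out_len(scan_len, filter_size=5, stride=2)  # c1d_1
    len_2 = conv1d_out_len(len_1, filter_size=3, stride=2)     # c1d_2
    return len_2 * n_filters_last


def scan_lens_for(fc_in, search=range(9, 400)):
    """All scan lengths in `search` whose flattened conv output equals fc_in."""
    return [n for n in search if flattened_size(n) == fc_in]


# 1344 = 21 * 64 (shape stored in the pickle), 1600 = 25 * 64 (shape built at load time)
print(scan_lens_for(1344))  # [89, 90, 91, 92]
print(scan_lens_for(1600))  # [105, 106, 107, 108]
```

Under that padding assumption, the mismatch would correspond to the network being built with a longer scan input than the one used when the weights were saved.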

In common_custom_policies.py

def laser_cnn_multi_input(state, **kwargs):
    """
    1D Conv Network

    :param state: (TensorFlow Tensor) state input placeholder
    :param kwargs: (dict) Extra keywords parameters for the convolutional layers of the CNN
    :return: (TensorFlow Tensor) The CNN output layer
    """
    # scan = tf.squeeze(state[:, : , 0:kwargs['laser_scan_len'] , :], axis=1)
    scan = tf.squeeze(state[:, : , 0:kwargs['laser_scan_len'] , :], axis=1)
    wps = tf.squeeze(state[:, :, kwargs['laser_scan_len']:, -1], axis=1)
    # goal = tf.math.multiply(goal, 6)

    kwargs_conv = {}
    activ = tf.nn.relu
    layer_1 = activ(conv1d(scan, 'c1d_1', n_filters=32, filter_size=5, stride=2, init_scale=np.sqrt(2), **kwargs_conv))
    layer_2 = activ(conv1d(layer_1, 'c1d_2', n_filters=64, filter_size=3, stride=2, init_scale=np.sqrt(2), **kwargs_conv))
    layer_2f = conv_to_fc(layer_2)

where conv1d is defined here

I've been making sure to use tensorflow 1.13.1.

Could it be that you used a different version of the conv1d code during training?

RGring commented 3 years ago

Hi. I will have a look... Just wondering, does the same problem also occur with the discrete agent?

danieldugas commented 3 years ago

I'm seeing the same problem with the discrete agent, yes.

RGring commented 3 years ago

Hi. I can reproduce your error. It happens for me when I do not launch the simulation (step 4 in "Run pretrained Agents" --> "Run agent trained on raw data, discrete action space, stack size 1"). Can you check whether you have started the simulation? Unfortunately, it is not possible to put everything in one launch file, because the DRL component runs with Python 3 and ROS runs with Python 2.

I don't yet see the connection between the error message and the missing simulation, but launching the simulation solved it for me :).

Looking forward to your feedback

RGring commented 3 years ago

I also added some examples for running the pre-trained agents in Docker, in case you want to try that.

danieldugas commented 3 years ago

After looking inside the docker build, I was able to pinpoint the right value for the rosparam, and to get the pretrained models to load with the non-forked stable-baselines, and without ROS.

see here

If you're interested I can make a PR. In any case, thanks for your help!

RGring commented 3 years ago

Hi Daniel, Thanks for your feedback and the improvements. You are very welcome to make a PR!