google-deepmind / open_x_embodiment


Pretrained RT-1-X does not seem to perform well on fractal data. #35

sebbyjp opened this issue 5 months ago

sebbyjp commented 5 months ago

I followed the padding procedure in Minimal_example_for_running_inference_using_RT_1_X_TF_using_tensorflow_datasets.ipynb and am using the same sentence encoder "https://tfhub.dev/google/universal-sentence-encoder-large/5".
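For reference, the embedding step I use looks roughly like the sketch below (not my exact code; the instruction string and the reshape to the batch size are placeholders):

```python
# Rough sketch of my embed_text helper (assumed details, not the exact code).
import tensorflow as tf
import tensorflow_hub as hub

# Same encoder as in the colab; it returns 512-d sentence embeddings.
use_encoder = hub.load("https://tfhub.dev/google/universal-sentence-encoder-large/5")

def embed_text(instructions, batch_size):
    # instructions: list of strings, e.g. ["pick rxbar chocolate"] (placeholder text)
    embeddings = tf.cast(use_encoder(instructions), tf.float32)  # (len(instructions), 512)
    return tf.reshape(embeddings, (batch_size, 512))
```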

However, after summing the world vectors and rotation deltas for both the expert and the pretrained model from gs://gdm-robotics-open-x-embodiment/open_x_embodiment_and_rt_x_oss/rt_1_x_tf_trained_for_0022724, it is clear that this pretrained model sometimes overshoots the workspace by up to two meters. The "rt1main" weights from Google Research also produce similar results (top row is the ground truth from the fractal dataset):

[Screenshot (2024-01-06): plots of the summed world vectors and rotation deltas; top row is the fractal ground truth.]
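For context, the plot is produced by cumulatively summing the per-step deltas, along the lines of the sketch below (assuming the usual RT-1 action keys `world_vector` and `rotation_delta`; `expert_actions` and `policy_actions` stand in for the per-step action dicts from the dataset and from the policy):

```python
import numpy as np

def cumulative_trajectory(actions):
    """Sum per-step deltas into an absolute trajectory for plotting."""
    world = np.cumsum([np.asarray(a["world_vector"]) for a in actions], axis=0)
    rot = np.cumsum([np.asarray(a["rotation_delta"]) for a in actions], axis=0)
    return world, rot

expert_xyz, expert_rpy = cumulative_trajectory(expert_actions)   # fractal ground truth
policy_xyz, policy_rpy = cumulative_trajectory(policy_actions)   # pretrained RT-1-X output
```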

I believe I am using tf_agents as in the colab demo above. What am I doing wrong? I am doing something like:

# Relevant tf_agents imports:
from tf_agents import specs
from tf_agents.policies.py_tf_eager_policy import SavedModelPyTFEagerPolicy
from tf_agents.trajectories import time_step as ts

# Load the saved RT-1-X policy.
policy: LoadedPolicy = SavedModelPyTFEagerPolicy(
    model_path=checkpoint_path,
    load_specs_from_pbtxt=load_specs_from_pbtxt,
    use_tf_function=use_tf_function,
    batch_time_steps=batch_time_steps,
)

# Build a zeroed observation matching the policy's observation spec,
# then fill in the image and the language embedding.
observation = specs.zero_spec_nest(
    specs.from_spec(policy.time_step_spec.observation), outer_dims=(batch_size,)
)
observation["image"] = format_images(imgs)
observation["natural_language_embedding"] = embed_text(instructions, batch_size)

# Wrap the observation in a TimeStep of the appropriate type.
if step == 0:
    time_step = ts.restart(observation, batch_size)
elif terminate:
    time_step = ts.termination(observation, reward)
else:
    time_step = ts.transition(observation, reward)

# Query the policy; next_state must be passed back in on the following call.
action, next_state, info = policy.action(time_step, policy_state)

for each inference call, passing back the returned policy state. (You can see the exact code I am running in this method.)
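To be concrete about how the state is threaded through, the loop is shaped roughly like this (a sketch, not my exact code; `make_time_step` is a placeholder for the observation/TimeStep construction above, and `num_steps`/`batch_size` are my own variables):

```python
# Sketch of the rollout loop; policy.action returns a PolicyStep(action, state, info).
policy_state = policy.get_initial_state(batch_size=batch_size)

for step in range(num_steps):
    time_step = make_time_step(step)        # builds the TimeStep as shown above
    policy_step = policy.action(time_step, policy_state)
    action = policy_step.action             # dict with world_vector, rotation_delta, ...
    policy_state = policy_step.state        # returned state goes back in on the next call
```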

Am I missing something?