Inconsistent Alignment: Discrepancy Between Language Description and Observations

I ran the visualization examples from the code lab with the utaustin_mutex dataset. However, the GIF I got does not match the language description. For example, I used the following code to extract the observations and the corresponding language instruction from the first episode of utaustin_mutex:

import tensorflow_datasets as tfds
from PIL import Image
from IPython import display
from tqdm import tqdm

def dataset2path(dataset_name):
    if dataset_name == 'robo_net':
        version = '1.0.0'
    elif dataset_name == 'language_table':
        version = '0.0.1'
    else:
        version = '0.1.0'
    # return f'gs://gresearch/robotics/{dataset_name}/{version}'
    return f'~/tensorflow_datasets/{dataset_name}/{version}'

def as_gif(images, path='temp.gif'):
    # Render the images as the gif:
    # images[0].save(path, save_all=True, append_images=images[1:], duration=1000, loop=0)
    images[0].save(path, save_all=True, append_images=images[1:], duration=100, loop=0)
    # Read the saved GIF back as bytes, closing the file afterwards:
    with open(path, 'rb') as f:
        return f.read()

_full_dataset = ['utaustin_mutex']

display_key = 'image'

for dataset in tqdm(sorted(_full_dataset), desc="processing dataset"):
    dataset_name = dataset
    b = tfds.builder_from_directory(builder_dir=dataset2path(dataset))
    ds = b.as_dataset(split='train[:1]').shuffle(1)   # take only the first episode (shuffle(1) does not reorder anything)
    episode = next(iter(ds))
    images = [step['observation'][display_key] for step in episode['steps']]
    images = [Image.fromarray(image.numpy()) for image in images]
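    # A bare display.Image(...) expression inside a loop is not rendered, so display it explicitly:
    display.display(display.Image(as_gif(images)))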

    step = next(iter(episode['steps']))
    language_instruction = step['language_instruction']
    language_instruction = language_instruction.numpy().decode("utf-8")
    print(language_instruction)

This printed the following language instruction:

Kindly spot and seek the red cup placed ahead of you.
Cautiously adjust your gripper towards the red cup, gripping it gently.
Find the after-storage area of the caddy.
Relocate the red cup in your grip over the rear portion and softly release it into the compartment.

However, the GIF shows only a blue cup, not a red one. Is there something wrong with this dataset, or is the problem in the visualization code?
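One check that might help narrow this down is to count the distinct instruction strings across all steps of the episode, rather than reading only the first step. The following is a minimal sketch, not part of the code lab: `decode_instruction`, `instruction_counts`, and `fake_steps` are hypothetical names introduced here, and the stand-in data only lets the snippet run without the dataset. It assumes each step is a dict-like object with a `language_instruction` entry, as in the code above.

```python
from collections import Counter

def decode_instruction(value):
    """Return a str from bytes, str, or a tensor-like object exposing .numpy()."""
    if hasattr(value, "numpy"):
        value = value.numpy()
    if isinstance(value, bytes):
        value = value.decode("utf-8")
    return str(value)

def instruction_counts(steps, key="language_instruction"):
    """Count how many steps carry each distinct instruction string."""
    return Counter(decode_instruction(step[key]) for step in steps)

# Stand-in data (hypothetical) so the sketch runs without the dataset.
fake_steps = [
    {"language_instruction": b"pick up the red cup"},
    {"language_instruction": b"pick up the red cup"},
    {"language_instruction": b"pick up the blue cup"},
]

counts = instruction_counts(fake_steps)
for text, n in counts.items():
    print(f"{n} step(s): {text}")
```

With the real data, passing `episode['steps']` in place of `fake_steps` should show whether the instruction is constant within the episode. If it is, and it still mentions a red cup, the mismatch is more likely in the stored annotations than in the visualization code.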

google-deepmind / open_x_embodiment

Inconsistent Alignment: Discrepancy Between Language Description and Observations #33