google-deepmind / deepmind-research

This repository contains implementations and illustrative code to accompany DeepMind publications
Apache License 2.0

question about reading dataset #349

Open samas69420 opened 2 years ago

samas69420 commented 2 years ago

Hi, I would like to read the WaterDrop dataset from the "Learning to Simulate" repo and print the individual tensors for each particle at a given timestep, but I'm pretty new to TensorFlow and reading a TFRecord is a little trickier than I expected. How can I do it?

kevroi commented 2 years ago

Hi! After downloading the dataset, have you tried reading the TFRecord, and then printing out its contents after decoding it? (I found this example in the TF docs)
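If the schema of the file is unknown, one way to follow this suggestion is to parse a raw record as a protobuf and list its feature names before writing any `feature_description`. The sketch below is only an illustration: `list_feature_keys` and the tiny synthetic demo file are hypothetical names and data, not part of the repository. It parses each record as a `tf.train.SequenceExample`, which also reads a plain `tf.train.Example` (the features then show up under `context` and `feature_lists` stays empty).

```python
import os
import tempfile

import tensorflow as tf


def list_feature_keys(path, max_records=1):
    """Return (context keys, feature-list keys) for the first few records."""
    results = []
    for raw in tf.data.TFRecordDataset(path).take(max_records):
        # A SequenceExample parser also accepts a plain Example: the Example's
        # features land in `context` and `feature_lists` stays empty.
        proto = tf.train.SequenceExample.FromString(raw.numpy())
        results.append((sorted(proto.context.feature.keys()),
                        sorted(proto.feature_lists.feature_list.keys())))
    return results


# Self-contained demo on a tiny synthetic file (NOT the real dataset).
demo = tf.train.SequenceExample()
demo.context.feature['key'].int64_list.value.append(0)
frame = demo.feature_lists.feature_list['position'].feature.add()
frame.bytes_list.value.append(b'\x00' * 8)

demo_path = os.path.join(tempfile.mkdtemp(), 'demo.tfrecord')
with tf.io.TFRecordWriter(demo_path) as writer:
    writer.write(demo.SerializeToString())

demo_keys = list_feature_keys(demo_path)
print(demo_keys)
```

Pointing `list_feature_keys` at the downloaded file instead of `demo_path` should show which features are stored per record and which are stored per timestep.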

basoomen commented 1 year ago

I'm using the following code to read the TFRecord file. However, I do not get a reasonable list of floats that represents the data I expect (positions of particles at all timesteps of the ground truth simulation).

import struct

import numpy as np
import tensorflow as tf

# Inspect the data in the WaterDrop reference dataset
file_path = '/tmp/datasets/WaterDrop/test.tfrecord'

ds_waterdrop = tf.data.TFRecordDataset(file_path)

feature_description = {
    'key': tf.io.FixedLenFeature([], tf.int64),
    'particle_type': tf.io.FixedLenFeature([], tf.string)
}

def _parse_function(example_proto):
    return tf.io.parse_single_example(example_proto, feature_description)

ds_waterdrop = ds_waterdrop.map(_parse_function)

# Create empty lists to store values
keys = []
particle_types = []
waterdrop_array = np.zeros((2, ))
i = 0
for record in ds_waterdrop:
    feature_value_1 = record['key'].numpy()
    feature_value_2_bytes = record['particle_type'].numpy()
    # Interpret the raw bytes as little-endian 32-bit floats
    feature_value_2_iter = struct.iter_unpack('<f', feature_value_2_bytes)

    feature_value_2_list = list(feature_value_2_iter)
    keys.append(feature_value_1)
    particle_types.append(feature_value_2_list)

So I am still wondering how to decode this file properly. I am also wondering why there are 30 instances of the 'key' and 'particle_type' features when, from render_rollout, I conclude that the test dataset uses a set of 285 particles over 1000 timesteps.
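A minimal sketch of one possible way to decode a record, under assumptions that should be checked against the repository's `reading_utils.py`: each record is a `tf.train.SequenceExample` holding one whole trajectory, with `key` and `particle_type` in the context and `position` in the feature lists; `particle_type` is stored as raw int64 bytes and `position` as raw float32 bytes; and positions are 2D. If that layout holds, it would explain both observations above: decoding `particle_type` with `'<f'` misreads integer bytes as floats, and 30 records would correspond to 30 trajectories, with the per-particle, per-timestep data sitting in `position` rather than in the context. The names `parse_trajectory` and `make_synthetic_record` are hypothetical helpers, and the demo uses synthetic data rather than the real file.

```python
import numpy as np
import tensorflow as tf

# Assumed layout (verify against reading_utils.py in the repository).
CONTEXT_FEATURES = {
    'key': tf.io.FixedLenFeature([], tf.int64, default_value=0),
    'particle_type': tf.io.VarLenFeature(tf.string),
}
SEQUENCE_FEATURES = {
    'position': tf.io.VarLenFeature(tf.string),
}


def parse_trajectory(serialized, dim=2):
    """Decode one record into (key, particle_type, positions)."""
    context, sequences = tf.io.parse_single_sequence_example(
        serialized,
        context_features=CONTEXT_FEATURES,
        sequence_features=SEQUENCE_FEATURES)
    # particle_type: raw bytes reinterpreted as int64, one value per particle.
    particle_type = np.concatenate([
        np.frombuffer(b, dtype=np.int64)
        for b in context['particle_type'].values.numpy()])
    # position: raw bytes reinterpreted as float32, then reshaped to
    # [timesteps, particles, dim].
    flat = np.concatenate([
        np.frombuffer(b, dtype=np.float32)
        for b in sequences['position'].values.numpy()])
    positions = flat.reshape(-1, len(particle_type), dim)
    return int(context['key'].numpy()), particle_type, positions


def make_synthetic_record(key, particle_type, positions):
    """Build a record in the assumed layout, for demonstration only."""
    example = tf.train.SequenceExample()
    example.context.feature['key'].int64_list.value.append(key)
    example.context.feature['particle_type'].bytes_list.value.append(
        particle_type.astype(np.int64).tobytes())
    frames = example.feature_lists.feature_list['position']
    for frame in positions:
        frames.feature.add().bytes_list.value.append(
            frame.astype(np.float32).tobytes())
    return example.SerializeToString()


# Synthetic trajectory: 3 timesteps, 4 particles, 2 dimensions.
expected_positions = np.arange(24, dtype=np.float32).reshape(3, 4, 2)
record = make_synthetic_record(7, np.array([5, 5, 5, 5]), expected_positions)

key, particle_type, positions = parse_trajectory(record)
print(key, particle_type, positions.shape)
print(positions[0])  # positions of every particle at the first timestep
```

For the real file, the same function could be applied to each element of `tf.data.TFRecordDataset(file_path)`; `positions[t, p]` would then be the position of particle `p` at timestep `t` in that trajectory.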