Open · piba941 opened this issue 3 years ago
Same purpose here. What is your current progress?
Hi there, we are happy to help if we can.
Could you provide some more information on what your issues are?
Note that you can use your own datasets without writing them as "tf.records". Instead, tf.data.Dataset has a from_generator method that lets you load a dataset using arbitrary Python code. So you may replace these two lines here to load your dataset, in whatever format you have it stored, with from_generator, and simply make the tensors look like those read from our records datasets :)
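For example, here is a minimal sketch of that idea (the generator body, the shapes and the dtypes below are made up for illustration; substitute your own loading code):

import numpy as np
import tensorflow.compat.v1 as tf

def trajectory_generator():
  # Yield one trajectory at a time, loaded from whatever format you use.
  context = {'particle_type': np.zeros([100], dtype=np.int64),
             'key': np.int64(0)}
  features = {'position': np.zeros([1001, 100, 2], dtype=np.float32)}
  yield context, features

ds = tf.data.Dataset.from_generator(
    trajectory_generator,
    output_types=({'particle_type': tf.int64, 'key': tf.int64},
                  {'position': tf.float32}),
    output_shapes=({'particle_type': [None], 'key': []},
                   {'position': [None, None, 2]}))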
First, thanks to @alvarosg for the answer.
I have made some very simple arbitrary data that resembles the structure of the parsed version of test.tfrecord, as follows:
particle_type = np.array([3, 3, 3, 3])
key = 0
position = np.array([[[0.42, 0.55],
                      [0.41, 0.55],
                      [0.40, 0.54],
                      [0.39, 0.53]],
                     [[0.56, 0.14],
                      [0.56, 0.13],
                      [0.54, 0.12],
                      [0.53, 0.11]]])
What is a good way to make this a compatible dataset? I first tried to make a generator and pass it to from_generator, but I am not sure how to make the resulting dataset compatible with the training data.
The TFRecord uses the SequenceExample format, which can be generated as shown below:
# Import modules. This script should live outside the learning_to_simulate
# code folder so that the package import below resolves.
import functools
import os
import json

import numpy as np
import tensorflow.compat.v1 as tf

from learning_to_simulate import reading_utils
# Set data path and point at the validation split
data_path = './datasets/WaterDropSample'
filename = 'valid.tfrecord'

# Read metadata
def _read_metadata(data_path):
  with open(os.path.join(data_path, 'metadata.json'), 'rt') as fp:
    return json.loads(fp.read())

# Fetch metadata
metadata = _read_metadata(data_path)
print(metadata)
# Read the original TFRecord and parse it with the repo's reading utils
ds_org = tf.data.TFRecordDataset([os.path.join(data_path, filename)])
ds = ds_org.map(functools.partial(
    reading_utils.parse_serialized_simulation_example, metadata=metadata))

# Materialize the parsed dataset as a list (optional; not used below)
lds = list(ds)

# Collect the parsed tensors as numpy arrays (this iteration requires eager mode)
particle_types = []
keys = []
positions = []
for context, features in ds:
  particle_types.append(context["particle_type"].numpy().astype(np.int64))
  keys.append(context["key"].numpy().astype(np.int64))
  positions.append(features["position"].numpy().astype(np.float32))
# The following functions can be used to convert a value to a type compatible
# with tf.train.Example.
def _bytes_feature(value):
  """Returns a bytes_list from a string / byte."""
  if isinstance(value, type(tf.constant(0))):
    value = value.numpy()  # BytesList won't unpack a string from an EagerTensor.
  return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _float_feature(value):
  """Returns a float_list from a float / double."""
  return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))

def _int64_feature(value):
  """Returns an int64_list from a bool / enum / int / uint."""
  return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
# Write the data back out as a TFRecord of SequenceExamples
with tf.python_io.TFRecordWriter('test.tfrecord') as writer:
  for step, (particle_type, key, position) in enumerate(
      zip(particle_types, keys, positions)):
    seq = tf.train.SequenceExample(
        context=tf.train.Features(feature={
            "particle_type": _bytes_feature(particle_type.tobytes()),
            "key": _int64_feature(key),
        }),
        feature_lists=tf.train.FeatureLists(feature_list={
            'position': tf.train.FeatureList(
                feature=[_bytes_feature(position.flatten().tobytes())],
            ),
            'step_context': tf.train.FeatureList(
                feature=[_bytes_feature(np.float32(step).tobytes())],
            ),
        }))
    writer.write(seq.SerializeToString())

# Read the newly written TFRecord back in
dt = tf.data.TFRecordDataset(['test.tfrecord'])
dt = dt.map(functools.partial(
    reading_utils.parse_serialized_simulation_example, metadata=metadata))
# Check that the original TFRecord and the newly generated one parse identically
for ((_ds_context, _ds_feature), (_dt_context, _dt_feature)) in zip(ds, dt):
  if not np.allclose(_ds_context["key"].numpy(), _dt_context["key"].numpy()):
    break
  if not np.allclose(_ds_context["particle_type"].numpy(),
                     _dt_context["particle_type"].numpy()):
    break
  if not np.allclose(_ds_feature["position"].numpy(),
                     _dt_feature["position"].numpy()):
    break
else:
  # The else branch of a for loop runs only if no break occurred,
  # i.e. every trajectory matched.
  print("TFRecords are similar!")
Thanks. Can you provide an explanation about the format of your datasets?
This varies a bit on a per-dataset basis, but in the simplest case (a dataset with only positions), it is mostly two dicts with the following fields:
context["particle_type"] # shape [num_particles]
features["position"] # shape [num_steps, num particles, num_dimensions]
If you want to connect your own data to this, I would recommend running these two lines:
ds = tf.data.TFRecordDataset([os.path.join(data_path, f'{split}.tfrecord')])
ds = ds.map(functools.partial(
    reading_utils.parse_serialized_simulation_example, metadata=metadata))
inspect the spec of the output, and then write your own tf.data.Dataset that outputs your data in the same format.
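For instance, to look at that spec in eager mode (a quick sketch, assuming ds was built with the two lines above):

for context, features in ds.take(1):
  print({k: (v.dtype, v.shape) for k, v in context.items()})
  print({k: (v.dtype, v.shape) for k, v in features.items()})

Once you know the dtypes and shapes, a from_generator dataset that yields the same (context, features) structure should drop in for the TFRecord-backed one.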
Many thanks to Alvaro. That's exactly what I wanted.
The GNS paper mentions generating the Water-3D dataset using SPlisHSPlasH. How did you do it? I want to make some test cases myself. Thanks a lot!
What is the step_context entry in the data for? Is it required when creating the dataset? When I debug and open the tfrecord, I see that in each element one dict has only position information; step_context is not included.
I am trying to evaluate the model using my own dataset, but I am facing issues while generating and converting it into TFRecords. Could you please help?