Closed pirobot closed 3 years ago
Hi @pirobot,
As you correctly identified, and as the error message states, no scalars were found. Your inspection output shows that your event file contains no scalars, only tensors. The tensorboard-aggregator only works for scalars, not for tensors.
I hope this helps.
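For anyone wanting to run the same check from Python rather than with tensorboard --inspect, here is a minimal sketch. It assumes the tensorboard package is installed and uses its EventAccumulator, whose Tags() method returns a dictionary with 'scalars' and 'tensors' entries. The helper names split_tags, load_tags and describe are made up for illustration and are not part of tensorboard-aggregator.

```python
def split_tags(tags):
    """Split the dict returned by EventAccumulator.Tags() into scalar and tensor tags."""
    scalars = sorted(tags.get("scalars", []))
    tensors = sorted(tags.get("tensors", []))
    return scalars, tensors


def load_tags(event_path):
    """Read one event file (or run directory) and return its tag dictionary."""
    # Imported lazily so split_tags() can be used without TensorBoard installed.
    from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

    accumulator = EventAccumulator(event_path)
    accumulator.Reload()
    return accumulator.Tags()


def describe(event_path):
    """Print whether the file holds scalar tags or only tensor tags."""
    scalars, tensors = split_tags(load_tags(event_path))
    if not scalars and tensors:
        print("No scalar tags; values are stored as tensors:", tensors)
    else:
        print("Scalar tags:", scalars)
```

Calling describe() with the path to the event file should list the metric names under the tensor tags if the file looks like the one attached to this issue.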
Hi @Spenhouet Many thanks for the quick response! Forgive me for asking a newbie question, but we appear to be writing our log data as scalars so I'm not sure why they are showing up as tensors. Here is the code we use to write the data. Can you see anything obvious we are doing incorrectly?
if global_step_val % eval_interval == 0:
    metric_utils.compute_summaries(
        eval_metrics,
        eval_py_env,
        eval_py_policy,
        num_episodes=num_eval_episodes,
        global_step=0,
        callback=eval_metrics_callback,
        tf_summaries=True,
        log=True,
    )
    with eval_summary_writer.as_default(), tf.compat.v2.summary.record_if(True):
        with tf.name_scope('Metrics/'):
            episodes = eval_py_env.get_stored_episodes()
            episodes = [episode for sublist in episodes for episode in sublist][:num_eval_episodes]
            metrics = episode_utils.get_metrics(episodes)
            for key in sorted(metrics.keys()):
                print(key, ':', metrics[key])
                metric_op = tf.compat.v2.summary.scalar(name=key,
                                                        data=metrics[key],
                                                        step=global_step_val)
                sess.run(metric_op)
    sess.run(eval_summary_flush_op)
where we define eval_summary_writer as follows:
eval_summary_writer = tf.compat.v2.summary.create_file_writer(
    eval_dir, flush_millis=summaries_flush_secs * 1000)
eval_metrics = [
    batched_py_metric.BatchedPyMetric(
        py_metrics.AverageReturnMetric,
        metric_args={'buffer_size': num_eval_episodes},
        batch_size=num_parallel_environments_eval),
    batched_py_metric.BatchedPyMetric(
        py_metrics.AverageEpisodeLengthMetric,
        metric_args={'buffer_size': num_eval_episodes},
        batch_size=num_parallel_environments_eval),
]
eval_summary_flush_op = eval_summary_writer.flush()
When I used TensorFlow (I have since switched to PyTorch), I saved scalars with tf.summary.scalar(name, data, step=None), as documented here: https://www.tensorflow.org/api_docs/python/tf/summary/scalar
You are using tf.compat.v2.summary.scalar. I'm not sure about the differences.
The migration guide seems to contain some suggestions: https://www.tensorflow.org/tensorboard/migrate
Maybe just try tf.summary.scalar or tf.compat.v1.summary.scalar and see if this works?
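As background on the difference: as far as I know, the TensorFlow 2 style summary API stores scalar values as tensor protos tagged with the scalars plugin, which is why readers that only look at simple_value entries report them as tensors. If rewriting the logging code is not an option, the values can still be recovered from the tensor entries. The following is a rough sketch under that assumption, using tensorboard's EventAccumulator and tensor_util.make_ndarray; the function names tensor_events_to_scalars and read_tensor_tag are hypothetical.

```python
def tensor_events_to_scalars(tensor_events, decode):
    """Turn TensorEvent-like records into (step, value) pairs.

    `decode` converts a tensor proto into something float() accepts.
    """
    return [(event.step, float(decode(event.tensor_proto))) for event in tensor_events]


def read_tensor_tag(event_path, tag):
    """Load every event stored under a tensor tag and decode it to plain floats."""
    # Lazy imports: only this function needs TensorBoard to be installed.
    from tensorboard.backend.event_processing.event_accumulator import EventAccumulator
    from tensorboard.util import tensor_util

    # A size guidance of 0 asks the accumulator to keep all tensor events.
    accumulator = EventAccumulator(event_path, size_guidance={"tensors": 0})
    accumulator.Reload()
    return tensor_events_to_scalars(accumulator.Tensors(tag), tensor_util.make_ndarray)
```

The tag passed to read_tensor_tag should be one of the names listed under the tensors entry of the inspection output. This only works for rank-0 (single value) tensors, which is what a scalar summary should contain.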
EDIT:
I'm also not familiar with the way you create a file writer, and I'm not sure what eval_metrics does.
Maybe try a simple file writer like:
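from pathlib import Path
import tensorflow as tf
result_dir = Path('./res')
# TF 1.x API; under TensorFlow 2 this is tf.compat.v1.summary.FileWriter (graph mode only).
train_writer = tf.summary.FileWriter(str(result_dir / 'train'))
eval_writer = tf.summary.FileWriter(str(result_dir / 'eval'))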
EDIT2: I'm not up-to-date with the changes with respect to TensorFlow 2. Please adjust the above examples if necessary.
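Since the metrics in the question are already plain Python numbers, one option that may be worth trying is to write them as simple_value summaries through a TF1-style FileWriter, without building any summary ops. This is only a sketch: it assumes graph mode (the FileWriter is not available under eager execution), that `writer` is a tf.compat.v1.summary.FileWriter, and that the metric values convert cleanly with float(). The names as_simple_values and write_simple_value_scalars are invented for this example.

```python
def as_simple_values(metrics):
    """Return (tag, float) pairs in sorted tag order, ready to write as scalars."""
    return [(key, float(metrics[key])) for key in sorted(metrics)]


def write_simple_value_scalars(writer, metrics, step, scope="Metrics"):
    """Write Python numbers as simple_value summaries; no session or ops needed."""
    # Lazy import so as_simple_values() stays usable without TensorFlow.
    import tensorflow as tf

    values = [
        tf.compat.v1.Summary.Value(tag=f"{scope}/{tag}", simple_value=value)
        for tag, value in as_simple_values(metrics)
    ]
    writer.add_summary(tf.compat.v1.Summary(value=values), global_step=step)
    writer.flush()
```

In the evaluation loop this would be called as write_simple_value_scalars(writer, metrics, global_step_val), with writer created once via tf.compat.v1.summary.FileWriter(eval_dir). Whether the aggregator then picks the values up should be confirmed by inspecting the new event file.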
OK thanks for the suggestions! I'll try these and see how it goes.
events.out.tfevents.1594678042.pi-dell.10209.12.v2.zip
Describe the bug When running aggregator.py against our event files, we get the error:
To Reproduce Run aggregator.py against the attached event file.
Expected behavior Expected summary files to be generated from scalars.
Screenshots None.
Desktop (please complete the following information):
Additional context This is the output we get when we run tensorboard --inspect on the same event file: