Rejuy opened 1 year ago
I entered the function and added some logging. Surprisingly, I found that in the `learner.py` module of tf_agents, the code around the return is actually this:
```python
def run(self, iterations=1, iterator=None, parallel_iterations=10):
  """ ...
  """
  ...
  with self.train_summary_writer.as_default(), \
       common.soft_device_placement(), \
       tf.compat.v2.summary.record_if(_summary_record_if), \
       self.strategy.scope():
    iterator = iterator or self._experience_iterator
    loss_info = self._train(tf.constant(iterations),
                            iterator,
                            parallel_iterations)
    logging.info("return back to run")  # log I added
    train_step_val = self.train_step.numpy()
    for trigger in self.triggers:
      trigger(train_step_val)
    return loss_info

@common.function(autograph=True)
def _train(self, iterations, iterator, parallel_iterations):
  # ...
  logging.info("_train start")  # log I added
  loss_info = self.single_train_step(iterator)
  for _ in tf.range(iterations - 1):
    tf.autograph.experimental.set_loop_options(
        parallel_iterations=parallel_iterations)
    loss_info = self.single_train_step(iterator)

  def _reduce_loss(loss):
    # ...
    ...

  # ...
  reduced_loss_info = tf.nest.map_structure(_reduce_loss, loss_info)
  logging.info("_train end")  # log I added
  return reduced_loss_info
```
All of the logs in `_train` can be found, indicating that `_train` ran to its end. However, execution never returned to the call site:
```python
loss_info = self._train(tf.constant(iterations),
                        iterator,
                        parallel_iterations)
logging.info("return back to run")
```
This means the `"return back to run"` log never gets printed. It's very weird. How could this happen?
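One possible explanation (an assumption on my part, not confirmed in this thread): `_train` is wrapped in `common.function(autograph=True)`, which compiles it like a `tf.function`. Python-level side effects such as `logging.info` run only while the function is being *traced*, not on each execution of the compiled graph. So seeing `"_train start"` and `"_train end"` only proves that tracing reached the end of the Python body; the actual graph execution can still be running or blocked, which would leave `run` waiting without ever reaching the log after the call. A minimal sketch of this trace-time behavior with plain TensorFlow:

```python
import tensorflow as tf

# A Python-level side effect (standing in for logging.info in _train)
# that lets us count how often the Python body actually runs.
trace_count = []

@tf.function
def add_one(x):
    trace_count.append(1)  # Python code: runs only when the function is traced
    return x + 1           # graph op: runs on every call

add_one(tf.constant(1))  # first call: traces the function, Python body runs
add_one(tf.constant(2))  # same signature: cached graph replayed, Python body skipped
```

Here `trace_count` ends up with length 1 even though `add_one` was called twice, so log lines emitted from inside a compiled function cannot, by themselves, tell you whether its graph execution completed or hung.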
I came across the same issue. By any chance did you get a solution? Thanks!
I got the same issue. How could this happen?
I got the same issue. How was it resolved? Thanks!
@Rejuy May I ask if there has been any progress on this issue, Thanks a lot.
Hi there! I ran into some problems while running the project. I followed the README.md, and when execution reached this line, it blocked and never returned. How could this happen? I have no idea. Could you give me some advice? Thanks a lot!