IntelLabs / coach

Reinforcement Learning Coach by Intel AI Lab enables easy experimentation with state of the art Reinforcement Learning algorithms
https://intellabs.github.io/coach/
Apache License 2.0

Saver fails to restore agent's checkpoint #465

Closed nicolas-cerardi closed 3 years ago

nicolas-cerardi commented 3 years ago

Hi, I'm using rl coach through AWS Sagemaker, and I'm running into an issue that I struggle to understand.

I'm performing RL using AWS Sagemaker for the learning and AWS Robomaker for the environment, like DeepRacer, which also uses rl coach. In fact, on the learning side the code differs only slightly from the DeepRacer code, but the environment is completely different.

What happens:

The agent raises an exception with the message: Failed to restore agent's checkpoint: 'main_level/agent/main/online/global_step'

The traceback points to a bug happening in this rl coach module:

File "/someverylongpath/rl_coach/architectures/tensorflow_components/savers.py", line 93, in <dictcomp>
    for ph, v in zip(self._variable_placeholders, self._variables)
KeyError: 'main_level/agent/main/online/global_step'

I think this indicates that, in the function from_arrays, variables and self._variables do not contain the same set of variables...
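To illustrate the failure mode, here is a minimal, self-contained sketch (with hypothetical names, not the actual rl_coach savers.py code) of how a dict comprehension that zips the saver's own variable list against a dict of restored arrays raises exactly this KeyError when the checkpoint is missing one of the expected variables:

```python
class FakeVariable:
    """Stand-in for a TF variable; only the name matters here."""
    def __init__(self, name):
        self.name = name


def restore_feed_dict(placeholders, variables, restored_arrays):
    # restored_arrays maps variable name -> restored value.
    # If a variable expected by the graph is absent from the
    # checkpoint, the lookup below raises KeyError for that name.
    return {
        ph: restored_arrays[v.name]
        for ph, v in zip(placeholders, variables)
    }


variables = [
    FakeVariable("main_level/agent/main/online/weights"),
    FakeVariable("main_level/agent/main/online/global_step"),
]
placeholders = ["ph_weights", "ph_global_step"]

# Checkpoint written by a graph that had no global_step variable:
arrays = {"main_level/agent/main/online/weights": [0.1, 0.2]}

try:
    restore_feed_dict(placeholders, variables, arrays)
except KeyError as exc:
    print("Failed to restore agent's checkpoint:", exc)
```

Under this (assumed) reading, the error means the set of names stored in the checkpoint and the set of variables the current graph expects have diverged, e.g. because global_step was added to the graph after the checkpoint was written.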

So, there are a few things I don't understand about this problem, and since I'm not used to rl coach, I think your point of view would be valuable.

  1. How can variables and self._variables be different?
  2. Why does it fail only the second time? (Does the graph manager change the computational graph?)

A few more details:

nicolas-cerardi commented 3 years ago

Deactivating the patch solves the issue.