I am using DAgger, but I do not understand why, when I ask for 10,000 steps with 1 environment, I get 6 trajectories and more than 48,000 samples (4 rounds).
If I ask for 10,000 steps, shouldn't I get 10,000 samples, or perhaps a few more?
Furthermore, with only 3 trajectories, we end up with something like 12,036 samples in one round. Why does the algorithm generate so many more samples than I requested?
dagger_trainer = SimpleDAggerTrainer(
    venv=env,
    custom_logger=new_logger,
    scratch_dir=r"C:\Users\user\Temp\dagger",
    expert_policy=expert,
    bc_trainer=bc_trainer,
    rng=np.random.default_rng(),
)
dagger_trainer.train(10000)
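To make the arithmetic concrete, here is a minimal pure-Python sketch of what I suspect might be happening. It assumes that collection happens in rounds, that each round only stops after a minimum number of complete episodes, and that the step budget is only checked between rounds. The function `simulate_rounds`, the parameter `min_episodes_per_round`, the value 3, and the episode length of 4,012 steps (12,036 / 3) are all assumptions for illustration and are not taken from the library's code.

```python
def simulate_rounds(total_timesteps, episode_len, min_episodes_per_round,
                    min_timesteps_per_round=0):
    """Return the number of samples collected in each round.

    Hypothetical model: a round always finishes whole episodes and only
    ends once both per-round minimums are met; the total step budget is
    checked only after a round has finished.
    """
    rounds = []
    collected = 0
    while collected < total_timesteps:
        round_steps = 0
        round_episodes = 0
        # Keep adding complete episodes until the round's minimums are satisfied.
        while (round_episodes < min_episodes_per_round
               or round_steps < min_timesteps_per_round):
            round_steps += episode_len
            round_episodes += 1
        rounds.append(round_steps)
        collected += round_steps
    return rounds


# Assumed numbers: 3 episodes per round, 4,012 steps per episode.
rounds = simulate_rounds(total_timesteps=10_000, episode_len=4_012,
                         min_episodes_per_round=3)
print(rounds, sum(rounds))
```

Under these assumptions a single round already overshoots the 10,000-step request, because episodes are never cut short. If that is what is going on, I believe `train` may accept per-round minimums such as `rollout_round_min_episodes` and `rollout_round_min_timesteps`, but I have not confirmed this against the documentation.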