HumanCompatibleAI / imitation

Clean PyTorch implementations of imitation and reward learning algorithms
https://imitation.readthedocs.io/
MIT License

difference between Step and Sample in Dagger #846

Open lwizard1999 opened 7 months ago

lwizard1999 commented 7 months ago

I am using DAgger, but I do not understand why, after asking for 10,000 steps with a single environment, I end up with 6 trajectories and more than 48,000 samples (over 4 rounds).

If I ask for 10,000 steps, shouldn't I get 10,000 samples, or perhaps a few more?

Furthermore, with only 3 trajectories, a single round already contains something like 12,036 samples. Why does the algorithm generate so many more samples than I requested?

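    import numpy as np
    from imitation.algorithms.dagger import SimpleDAggerTrainer

    # env, new_logger, expert, and bc_trainer are created earlier (not shown).
    dagger_trainer = SimpleDAggerTrainer(
        venv=env,
        custom_logger=new_logger,
        scratch_dir=r"C:\Users\user\Temp\dagger",
        expert_policy=expert,
        bc_trainer=bc_trainer,
        rng=np.random.default_rng(),
    )

    dagger_trainer.train(10000)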
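One possible reading of the 12,036 figure is that each round collects whole episodes until a minimum episode count and a minimum timestep count are both satisfied, so a round can overshoot the requested total when episodes are long. The sketch below is a plain-Python simulation of that idea, not the library's code. The names `simulate_round` and `simulate_training` are hypothetical, the fixed episode length of 4,012 is simply 12,036 / 3, and the defaults of 3 episodes and 500 timesteps per round are an assumption about `train()` that should be checked against the installed version.

```python
from typing import List, Tuple


def simulate_round(
    episode_len: int, min_episodes: int, min_timesteps: int
) -> Tuple[int, int]:
    """Collect whole episodes until BOTH minimums are met (hypothetical model)."""
    if episode_len <= 0:
        raise ValueError("episode_len must be positive")
    episodes, timesteps = 0, 0
    # Episodes are never cut short, so the timestep count grows in
    # increments of episode_len and can exceed min_timesteps by a lot.
    while episodes < min_episodes or timesteps < min_timesteps:
        episodes += 1
        timesteps += episode_len
    return episodes, timesteps


def simulate_training(
    total_timesteps: int,
    episode_len: int,
    min_episodes: int = 3,  # assumed per-round default, not confirmed
    min_timesteps: int = 500,  # assumed per-round default, not confirmed
) -> List[Tuple[int, int]]:
    """Run rounds until the collected timesteps reach total_timesteps."""
    rounds: List[Tuple[int, int]] = []
    collected = 0
    # The stopping condition is only checked between rounds, so the last
    # round may push the total well past total_timesteps.
    while collected < total_timesteps:
        episodes, timesteps = simulate_round(
            episode_len, min_episodes, min_timesteps
        )
        rounds.append((episodes, timesteps))
        collected += timesteps
    return rounds


if __name__ == "__main__":
    # Episode length chosen so that 3 episodes give 12,036 timesteps.
    print(simulate_training(total_timesteps=10_000, episode_len=4_012))
    # prints [(3, 12036)]
```

Under these assumptions a request for 10,000 steps yields one round of 3 episodes and 12,036 samples. The simulation does not account for the 48,000 samples over 4 rounds, which may be counted differently (for example, by the behavioural cloning trainer revisiting accumulated data). If the per-round minimum is indeed the cause, passing a smaller value such as `rollout_round_min_episodes=1` to `train()` might reduce the overshoot, assuming that keyword is available in the version in use.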