prinshul closed this issue 10 months ago
Hey! Thanks for reaching out.
So the steps you see in wandb above are training iterations.
The ones below are collected frames.
Every training iteration you collect on_policy_frames_per_batch frames,
so if you multiply the x axis of the plot above by that, you get the plot below.
Otherwise, in wandb you can also change the x axis to counters/total_frames
to obtain the plot below.
Lemme know if this helps
The fact that it stops at 10 million frames is determined by this param in the config https://github.com/facebookresearch/BenchMARL/blob/02fc1b68f173106934d86824b9ad865ef21a46db/fine_tuned/vmas/conf/config.yaml#L38
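As a rough sketch of the conversion described above (not taken from the BenchMARL code), the snippet below multiplies an iteration count by the frames collected per iteration, and estimates how many iterations fit under a frame cap. The value `60_000` for the batch size and the helper names are assumptions for illustration only; check your own config for the real numbers.

```python
import math


def iterations_to_frames(iteration: int, frames_per_batch: int) -> int:
    """Convert a training-iteration index into total collected frames."""
    return iteration * frames_per_batch


def iterations_needed(max_frames: int, frames_per_batch: int) -> int:
    """Number of iterations required to reach (or just pass) max_frames."""
    return math.ceil(max_frames / frames_per_batch)


# Assumed values, for illustration only: read the real ones from your config.
frames_per_batch = 60_000   # hypothetical frames collected per iteration
max_frames = 10_000_000     # the 10 million frame cap mentioned above

print(iterations_to_frames(165, frames_per_batch))
print(iterations_needed(max_frames, frames_per_batch))
```

With these assumed numbers, a run lasts on the order of 165 iterations, which matches the scale of the wandb x axis in the question.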
Got it. Thank you.
Also, the plot is slightly different from the one given by you for tuned MAPPO + Balance https://api.wandb.ai/links/matteobettini/r5744vas
I got this:
The mean reward is almost the same, but not identical, even though I ran it without any changes. Is this due to randomness across runs? Is it because the seed is not set?
Getting the exact same results on 2 different machines with 2 different Python environments is basically impossible, even if we set the same seed.
The thing you can check (and that should be true) is that if you run the script a second time, with the same seed and same setup on your machine, you get the exact same results.
That said, those 2 curves look really similar so that is good news!
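The same-seed check suggested above can be illustrated with a toy example. This is only a sketch using Python's standard `random` module; `run_experiment` is a hypothetical stand-in for a training script, not a BenchMARL function.

```python
import random


def run_experiment(seed: int, n_steps: int = 5) -> list[float]:
    """Hypothetical stand-in for a training run: returns a 'reward curve'
    produced by a random generator seeded with the given seed."""
    rng = random.Random(seed)  # local generator, independent of global state
    return [rng.random() for _ in range(n_steps)]


first = run_experiment(seed=0)
second = run_experiment(seed=0)
other = run_experiment(seed=1)

# Same seed and same setup on the same machine: identical curves.
print(first == second)
# Different seed: the curves differ.
print(first == other)
```

A real training run has more sources of nondeterminism (hardware, library versions, parallel environments), which is why the guarantee only holds for the same setup on the same machine.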
Sure. Thanks.
Hi,
How are timesteps controlled for vmas? I ran the balance env with mappo using the tuned hyperparameters. In wandb it shows 165 steps, but when I plotted with marl-eval I see 1e7 timesteps.
Also, the plot is slightly different from the one given for tuned MAPPO + Balance.