Eclectic-Sheep / sheeprl

Distributed Reinforcement Learning accelerated by Lightning Fabric
https://eclecticsheep.ai
Apache License 2.0

Some distribution creation doesn't do anything #166

Closed HiddeLekanne closed 6 months ago

HiddeLekanne commented 7 months ago

Lines like:

    predicted_values = Independent(
        Normal(critic(imagined_trajectories), 1, validate_args=validate_args),
        1,
        validate_args=validate_args,
    ).mean

and

    predicted_rewards = Independent(
        Normal(world_model.reward_model(imagined_trajectories), 1, validate_args=validate_args),
        1,
        validate_args=validate_args,
    ).mean

don't do anything.

That's because you're not sampling from the distribution; you're simply taking the mean, which is the value you started with anyway. You can confirm this by running a training session with and without the distribution creation and checking that the model learns exactly the same thing.

The quoted lines are from DreamerV1.
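
A quick way to check the claim in isolation is the minimal sketch below. It assumes PyTorch's torch.distributions, and `critic_output` is a hypothetical random tensor standing in for `critic(imagined_trajectories)`; it is not taken from the sheeprl code.

```python
import torch
from torch.distributions import Independent, Normal

torch.manual_seed(0)

# Hypothetical stand-in for critic(imagined_trajectories),
# with an assumed shape of (horizon, batch, 1).
critic_output = torch.randn(15, 8, 1)

# Same construction as in the quoted lines: unit-variance Normal
# wrapped in Independent over the last dimension.
dist = Independent(
    Normal(critic_output, 1, validate_args=False),
    1,
    validate_args=False,
)
predicted_values = dist.mean

# The mean of Normal(loc, scale) is loc, so the wrapper returns the input.
values_match = torch.equal(predicted_values, critic_output)
print(values_match)

# Drawing a sample, by contrast, does produce different values.
sample_differs = not torch.equal(dist.sample(), critic_output)
print(sample_differs)
```

If both checks print True, taking `.mean` is equivalent to using the network output directly, while sampling is not.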

HiddeLekanne commented 7 months ago

The same should hold for the .mode versions in DreamerV2, because for a normal distribution the mode equals the mean.
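
The .mode case can be checked the same way. This is a small sketch, assuming a PyTorch version whose distributions expose a `.mode` property; `reward_output` is a made-up tensor standing in for a model output.

```python
import torch
from torch.distributions import Independent, Normal

# Made-up values standing in for a reward or value model output.
reward_output = torch.tensor([[0.5], [-1.2], [3.0]])

dist = Independent(Normal(reward_output, 1), 1)

# For a Normal, both the mode and the mean are the loc parameter.
mode_equals_mean = torch.equal(dist.mode, dist.mean)
mode_equals_input = torch.equal(dist.mode, reward_output)
print(mode_equals_mean, mode_equals_input)
```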

belerico commented 7 months ago

Hi @HiddeLekanne, yeah, you're right: this should give us the same trained agent. I will try it ASAP.