Using the agent's RNG, and not numpy's, to select actions

google-deepmind / bsuite

bsuite is a collection of carefully-designed experiments that investigate core capabilities of a reinforcement learning (RL) agent

Apache License 2.0

Status: Closed (RaghuSpaceRajan closed this issue 4 years ago)

RaghuSpaceRajan commented 4 years ago

Hi Ian,

I was trying to run the baseline agents on some of my environments, but I couldn't get exactly reproducible results. I think this is because numpy's global RNG, rather than an RNG owned by the agent, is used for action selection, e.g., here: https://github.com/deepmind/bsuite/blob/f4d12fb029c533ec610902a9565860bf377db556/bsuite/baselines/tf/dqn/agent.py#L78

Is this by design?

Greetings, Raghu.

iosband commented 4 years ago

Good catch - we should change this!

I'll submit a CL.
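
For readers following along, here is a minimal sketch of the kind of change discussed above. It is not the bsuite code or the actual CL: the class `EpsilonGreedyPolicy`, the function `rollout`, and all parameter values are hypothetical and for illustration only. The idea is that the agent draws from its own seeded `np.random.RandomState` instead of the global `np.random` functions, so other code that touches the global RNG cannot change which actions are selected.

```python
import numpy as np


class EpsilonGreedyPolicy:
    """Hypothetical stand-in for an agent's action-selection logic."""

    def __init__(self, num_actions: int, epsilon: float, seed: int):
        self._num_actions = num_actions
        self._epsilon = epsilon
        # Agent-owned generator: seeded once and not shared with other code.
        self._rng = np.random.RandomState(seed)

    def select_action(self, q_values: np.ndarray) -> int:
        # Explore using the agent's RNG rather than the global np.random state.
        if self._rng.rand() < self._epsilon:
            return int(self._rng.randint(self._num_actions))
        # Otherwise act greedily with respect to the given action values.
        return int(np.argmax(q_values))


def rollout(seed: int, steps: int = 20) -> list:
    """Selects `steps` actions while other code consumes the global RNG."""
    policy = EpsilonGreedyPolicy(num_actions=4, epsilon=0.5, seed=seed)
    q_values = np.array([0.1, 0.4, 0.2, 0.3])  # made-up action values
    actions = []
    for _ in range(steps):
        # Simulate unrelated code (e.g. an environment) using the global RNG.
        np.random.rand()
        actions.append(policy.select_action(q_values))
    return actions


if __name__ == "__main__":
    np.random.seed(1)
    first = rollout(seed=42)
    np.random.seed(2)  # a different global seed should not matter
    second = rollout(seed=42)
    print(first == second)
```

With this structure, two runs that use the same agent seed select the same actions regardless of how the global numpy RNG is seeded or consumed. Full reproducibility of a training run would also depend on other sources of randomness (network initialization, replay sampling, the environment), which this sketch does not cover.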