Closed by YousufAzadSami 4 years ago
If you want to get some quick results, you can run this code with the arguments provided in the README for "data-efficient Rainbow". T-max is the number of steps (not episodes) in the environment.
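To make the steps-versus-episodes distinction concrete, here is a minimal toy sketch. It is not this repository's training loop: the function `run`, the 5% termination chance, and the exact way `learn_start` and `target_update` are applied are assumptions for illustration, with names chosen only to mirror the T-max, learn-start and target-update arguments discussed in this thread.

```python
import random


def run(t_max, learn_start, target_update, seed=0):
    """Toy loop whose budget is counted in environment steps, not episodes."""
    rng = random.Random(seed)
    episodes = 0
    learn_updates = 0
    target_syncs = 0
    done = True
    for T in range(1, t_max + 1):  # T counts steps; the loop ends at t_max steps
        if done:
            episodes += 1  # a new episode starts (stand-in for env.reset())
        # Stand-in for env.step(): assume each step ends the episode with 5% chance.
        done = rng.random() < 0.05
        if T >= learn_start:
            learn_updates += 1  # stand-in for one learning update
            if T % target_update == 0:
                target_syncs += 1  # stand-in for refreshing the target network
    return episodes, learn_updates, target_syncs


episodes, learn_updates, target_syncs = run(t_max=1000, learn_start=200, target_update=100)
print(episodes, learn_updates, target_syncs)
```

In this sketch, shrinking `t_max` only shortens the run. The other two values are also measured in steps, so they stay meaningful as long as they remain well below `t_max`; the number of episodes that fit into the budget depends on how long each episode happens to last.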
If you are just learning about RL, this is not the right codebase/algorithm for you to be working with - it involves the combination of several research papers and assumes familiarity with them. There's plenty of other code out there centred around teaching RL, such as OpenAI's Spinning Up in Deep RL.
Hello good people!
I didn't know where else to post, so I am posting here.
Background: First of all, I am out of my element here. I am just learning about RL, and I got a job working on it. The task is more code-oriented, but I need some of the concepts as well. I decided to throw myself in the water to break my stagnation. I am struggling a bit, but that was the idea. I would like to understand the concepts by myself eventually, but for the job I need to press on right now. I hope you can help me here.
Issue: When I run it with the default arguments, it just keeps running. I think by default it is set to run 50 million episodes (T-max = 50e6). I want to complete one successful run before I start playing with it, so that I have an idea of what the result is supposed to look like. Should I just change the T-max variable? There are about 20 more arguments, and I am not sure whether changing it affects the others. For example, I think target-update and learn-start are related to it. Since my concepts are not so clear, I could use some help here.
I hope I was clear. If not, please ask me here.