I am trying to run your code and reproduce your reported performance, but the mean reward is stuck at around 5. I have run it three times and got the same result each time, although the code itself seems to run fine.
How much does the performance vary from run to run? How many trials did you run before obtaining the results presented in the README?