RasmusBrostroem / ConnectFourRL


Look closer at value estimates #102

Open jbirkesteen opened 1 year ago

jbirkesteen commented 1 year ago

When we printed value estimates yesterday (for networks that had finished training), all the values were very close to 1. When printed during training, they jumped quite dramatically after just 1 episode, which seems a bit extreme. The agent probably shouldn't be so sure so early on. A first check would be to see how varying the step size self.alpha affects this.
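A rough way to get a feel for the step-size effect is a toy sweep like the one below. It is only a sketch under assumptions: it uses a tabular TD(lambda) update on a made-up six-state trajectory with a single terminal reward of 1, not the networks or agent code from this repo, and the names `run_td_lambda_episode` and `sweep_alpha` are hypothetical.

```python
import numpy as np


def run_td_lambda_episode(values, trajectory, final_reward, alpha, gamma=1.0, lam=0.9):
    """Apply one episode of tabular TD(lambda) with accumulating traces.

    Toy stand-in for the real agent: `values` is a plain array indexed by state,
    and the only reward is `final_reward` on the last step.
    """
    values = values.copy()
    traces = np.zeros_like(values)
    for t, state in enumerate(trajectory):
        is_last = t == len(trajectory) - 1
        reward = final_reward if is_last else 0.0
        # The value of the terminal successor is taken to be 0.
        next_value = 0.0 if is_last else values[trajectory[t + 1]]
        td_error = reward + gamma * next_value - values[state]
        # Decay all traces, then bump the trace of the visited state.
        traces *= gamma * lam
        traces[state] += 1.0
        values += alpha * td_error * traces
    return values


def sweep_alpha(alphas, n_states=6):
    """Return the largest single-episode change in any value, per alpha."""
    initial = np.zeros(n_states)
    trajectory = list(range(n_states))  # hypothetical episode: visit each state once
    jumps = {}
    for alpha in alphas:
        updated = run_td_lambda_episode(initial, trajectory, final_reward=1.0, alpha=alpha)
        jumps[alpha] = float(np.max(np.abs(updated - initial)))
    return jumps


if __name__ == "__main__":
    for alpha, jump in sweep_alpha([0.01, 0.1, 0.5]).items():
        print(f"alpha={alpha}: largest change after 1 episode = {jump:.4f}")
```

In this toy setting the largest jump after one episode scales directly with alpha, so if the real estimates move almost all the way to 1 after a single episode, a too-large step size is at least a plausible suspect.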

Other ideas @RasmusBrostroem?

jbirkesteen commented 1 year ago

This will be easier to test once #108 has been merged.

RasmusBrostroem commented 1 year ago

I mean, varying lambda and gamma is also an option, but alpha seems like a good start. Another point is that all the valuations were quite similar, so it would be interesting to see whether the exploration change helps with this or whether we need to consider other options.
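To make "quite similar" measurable, something like the helper below could be logged before and after the exploration change. This is a sketch only: `summarize_value_spread` is a hypothetical name, and it assumes the value estimates for a batch of positions are available as a flat list or array of floats.

```python
import numpy as np


def summarize_value_spread(value_estimates):
    """Summarize how spread out a batch of value estimates is."""
    values = np.asarray(value_estimates, dtype=float)
    return {
        "min": float(values.min()),
        "max": float(values.max()),
        "mean": float(values.mean()),
        "std": float(values.std()),
        "range": float(values.max() - values.min()),
    }


if __name__ == "__main__":
    # Made-up numbers, only to show the shape of the output.
    print(summarize_value_spread([0.97, 0.98, 0.99, 1.0]))
```

A std and range that stay near zero across very different board positions would back up the impression that the network is not distinguishing between states.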

jbirkesteen commented 1 year ago

Good point, I hadn't thought about gamma and lambda in this context. And yes, we should definitely look into the similarities between valuations, too!