Closed slinlee closed 4 years ago
@SaharEs If you could share the progress.csv from the training that required 1000 or 2000 iterations, that'd be great. I'll need to see what we need to do to permit 2000 iteration trainings.
@ejunprung here you can find the progress.csv file
I have used more hidden layers than the default: model['fcnet_hiddens'] = [256, 256, 128, 128]
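For context, a minimal sketch of how that override might look in an RLlib-style trainer config (the exact key path is an assumption; Pathmind may expose this differently):

```python
# Hypothetical sketch: overriding the default hidden-layer sizes in an
# RLlib-style model config. Only "fcnet_hiddens" comes from the thread;
# the surrounding structure is assumed.
config = {
    "model": {
        # Deeper stack than the usual [256, 256] default.
        "fcnet_hiddens": [256, 256, 128, 128],
    },
}

print(config["model"]["fcnet_hiddens"])
```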
Also, I think the new fixes to the event triggers will significantly affect this training. It might be good to wait until I try it with the new Pathmind libraries.
Thanks, I'll take a look just in case.
For the new libraries, you can also try setting each truck as its own agent to see if it's less confusing for the policy (10 = 10 trucks). It's not tested, so keep an eye out for bugs.
@ejunprung I have done an experiment with the exact same model and reward function in the webapp. I thought you might find it useful to compare with the locally trained one: https://test.devpathmind.com/experiment/2091
Thanks! I'll take a look.
Looking at the progress.csv, it looks converged at around iteration 400 - 500. Did you happen to save a copy of the policy at iteration 400 - 500?
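As a rough sketch of that convergence check: RLlib-style progress.csv files typically include an `episode_reward_mean` column, and the reward curve flattening out signals convergence. Column names and the sample numbers below are assumptions, not data from the actual run:

```python
import csv
import io

# Hypothetical sketch: spotting convergence in a progress.csv by checking
# where episode_reward_mean stops improving. The data here is made up.
sample = io.StringIO(
    "training_iteration,episode_reward_mean\n"
    "100,10.0\n"
    "400,42.0\n"
    "500,42.5\n"
    "1000,42.6\n"
)
rows = list(csv.DictReader(sample))
rewards = [float(r["episode_reward_mean"]) for r in rows]

# Gain after iteration 500 is tiny compared with earlier gains,
# which is what "converged around iteration 400-500" looks like.
gain_after_500 = rewards[-1] - rewards[2]
print(gain_after_500)
```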
@ejunprung No, I didn't save a policy. But there are some checkpoints, I don't know if they would be useful... I uploaded them to the drive.
Closing this since we've been able to obtain a good policy without longer training. We'll re-evaluate later if it comes up again.
@SaharEs mentioned in the meeting on Monday Aug 10 that some of her trainings need to run 1000(?) or 2000(?) iterations to beat the heuristic. In the webapp, the stopping criteria prevent the training from getting close.
Originally we added stopping criteria because trainings were taking 40+ hours. Now that we're back to within an hour or two, we should see if letting trainings continue longer gets better results.
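For reference, relaxing the cap might look something like this in a Ray Tune-style stop condition. This is a sketch under assumptions; the keys shown are standard Tune stop criteria, but how Pathmind actually wires its stopping logic isn't confirmed by this thread:

```python
# Hypothetical sketch: a Ray Tune-style stop dict that raises the
# iteration ceiling to 2000 while keeping a wall-clock safety limit.
stop = {
    "training_iteration": 2000,   # allow the longer trainings discussed above
    "time_total_s": 2 * 60 * 60,  # still bail out after ~2 hours
}

print(stop["training_iteration"])
```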