PathmindAI / nativerl

Train reinforcement learning agents using AnyLogic or Python-based simulations
Apache License 2.0

Reevaluate stopping criteria #126

Closed slinlee closed 4 years ago

slinlee commented 4 years ago

@SaharEs mentioned in the meeting on Monday, Aug 10 that some of her trainings need to run for 1000(?) or 2000(?) iterations to beat the heuristic. In the webapp, the stopping criteria prevent the training from getting close.

Originally we added stopping criteria because trainings were taking 40+ hours. Now that we're back to within an hour or two, we should see if letting trainings continue longer gets better results.
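For illustration only, here is a minimal sketch of what a relaxed stopping rule could look like, assuming the training reports Ray Tune-style metrics (`training_iteration`, `time_total_s`) and stops as soon as any threshold is reached. The `should_stop` helper and the threshold values are hypothetical, not the webapp's actual criteria; 2000 iterations and two hours are just the numbers mentioned above.

```python
# Hypothetical sketch: evaluate a Tune-style stop dictionary against one
# iteration's reported result. Training stops when ANY threshold is reached.

def should_stop(result, stop):
    """Return True if any metric in `stop` has reached its threshold."""
    return any(
        metric in result and result[metric] >= threshold
        for metric, threshold in stop.items()
    )


# Illustrative thresholds (assumptions): allow up to 2000 iterations,
# but cap wall-clock time at two hours.
stop = {
    "training_iteration": 2000,
    "time_total_s": 2 * 60 * 60,
}

# Example result as it might be reported after one training iteration.
result = {"training_iteration": 500, "time_total_s": 1800.0}
print(should_stop(result, stop))
```

Keeping a wall-clock cap next to the iteration cap would guard against a return of the 40+ hour runs while still letting long trainings proceed.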

ejunprung commented 4 years ago

@SaharEs If you could share the progress.csv from the training that required 1000 or 2000 iterations, that'd be great. I'll need to see what we need to do to permit 2000 iteration trainings.

SaharEs commented 4 years ago

@ejunprung here you can find the progress csv file

I have used more hidden layers than the default: model['fcnet_hiddens'] = [256, 256, 128, 128]
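As a rough sketch of what that setting implies, the snippet below places `fcnet_hiddens` in an RLlib-style config dictionary and counts the weights of the resulting fully connected network. The `count_dense_params` helper, the observation size (20), the output size (4), and the two-layer baseline are all assumptions for illustration; they are not taken from this model.

```python
# Hypothetical sketch: the hidden-layer setting inside an RLlib-style config,
# plus a rough parameter count for a plain fully connected network.

config = {
    "model": {
        "fcnet_hiddens": [256, 256, 128, 128],
    },
}


def count_dense_params(input_size, hiddens, output_size):
    """Count weights and biases of a dense network with the given layer sizes."""
    sizes = [input_size] + list(hiddens) + [output_size]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


# Assumed sizes, for illustration only.
OBS_SIZE = 20
NUM_OUTPUTS = 4

deeper = count_dense_params(OBS_SIZE, config["model"]["fcnet_hiddens"], NUM_OUTPUTS)
baseline = count_dense_params(OBS_SIZE, [256, 256], NUM_OUTPUTS)  # assumed baseline
print(deeper, baseline)
```

A larger network has more parameters to fit, which may be one reason a training needs more iterations, though that is not established here.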

Also, I think the new fixes to the event triggers will strongly affect this training. It might be good to wait until I try it with the new Pathmind libraries.

ejunprung commented 4 years ago

Thanks, I'll take a look just in case.

For the new libraries, you can also try setting each truck as its own agent to see if it's less confusing for the policy (10 = 10 trucks). It's not tested, so keep an eye out for bugs.

image

SaharEs commented 4 years ago

@ejunprung I have done an experiment with the exact same model and reward function in the webapp. I thought you might find it useful to compare with the locally trained one: https://test.devpathmind.com/experiment/2091

ejunprung commented 4 years ago

Thanks! I'll take a look.

Looking at the progress.csv, it looks like it converged at around iteration 400-500. Did you happen to save a copy of the policy at iteration 400-500?
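One way to make that kind of eyeball check repeatable is a simple plateau test on the reward curve. The sketch below assumes progress.csv has an `episode_reward_mean` column (as RLlib-style logs typically do); the helper names, the window size, and the tolerance are hypothetical choices, not an established convergence test.

```python
import csv
import io


def load_rewards(csv_text, column="episode_reward_mean"):
    """Read one numeric column from progress.csv content, in row order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [float(row[column]) for row in reader]


def plateau_iteration(rewards, window=50, tolerance=0.01):
    """Return the 1-based iteration at the end of the first window whose
    successor window improves the mean reward by less than `tolerance`
    (relative), or None if no such plateau is found."""
    for end in range(2 * window, len(rewards) + 1):
        prev = rewards[end - 2 * window:end - window]
        curr = rewards[end - window:end]
        prev_mean = sum(prev) / window
        curr_mean = sum(curr) / window
        scale = max(abs(prev_mean), 1e-8)  # avoid dividing by zero
        if (curr_mean - prev_mean) / scale < tolerance:
            return end - window
    return None


# Synthetic curve for illustration: reward climbs until iteration 400, then flattens.
synthetic = [float(min(i, 400)) for i in range(1, 1001)]
print(plateau_iteration(synthetic))
```

With a real file, the rewards would come from something like `load_rewards(open("progress.csv").read())`. Noisy curves may need a wider window or a looser tolerance.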

SaharEs commented 4 years ago

@ejunprung No, I didn't save a policy. But there are some checkpoints; I don't know if they would be useful... I uploaded them to the drive.

ejunprung commented 4 years ago

Closing this since we've been able to obtain a good policy without longer training. We'll re-evaluate later if it comes up again.