ARCC-RACE / deepracer-for-dummies

A quick way to get up and running with a local DeepRacer training environment

Resuming training always results in an error #40

Open sushil-bharati opened 5 years ago

sushil-bharati commented 5 years ago

Whether I use the GUI or the terminal, I have never succeeded in resuming training. Although the GUI reports that training is running successfully, the terminal shows no progress and prints the following message forever:

Found a lock file rl-deepracer-pretrained/model/.lock , waiting
Found a lock file rl-deepracer-pretrained/model/.lock , waiting

The GUI also throws a log file load error.

I have configured rl_deepracer_coach_robomaker.py and reward.py correctly.

Michael-Equi commented 5 years ago

Can you pull and see if the log file loading error is gone? You will need to delete the lock file manually since the program will not delete it automatically. Usually, this occurs when you stop training while the neural network is being trained. Just delete the .lock file from the model folder and maybe also the last checkpoint if it did not finish updating.
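The manual cleanup described above can be sketched in Python. This is a minimal sketch, not part of the project: the helper name `remove_stale_lock` is hypothetical, and the model directory you pass in (for example, the `rl-deepracer-pretrained/model` path from the log message) depends on where your setup stores it. It only removes the `.lock` file; an incomplete last checkpoint would still need to be removed by hand.

```python
from pathlib import Path


def remove_stale_lock(model_dir):
    """Delete a leftover .lock file from model_dir (hypothetical helper).

    Returns True if a lock file was found and removed, False otherwise.
    Run this only while training is stopped, so that a lock held by a
    live training process is not removed.
    """
    lock_path = Path(model_dir) / ".lock"
    if lock_path.is_file():
        lock_path.unlink()
        return True
    return False
```

Calling `remove_stale_lock("rl-deepracer-pretrained/model")` before resuming would clear the lock that the terminal keeps waiting on, assuming that relative path resolves from your working directory.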

sushil-bharati commented 5 years ago

Log file error is still there.

Well, then when should I stop training, if not while the neural network is training?

Couldn't those steps be automated to make resuming training easier?

Michael-Equi commented 5 years ago

Stop training while the car is completing episodes (driving around the track). These steps, plus error correction, could definitely be automated in the future, but I am short on time right now, so it could be a while. For now, the easiest way to resume training without saving a new profile is to use the restart button.

sushil-bharati commented 5 years ago

Sure. Let's close this issue once it is fixed. Thanks.