Closed: johnnyL7 closed this pull request 3 years ago
Hm, have you had a chance to test this code btw?
It looks like it only grabs the best checkpoint in the entire experiment (not one for each trial and not the latest), and I don't think you need to restore the agent if the training hasn't been killed yet.
Also, when is `export_policy_from_checkpoint` called?
I think it's fine if it uses get_best_checkpoint and only saves the 'best' from all the trials up until this point. We only need to save one and overwrite it if there's a new and better checkpoint as training continues.
Then the webapp can show the 'download policy' button when a policy is available, even while training continues.
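For reference, roughly what that could look like with Ray Tune's analysis API. This is only a sketch, not the PR's code: the metric name, directory paths, and the `save_best_policy` helper are assumptions, and the exact `ExperimentAnalysis` signatures and return types vary by Ray version.

```python
import os
import shutil

from ray.tune import ExperimentAnalysis


def save_best_policy(experiment_dir, export_dir="best_checkpoint"):
    """Copy the best checkpoint found so far to a fixed location,
    overwriting whatever was there before (sketch only)."""
    analysis = ExperimentAnalysis(experiment_dir)
    best_trial = analysis.get_best_trial(metric="episode_reward_mean", mode="max")
    best_checkpoint = analysis.get_best_checkpoint(
        best_trial, metric="episode_reward_mean", mode="max"
    )
    # In Ray ~1.x this is a file path inside the checkpoint_<n> directory.
    checkpoint_dir = os.path.dirname(str(best_checkpoint))
    # Overwrite the previous "best" so the webapp always has exactly one
    # policy to offer behind the 'download policy' button.
    shutil.rmtree(export_dir, ignore_errors=True)
    shutil.copytree(checkpoint_dir, export_dir)
    return export_dir
```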
Small things: `save_best_policy`?

@ejunprung

> Also, when is `export_policy_from_checkpoint` called?
I think @johnnyL7 still needs to hook up the callback. Since he marked this as WIP I'm converting it to a draft too
Oh, I see. I'll check again later when Johnny is done.
@johnnyL7 @brettskymind is this meant to go into #372 or #262 ?
@ejunprung I finished plugging it in so it should overwrite the best checkpoint at each checkpoint_iteration (+1 iteration just to make sure all checkpoints are properly saved before the agent is loaded)
Instead of reinstantiating the agent, I access the one available through callbacks, which should save some time.
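Roughly the shape I'd expect for that. This is a minimal sketch assuming Ray ~1.x RLlib APIs, not the actual nativerl callback; `EXPORT_FREQUENCY` and the export directory are placeholders standing in for the real `checkpoint_iteration` setting and output path.

```python
from ray.rllib.agents.callbacks import DefaultCallbacks

EXPORT_FREQUENCY = 25  # placeholder for the real checkpoint_iteration value


class ExportPolicyCallbacks(DefaultCallbacks):
    def on_train_result(self, *, trainer, result, **kwargs):
        # Export one iteration after each checkpoint so the checkpoint on
        # disk is guaranteed to be fully written before anything loads it.
        if result["training_iteration"] % EXPORT_FREQUENCY == 1:
            # Reuse the already-running trainer instead of building a new
            # agent and restoring it from disk.
            trainer.export_policy_model("exported_policy")
```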
I tested the code and it works on the AGV model locally, but it runs out of memory even with 1 PBT trial. This shouldn't be a problem in the web app, where we can scale the memory, but it makes it harder to test.
As for connecting it to the webapp, we can make an `experiment_progress_report.txt` that summarizes the best trial and where the best checkpoint model so far can be found.
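Something like this for the report file. The field names and format here are just a guess to illustrate the idea, not what's in the PR:

```python
def write_progress_report(best_trial_name, best_reward, best_checkpoint_path,
                          report_path="experiment_progress_report.txt"):
    """Write a small text file the webapp can poll to locate the best
    trial and its best checkpoint so far (hypothetical format)."""
    with open(report_path, "w") as f:
        f.write(f"best_trial: {best_trial_name}\n")
        f.write(f"episode_reward_mean: {best_reward}\n")
        f.write(f"best_checkpoint: {best_checkpoint_path}\n")
```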
Let me know what you think, @slinlee
@slinlee I think it's unrelated to reward balancing. This is for grabbing policies at any checkpoint so you don't have to wait until training finishes to test the policy.
@johnnyL7 Nice! I'll try it tomorrow. I don't think there should be memory issues (even locally) so that is weird.
@ejunprung yeah but the pull request is pointing at a really old balancing branch
Oh lol. Dude @johnnyL7 you should create a fresh branch.
@slinlee I am a bit confused now on which is the old one: `bg_balance` or `bg_reward_blancing`. I thought `bg_balance` was the most recent branch Brett and I worked on to merge into test, so I opened this PR to go into `bg_balance`. Please let me know where I've gone wrong along the way, and sorry for the mixup! lol
@johnnyL7 - usually pull requests go to the default branch, so in nativerl it will go to `dev`. From there we can merge or release it to 'test', staging, or prod.
If we base this PR on the rebalancing PR, then this change won't be released until the rebalancing one is done, reviewed, and then tested. I don't think these two features are dependent on each other, so basing the PR on `dev` will let us release this more quickly.
Let me know if that makes sense.
Ah yes, that makes sense. I guess my rationale was that `bg_balance` has the newest features that are going to be merged into TEST very soon, so I wanted to grab the "latest" from TEST to branch off of. What you said makes sense because it's not guaranteed that `bg_balance` is ready yet, so it's better to branch from `dev`. I will rebase and push the commit.
@johnnyL7 I created #383 which I think covers the changes for saving the best policy during training. If that looks good, we can replace this PR with that one.
cc @ejunprung
Closing b/c we separated into a different PR. https://github.com/SkymindIO/nativerl/pull/383
Callbacks will instantiate an agent, restore it from the nearest checkpoint, and export it to the expected model format for each of the trials.
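A hypothetical sketch of what an `export_policy_from_checkpoint` helper along those lines could look like, assuming Ray ~1.x and a PPO trainer config matching the one used for training; the PR's actual implementation is not shown here.

```python
from ray.rllib.agents.ppo import PPOTrainer


def export_policy_from_checkpoint(checkpoint_path, config, export_dir):
    """Restore an agent from a checkpoint and export its policy to the
    expected model format (sketch only)."""
    trainer = PPOTrainer(config=config)
    trainer.restore(checkpoint_path)         # load weights from the checkpoint
    trainer.export_policy_model(export_dir)  # write the serving model format
    trainer.stop()                            # free workers and resources
```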