Closed Neeratyoy closed 2 months ago
@eddiebergman @TarekAbouChakra is tblogging currently supported? If you install this branch, run this example, and view TensorBoard, you'll see that it is no longer working.
Just FYI, [PR](#140) didn't work as intended, you can just do PR #140 :)
As for the PR about Python dependencies, just get ifbo in first and we can do the Python dep thing after.
Regarding `tblogger`, I can't initially see what the issue is. It seems that at each iteration `tblogger` gets the trial that is currently being evaluated from `runtime.py`. `runtime.py` asks the optimizer for the next config to evaluate (which returns the config dict as well as the ID), sets that as the currently running trial, and then sends it to `run_pipeline()`. Inside `run_pipeline()` is where `tblogger.log()` is called, at which point it asks `runtime.py`, "hey, what config is currently being evaluated?". That should be the exact same trial as the one used for determining the config directory.
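To make the flow described above concrete, here is a rough toy sketch. It is not the actual neps code: `Trial`, `ToyOptimizer`, `ToyRuntime`, and `toy_log` are all hypothetical stand-ins, and the only thing it tries to show is the "runtime sets the current trial, logger asks the runtime for it" handshake.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trial:
    id: str
    config: dict


class ToyOptimizer:
    """Hypothetical optimizer: hands out configs with increasing IDs."""

    def __init__(self) -> None:
        self._n = 0

    def ask(self) -> Trial:
        trial = Trial(id=f"config_{self._n}", config={"lr": 0.1 / (self._n + 1)})
        self._n += 1
        return trial


class ToyRuntime:
    """Hypothetical stand-in for the runtime: tracks the running trial."""

    current_trial: Optional[Trial] = None

    @classmethod
    def run(cls, optimizer: ToyOptimizer, run_pipeline, n_trials: int) -> None:
        for _ in range(n_trials):
            # Ask for the next config, mark it as running, then evaluate it.
            cls.current_trial = optimizer.ask()
            run_pipeline(**cls.current_trial.config)
        cls.current_trial = None


logged = []  # (trial id, loss) pairs, standing in for TensorBoard events


def toy_log(loss: float) -> None:
    # The logger asks the runtime which trial is currently being evaluated.
    trial = ToyRuntime.current_trial
    logged.append((trial.id, loss))


def run_pipeline(lr: float) -> float:
    loss = lr * 2  # dummy objective
    toy_log(loss)
    return loss


ToyRuntime.run(ToyOptimizer(), run_pipeline, n_trials=3)
```

In this toy version every logged entry carries the ID of the trial that was running when `toy_log` was called, which is the behaviour one would expect from the real logger as well.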
Think I found the issue:
Basically, it only initializes once, and initialization is in charge of deciding where to write to: https://github.com/automl/neps/blob/0025d3d41db51a6198c7b19464513371eb9cb9ff/neps/plot/tensorboard_eval.py#L109-L135
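A minimal sketch of that failure mode, assuming the write directory is decided inside a one-time initialization step. The class names, the `log` signature, and the `tbevents` folder name here are made up for illustration and are not taken from `tensorboard_eval.py`:

```python
from pathlib import Path


class InitOnceLogger:
    """Illustrative bug: the write directory is fixed on the first call."""

    _initialized = False
    _write_dir = None

    @classmethod
    def log(cls, trial_id: str, root: str) -> Path:
        if not cls._initialized:
            # Runs only once, so later trials keep the first trial's folder.
            cls._write_dir = Path(root) / trial_id / "tbevents"
            cls._initialized = True
        return cls._write_dir


class PerTrialLogger:
    """One possible fix: re-initialize whenever the trial changes."""

    _trial_id = None
    _write_dir = None

    @classmethod
    def log(cls, trial_id: str, root: str) -> Path:
        if cls._trial_id != trial_id:
            cls._trial_id = trial_id
            cls._write_dir = Path(root) / trial_id / "tbevents"
        return cls._write_dir


# The buggy logger sends both trials to the first trial's directory.
buggy_first = InitOnceLogger.log("config_0_0", "results")
buggy_second = InitOnceLogger.log("config_1_0", "results")

# The per-trial logger follows the trial that is currently running.
fixed_first = PerTrialLogger.log("config_0_0", "results")
fixed_second = PerTrialLogger.log("config_1_0", "results")
```

With the init-once variant, everything ends up under the first config's folder, no matter which trial is actually running.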
@eddiebergman how should we go about this PR?
Parallelization works and I am happy for now with local testing.
Tblogger is the biggest bummer here, along with test cases failing.
Just highlighting that the tests passed at this point. Working on rebasing onto the changes from @timurcarstensen in #140.
Fixed issues from tblogger and rebased onto #140. Closing #140 as a result
Rebased onto most recent main branch
Just a note, I went a bit more aggressive and just deleted the DyHPO code altogether, rather than commenting it out. The history for it exists in GitHub and, in general, I've never seen commented-out code get "uncommented" at any point; it just becomes noise.
Added caching so we don't reload the model as often and reduce the number of warnings.
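For anyone curious what this kind of caching can look like, here is a generic sketch using `functools.lru_cache` from the standard library. This is an assumption about the general technique, not a description of how the PR implements it; `load_surrogate` and the dict it returns are hypothetical placeholders for an expensive model load.

```python
from functools import lru_cache

load_calls = 0  # counts how often the "expensive" load actually runs


@lru_cache(maxsize=None)
def load_surrogate(version: str) -> dict:
    """Hypothetical stand-in for loading a large pretrained model."""
    global load_calls
    load_calls += 1
    return {"version": version}


# Repeated requests for the same version reuse the cached object,
# so the load (and any warnings it emits) happens only once.
for _ in range(5):
    model = load_surrogate("v1")
```

One caveat with this approach: every caller gets the same object back, so it only fits if the loaded model is not mutated between uses.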
Otherwise, @Neeratyoy I think it's ready to go. The only thing I would ask you to check is whether the TensorBoard issue is resolved. I didn't know how to test it since I've never really used the feature and don't know how to activate it.
@eddiebergman I checked the example and it looks fine. With parallel workers too. Also, the Tensorboard feature looks good, as in, it works.
However, there are improvements to the logging generally which I would classify as feature enhancements/fixes to the tblogger and would not block this PR for it.
Looks good to be merged 👍🏽.
FYI @TarekAbouChakra.
Adding `ifBO` and refactoring a bunch of freeze-thaw based multi-fidelity algorithms. Working example here.

Existing issues or concerns:
- Need resolving some issues
- `config_0_0` has the tbevent files