Open mathDR opened 1 month ago
Hey @zcharles8 in rereading the paper I subsequently discovered the authors have a full jax
implemenentation here.
I will email them and see if they have any qualms about me using aspects of their code as part of this PR.
Do you know if there would be licensing problems with this approach?
Please advise.
Do you know if there would be licensing problems with this approach?
Great question, probably there would from reading Apple's license... Let me send a mail to one of the authors I know.
Thanks @vroulet I also tried sending an email to all 3 authors (but had to google their contact info so those might be outdated)
Basically I just asked if they are keen to get a version into contrib
I would continue my PR but use their docstrings (with attribution)
But if they don't want a version in optax
and want users to use their version, I would close the PR.
Emailed with Matteo Pagliardini and he wrote:
Very happy to hear that you found the work useful and thanks for the PR. Having AdEMAMix as part of Optax would be great. Feel free to proceed with the PR and use our docstrings.
So I went ahead and completed the PR.
One open question: AdEMAMix uses a pretty bespoke scheduler for b3
. The alpha
scheduler can be implemented via the vanilla linear_schedule
but I couldn't find a drop in replacement for the b3
scheduler.
Both are now used in the rosenbrock
example, but I didn't know if we wanted to add a new scheduler type for completeness?
One open question: AdEMAMix uses a pretty bespoke scheduler for
b3
. Thealpha
scheduler can be implemented via the vanillalinear_schedule
but I couldn't find a drop in replacement for theb3
scheduler.Both are now used in the
rosenbrock
example, but I didn't know if we wanted to add a new scheduler type for completeness?
I think that adding the schedulers to the library (alpha_scheduler
and b3_scheduler
) is totally reasonable, if for no other reason than to make it easy for someone to directly use the schedulers from the paper.
FWIW I think that this PR is really good (ie. things like typing information) and would prefer to have it pushed through (nice work @mathDR !)
Okay thanks for the tip to render the docs. That allowed me to fix a lot of weirdness (and a LaTeX error!).
I think this is good to go!
So @vroulet I think you have to merge it? Or does another authorized user have to do it?
The PR needs to be approved by one of the internal owners of the package (as I did), then copybara automatically syncs the PR with the internal code, produces a snapshot for another maintainer to check and once that other maintainer gives his/her approval the PR is merged. (That's why the PRs often take a bit of time to be merged even after they get approved).
thanks for the contribution! Note that you'll need to edit gallery.rst
for your example to display in https://optax.readthedocs.io/en/latest/gallery.html
Okay great. I updated gallery.rst
and added a png thumbnail to the images/
directory. I also added a colab
link, as I saw the other examples did the same.
Note, when I render the docs locally, my example renders a Keyboard Interrupt
that doesn't exist in the original jupyter notebook.
Is this an artifact of sphinx attempting to render the notebook? Hopefully this is a problem with my local env and will not persist in main
that's because your test is taking too long to run. you can either make it quicker, or add it to nb_execution_excludepatterns in https://github.com/google-deepmind/optax/blob/main/docs/conf.py (but then it won't be run as part of the test suite)
This PR adds the AdaMAMix optimizer from The arxiv preprint: THE ADEMAMIX OPTIMIZER: BETTER, FASTER, OLDER
Closes #1058
The docs have been updated, along with the relevant files.
Currently the docstrings are implemented, but further descriptions should/could be added. I will reach out to the paper authors to assist with that (if they are willing).