FluxML / FluxTraining.jl

A flexible neural net training library inspired by fast.ai
https://fluxml.ai/FluxTraining.jl
MIT License

Implement checkpointer that only saves top-k models. #149

Closed (RomeoV closed this 1 year ago)

RomeoV commented 1 year ago

Fixes #147. When implementing this, there were a couple of considerations, mainly whether the checkpointer should keep only the top-k models or also always keep the latest model.

I decided that I would like both. Storing the top-k models is straightforward with a priority queue that tracks the current top_k models and accepts new entries. Storing the latest model as well makes the logic more complicated, because at any point we may be tracking either k or k+1 models, depending on whether the latest model is itself among the top_k models. I considered several ways to handle this, and the current implementation seems to be the least error-prone. Open to suggestions, though.
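To make the k versus k+1 bookkeeping concrete, here is a rough sketch of the idea. It is written in Python purely for illustration and is not the Julia code in this PR; the class name `TopKWithLatest` and its methods are hypothetical, and it assumes that a lower loss means a better model.

```python
import heapq


class TopKWithLatest:
    """Hypothetical tracker: keep the k lowest-loss checkpoints plus the latest one."""

    def __init__(self, k):
        self.k = k
        # Min-heap of (-loss, step, name): the root is the WORST checkpoint
        # still in the top k (assumption: lower loss is better).
        self._heap = []
        self._latest = None  # name of the most recently added checkpoint
        self._step = 0       # insertion counter, used only to break ties

    def top_k(self):
        """Names of the checkpoints currently in the top k."""
        return {name for _, _, name in self._heap}

    def kept(self):
        """Everything that should exist on disk: k or k+1 names."""
        names = self.top_k()
        if self._latest is not None:
            names.add(self._latest)
        return names

    def add(self, name, loss):
        """Register a new checkpoint; return the set of names that can be deleted."""
        before = self.kept()
        self._step += 1
        entry = (-loss, self._step, name)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
        else:
            # Push the new entry, then drop the worst one (possibly the new entry itself).
            heapq.heappushpop(self._heap, entry)
        self._latest = name
        return before - self.kept()


tracker = TopKWithLatest(k=2)
for name, loss in [("a", 0.9), ("b", 0.5), ("c", 0.7), ("d", 0.8), ("e", 0.6)]:
    stale = tracker.add(name, loss)
    print(name, "kept:", sorted(tracker.kept()), "delete:", sorted(stale))
```

Computing the stale set as a difference of "kept before" and "kept after" avoids special-casing whether the previous latest checkpoint was in the top k, which is where the k versus k+1 ambiguity tends to cause mistakes.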

Tests and a short docstring are included. I'm not 100% sure the docstring is written correctly in its current form, so feel free to let me know.

RomeoV commented 1 year ago

The error given by Pollen.jl is

fatal: unable to access 'https://github.com/FluxML/FluxTraining.jl/': The requested URL returned error: 403

Not sure what's going on there (?)

ToucheSir commented 1 year ago

Good question. I'm guessing some credential/token has expired, but I'm not sure where to check...