Open ablaom opened 5 years ago
These sound like good ideas. I think there was also the idea of mapping out resource demands as more and more data rows are made available to the learning algorithm. I think MLR does something like this and calls it "learning curves".
In this regard there may be some overlap with API design for dealing with online/active learning, which I think is going to be some work. Some discussion around this is at https://github.com/alan-turing-institute/MLJ.jl/issues/139#issuecomment-495926733, with an open issue at #60. Sorry, that's nonsense, since each time you grow the data you retrain from scratch.
I'll have a use for estimating resource needs in the future (as an objective in a multi-objective optimization problem, and also to keep my Julia workers from strangling themselves). What ideas do you have in mind for making such estimates?
I could imagine moving averages over the results of `@elapsed` and `@allocated` working well if you're already running a machine multiple times. However, it would be beneficial to also have a rough estimate available before the first run, especially if we consider using huge, automatically generated Flux models which may be too large to run on any available worker (my use case).
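To make the moving-average idea concrete, here is a minimal sketch of what I have in mind, assuming nothing about MLJ's API: a hypothetical `ResourceTracker` that keeps an exponential moving average of the time and allocation cost of repeated calls (e.g. refits of a machine). The struct and function names are illustrative only.

```julia
# Hypothetical sketch: smooth per-call time/allocation estimates with an
# exponential moving average. Not part of MLJ; names are made up.
mutable struct ResourceTracker
    time::Float64        # smoothed seconds per call
    bytes::Float64       # smoothed bytes allocated per call
    alpha::Float64       # EMA smoothing factor in (0, 1]
    initialized::Bool
end

ResourceTracker(alpha=0.3) = ResourceTracker(0.0, 0.0, alpha, false)

function track!(tr::ResourceTracker, f)
    b = 0
    # @elapsed returns seconds; the nested @allocated returns bytes
    # allocated while evaluating f().
    t = @elapsed (b = @allocated f())
    if !tr.initialized
        tr.time, tr.bytes = t, Float64(b)
        tr.initialized = true
    else
        tr.time  = tr.alpha * t + (1 - tr.alpha) * tr.time
        tr.bytes = tr.alpha * b + (1 - tr.alpha) * tr.bytes
    end
    return tr
end

# Usage: feed each (re)fit through the tracker, then read off estimates.
tr = ResourceTracker()
for _ in 1:5
    track!(tr, () -> sum(abs2, rand(10^6)))
end
tr.time, tr.bytes  # rough per-call estimates for scheduling decisions
```

This only helps after the first run, of course; the before-first-run estimate for a large Flux model would need something else entirely, e.g. a static parameter count.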