**Closed** · nicholasjng closed this issue 6 months ago
Let's discuss object lifetimes here without parameters in the record, assuming no unknown outside references.
1. I load my `model` (large size, say 16 GB) into my `params`, which I pass into `runner.run()`. The params are not bound to, and thus not referenced by, any benchmark, so the model can be reaped as soon as the last benchmark accessing `model` has run and the `params` go out of scope. This is the desired case. ✅
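Case 1 can be sanity-checked against CPython's reference counting with `weakref` (a minimal sketch; `Model` and `run_benchmarks` are hypothetical stand-ins, not the nnbench API):

```python
import gc
import weakref

class Model:
    """Stand-in for a large model object (hypothetical)."""

def run_benchmarks(params: dict) -> None:
    # Stand-in for runner.run(...): benchmarks read params but keep no reference.
    assert "model" in params

model = Model()
finalized = weakref.ref(model)

params = {"model": model}
del model                   # our handle is gone; only `params` holds the model
run_benchmarks(params)
del params                  # last reference dropped
gc.collect()                # redundant under CPython refcounting, but explicit

print(finalized() is None)  # True: the model was reaped
```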
2. I parametrize a benchmark with the model directly (e.g. `@nnbench.parametrize(model=model)`), which binds it to that benchmark instance's `params`. This is still zero-copy, and the references are destroyed with the benchmarks. But: since benchmarks are module-level members, the references are not destroyed until either the benchmark function goes out of scope or the module containing the benchmarks is unloaded. ❌
3. I parametrize with a memoized value, so the reference can be dropped by evicting the cache after the run (maybe with `gc.collect()` to help). This is the desired case for at-rest parametrizations like `@nnbench.parametrize`/`@nnbench.product`. ✅

So it seems that either way, the best way to get rid of parameters is to supply models by hand, and if you want to parametrize, to do it with memoization.
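The difference between the module-level binding (❌) and the memoized provider (✅) can be sketched in plain Python. `Model`, `benchmark`, and `get_model` are hypothetical stand-ins, with a default argument mimicking the decorator binding and `functools.lru_cache` playing the role of the memo cache:

```python
import gc
import weakref
from functools import lru_cache

class Model:
    """Stand-in for a large model (hypothetical; not nnbench's API)."""

# Case 2: binding the instance at decoration time pins it for the module's lifetime.
pinned = Model()
pinned_ref = weakref.ref(pinned)

def benchmark(model=pinned):  # default arg ~ @nnbench.parametrize(model=pinned)
    pass

del pinned
gc.collect()
print(pinned_ref() is not None)  # True: the function object still holds the model

# Case 3: parametrize with a memoized provider instead, and evict after the run.
@lru_cache(maxsize=None)
def get_model() -> Model:
    return Model()

memo_ref = weakref.ref(get_model())
benchmark(model=get_model())     # zero-copy: every call returns the cached instance
get_model.cache_clear()          # evict once the benchmark family is done
gc.collect()
print(memo_ref() is None)        # True: the cache no longer pins the model
```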
Once the global cache + eviction API is in, we'll run a memory profiler on a multi-model benchmark and see what happens if we evict by hand after the completion of a model benchmark family.
TL;DR: Blocked by #125; revisit afterwards.
Addressed by #124 and #120. Closing the ticket.
TL;DR: Retaining references to parameters in benchmark records prevents garbage collection and wastes memory. How can we do better?
#103 introduced saving the parameters to the records. This is fine for standard Python types, but wasteful for models and datasets, which have a large memory footprint. In the worst (and unfortunately common) case, garbage collection is inhibited, since the reference counts of models and data that are no longer needed never drop to zero.
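The retention problem can be reproduced in a few lines of plain Python (a sketch; `Model` and the record shape are hypothetical stand-ins):

```python
import gc
import weakref

class Model:
    """Stand-in for a model with a large memory footprint (hypothetical)."""

model = Model()
ref = weakref.ref(model)

# Saving the raw parameters into a long-lived record keeps the model's
# refcount above zero for as long as the record itself lives.
record = {"benchmarks": [...], "params": {"model": model}}

del model
gc.collect()
print(ref() is not None)  # True: the record still pins the model in memory
```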
There are a few ideas here:
I'm leaning towards 2), but if the serialization turns out to be too difficult, I'd prefer dropping the parameters again.