Closed: djstrong closed this issue 4 months ago
Since merging your change, I'm still getting this issue. I tried a few other things and haven't been able to figure out how to get it to release memory after a run. I assume this must be related to a transformers or PyTorch update, because it wasn't an issue before.
Reopening for now.
I pushed a change that I think should fix this. The memory-releasing part must have gotten mixed up in the refactor, so it was only deleting the model once per benchmark run, not per iteration (while still reloading the model every iteration).
Should be fixed now; at least it is in my testing. Thanks for the report & contribution!
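For reference, the per-iteration cleanup described above can be sketched roughly like this (a minimal sketch; `load_model`, `run_benchmark`, `model_name`, and `num_iterations` are hypothetical stand-ins for the harness's own code, not its actual API):

```python
import gc
import torch

for iteration in range(num_iterations):
    model = load_model(model_name)  # hypothetical loader (reloads every iteration)
    run_benchmark(model)            # hypothetical benchmark step

    # Release per iteration, not only once per benchmark run:
    del model                 # drop the Python reference to the model
    gc.collect()              # collect any reference cycles still holding tensors
    torch.cuda.empty_cache()  # return cached blocks to the driver for the next load
```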
I was running with one iteration only.
You're right, multiple runs and multiple iterations were not releasing memory. Both should be fixed now since it's releasing memory after every iteration.
With this config:
I got a warning for the second model:
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
and evaluation is very slow. I am running it on a GPU with 40 GB.
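(That warning typically means `device_map="auto"` could not fit the whole model in VRAM, so accelerate offloaded some layers to the CPU, which would explain the slowdown. A minimal sketch for checking the placement, assuming the model is loaded through transformers with a device map; the model id is taken from the report above:)

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Nous-Hermes-2-SOLAR-10.7B", device_map="auto"
)
# Modules mapped to "cpu" (or "disk") were offloaded and will run slowly.
print(model.hf_device_map)
```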
If only Nous-Hermes-2-SOLAR-10.7B is in the config, then everything is fine. I guess the previous model is not removed before loading the next one: I see the `del model` in `cleanup`, but it actually does nothing.
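That matches how CPython behaves: `del model` only removes one name, so the GPU memory is freed only if nothing else (a results object, a closure, a stored traceback) still references the model. A small self-contained demonstration of this, with a plain tensor standing in for the model (requires a CUDA device):

```python
import gc
import torch

# A toy stand-in for the model: one large tensor on the GPU.
model = torch.zeros(1024, 1024, device="cuda")
print(torch.cuda.memory_allocated())  # ~4 MiB allocated

keep = model  # a second reference, e.g. the model cached somewhere else
del model     # removes only the name `model`; `keep` still pins the memory
gc.collect()
print(torch.cuda.memory_allocated())  # unchanged: the tensor is still alive

del keep      # drop the last reference; the allocation is actually freed
gc.collect()
print(torch.cuda.memory_allocated())  # 0: GPU memory released

# empty_cache() additionally returns freed-but-cached blocks to the driver,
# so other processes (and nvidia-smi) see the memory as free again.
torch.cuda.empty_cache()
```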