Closed V0XNIHILI closed 7 months ago
@Maxtimer97 and I definitely ran into this issue before, and I believe we solved it at some point. I think the issue is that hooks are attached to the network but never cleared.
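For illustration, here is a torch-free sketch of that general failure mode (all names here are hypothetical, not NeuroBench's actual internals). PyTorch's `register_forward_hook` returns a removable handle; if a benchmark run registers hooks on the model but never calls `.remove()` on the handles, every run adds another entry to the module's hook dict and keeps those closures (and anything they capture) alive:

```python
class Handle:
    """Mimics torch.utils.hooks.RemovableHandle (hypothetical stand-in)."""
    def __init__(self, hooks, key):
        self.hooks, self.key = hooks, key
    def remove(self):
        self.hooks.pop(self.key, None)

class Net:
    """Mimics an nn.Module's forward-hook bookkeeping (hypothetical)."""
    def __init__(self):
        self._forward_hooks = {}
        self._next_key = 0
    def register_forward_hook(self, fn):
        key = self._next_key
        self._next_key += 1
        self._forward_hooks[key] = fn
        return Handle(self._forward_hooks, key)

def run_benchmark(net):
    # Leaky pattern: hook attached, handle discarded, never removed.
    net.register_forward_hook(lambda out: out)

def run_benchmark_fixed(net):
    # Fixed pattern: keep the handle and remove it when the run finishes.
    handle = net.register_forward_hook(lambda out: out)
    try:
        pass  # ... run the evaluation ...
    finally:
        handle.remove()

net = Net()
run_benchmark(net)
run_benchmark(net)
print(len(net._forward_hooks))  # hooks accumulate across runs -> 2

net2 = Net()
run_benchmark_fixed(net2)
run_benchmark_fixed(net2)
print(len(net2._forward_hooks))  # cleaned up after each run -> 0
```

With real PyTorch modules the shape is the same: hold on to the handle that `register_forward_hook` returns and call `handle.remove()` when the run is done.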
Are you running the dev version of neurobench @V0XNIHILI? Or which version of the package?
Ok yeah, after looking into it a little, I did fix this problem in this commit, which is in main: de4ad1930fc705efc4628ad1ac1c57837dbb467d
But I haven't yet updated the pip package to include this fix. For now, the way around this error is to use `poetry run python` so that you run a local version of the harness rather than the released package.
We're figuring out automated deployment, after which these types of issues shouldn't come up.
Just the version downloaded via pip indeed!
I updated the PyPI package to 1.0.3, which includes this bugfix, so upgrading (e.g. via `pip install --upgrade neurobench`) should make it all good to run through the library instead of poetry at this point!
When calling `test(...)` two times in a row (in this case during MSWC FSCIL pre-training):

https://github.com/NeuroBench/neurobench/blob/50d855cbd5b361699f48038443a7a5397f5ca855/neurobench/examples/mswc_fscil/mswc_fscil.py#L145
https://github.com/NeuroBench/neurobench/blob/50d855cbd5b361699f48038443a7a5397f5ca855/neurobench/examples/mswc_fscil/mswc_fscil.py#L146

memory usage increases very significantly during the execution of the second `test(...)` call (starting at the ~7 s mark) and causes the GPU to run out of memory (at ~1 second).
This causes the script to fail. This is the `test(...)` function:

It seems (though not yet fully confirmed) that either the `Benchmark` initialization or the `.run()` call causes a memory leak when used for the second time. I suspect this because when I do a manual loop through the dataloader to compute the accuracy, this sudden increase in memory usage doesn't occur.

Note: the same behavior is present when running this code on a CPU only.