NeuroBench / neurobench

Benchmark harness and baseline results for the NeuroBench algorithm track.
https://neurobench.readthedocs.io
Apache License 2.0

Memory leak in `Benchmark` class? #197

Closed V0XNIHILI closed 3 months ago

V0XNIHILI commented 3 months ago

When calling test(...) twice in a row (in this case during MSWC FSCIL pre-training):
https://github.com/NeuroBench/neurobench/blob/50d855cbd5b361699f48038443a7a5397f5ca855/neurobench/examples/mswc_fscil/mswc_fscil.py#L145
https://github.com/NeuroBench/neurobench/blob/50d855cbd5b361699f48038443a7a5397f5ca855/neurobench/examples/mswc_fscil/mswc_fscil.py#L146

Memory usage rises sharply during the execution of the second test(...): [image] (second test(...) starting at the ~7 s mark)

This causes the GPU to run out of memory: [image] (at ~1 second)

The script then fails. This is the test(...) function:

[image]

It seems (but is still not 100% confirmed) that either the Benchmark initialization or the .run() call causes a memory leak when used for the second time. I suspect this because when I loop through the dataloader manually to compute the accuracy, this sudden increase in memory usage doesn't occur.
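For reference, a manual accuracy loop of the kind described above might look roughly like the sketch below. It is not the code from the linked script: the function name `manual_accuracy`, the toy linear model, and the random tensors standing in for the MSWC data are all assumptions made for illustration.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def manual_accuracy(model, loader, device="cpu"):
    """Hypothetical helper: classification accuracy without any benchmark harness."""
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():  # no autograd graph is built during evaluation
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            predictions = model(inputs).argmax(dim=1)
            # .item() converts to a Python int, so no tensors are kept across batches
            correct += (predictions == labels).sum().item()
            total += labels.numel()
    return correct / total


# Toy data and model standing in for the real MSWC loader and network (assumption)
features = torch.randn(32, 10)
labels = torch.randint(0, 3, (32,))
loader = DataLoader(TensorDataset(features, labels), batch_size=8)
model = nn.Linear(10, 3)

accuracy = manual_accuracy(model, loader)
print(f"accuracy: {accuracy:.3f}")
```

Because nothing is attached to the model and only Python integers are accumulated, memory use in a loop like this should stay flat between calls, which makes it a useful baseline to compare against.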

Note: the same behavior is present when running this code on a CPU only.

jasonlyik commented 3 months ago

@Maxtimer97 and I definitely ran into this issue before and I believe that we solved it somehow. I think the issue is that hooks are being attached to the network but they are never cleared.
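To illustrate the suspected mechanism, here is a minimal PyTorch sketch of how forward hooks that are never removed pile up on a model. It is not the NeuroBench implementation: `ActivationRecorder` and `count_forward_hooks` are hypothetical names, and the helper reads the private `_forward_hooks` attribute purely for inspection.

```python
import torch
from torch import nn


class ActivationRecorder:
    """Hypothetical stand-in for a harness that hooks layers to collect activations."""

    def __init__(self, model: nn.Module):
        self.activations = []  # grows for as long as the hooks stay attached
        self.handles = []
        for layer in model.modules():
            if isinstance(layer, nn.ReLU):
                # register_forward_hook returns a handle; handle.remove() detaches the hook
                self.handles.append(layer.register_forward_hook(self._record))

    def _record(self, module, inputs, output):
        self.activations.append(output.detach())

    def close(self):
        """Detach every hook and drop the stored activations."""
        for handle in self.handles:
            handle.remove()
        self.handles.clear()
        self.activations.clear()


def count_forward_hooks(model: nn.Module) -> int:
    # _forward_hooks is a private PyTorch attribute, used here only to inspect state
    return sum(len(m._forward_hooks) for m in model.modules())


model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

first = ActivationRecorder(model)
second = ActivationRecorder(model)  # a second "benchmark" on the same model
hooks_without_cleanup = count_forward_hooks(model)

model(torch.randn(3, 4))
# Both recorders fire on the single forward pass, so the activation is stored twice
stored_without_cleanup = len(first.activations) + len(second.activations)

first.close()
second.close()
hooks_after_cleanup = count_forward_hooks(model)

print(hooks_without_cleanup, stored_without_cleanup, hooks_after_cleanup)
```

In this sketch the hook from the first recorder is still attached when the second one is created, so every later forward pass feeds both. Calling `close()` (or otherwise removing the handles) before building the next recorder avoids that accumulation.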

Are you running the dev version of neurobench @V0XNIHILI? Or which version of the package?

jasonlyik commented 3 months ago

Ok yeah, after looking into it a little I did fix this problem in this commit which is in main: de4ad1930fc705efc4628ad1ac1c57837dbb467d

But I haven't yet updated the pip package to include this fix. For now, I guess the way around this error is to use poetry run python to make sure to use a local version of the harness rather than the package.

We're figuring out automated deployment, after which these types of issues shouldn't come up.

V0XNIHILI commented 3 months ago

Just the version downloaded via pip indeed!

jasonlyik commented 2 months ago

I updated the PyPI package to 1.0.3, which includes this bugfix, so it should be all good to run through the library instead of poetry at this point!
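A quick way to check whether an environment already has the fixed release is sketched below. It assumes the distribution is installed under the name `neurobench` and that the version is a plain dotted number such as `1.0.3`; the helper names `parse_release` and `has_hook_fix` are made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version

FIXED_IN = (1, 0, 3)  # release named in this thread as containing the bugfix


def parse_release(version_string):
    """Turn '1.0.3' into (1, 0, 3); assumes a plain dotted numeric version."""
    return tuple(int(part) for part in version_string.split(".")[:3])


def has_hook_fix(version_string):
    """True if the given version is at least the release that contains the fix."""
    return parse_release(version_string) >= FIXED_IN


try:
    installed = version("neurobench")  # distribution name is an assumption
    print(f"neurobench {installed}: fix included = {has_hook_fix(installed)}")
except PackageNotFoundError:
    print("neurobench is not installed as a package in this environment")
except ValueError:
    print("installed version is not a plain dotted number; check it manually")
```

If this reports a version below 1.0.3, upgrading the package or running a local checkout through `poetry run python`, as suggested earlier in the thread, should pick up the fix.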