NeurodataWithoutBorders / nwb_benchmarks

Benchmarking for NWB-related operations.
https://nwb-benchmarks.readthedocs.io/en/latest/

Fsspec default caching not cleaning; filesystem full #62

Open CodyCBakerPhD opened 2 months ago

CodyCBakerPhD commented 2 months ago

Eventually, after enough runs of the benchmarks, the fsspec + caching test fills up my temporary directory (2 TB in size) with files, and the benchmarks themselves throw errors such as:

```python
For parameters: 'https://dandiarchive.s3.amazonaws.com/blobs/fec/8a6/fec8a690-2ece-4437-8877-8a002ff8bd8a', 'ElectricalSeriesAp', (slice(0, 30000, None), slice(0, 384, None))
Traceback (most recent call last):
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv\benchmark.py", line 68, in <module>
    main()
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv\benchmark.py", line 60, in main
    commands[mode](args)
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv_runner\run.py", line 72, in _run
    result = benchmark.do_run()
             ^^^^^^^^^^^^^^^^^^
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv_runner\benchmarks\_base.py", line 661, in do_run
    return self.run(*self._current_params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv_runner\benchmarks\time.py", line 165, in run
    samples, number = self.benchmark_timing(
                      ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv_runner\benchmarks\time.py", line 289, in benchmark_timing
    timing = timer.timeit(number)
             ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\timeit.py", line 180, in timeit
    timing = self.inner(it, self.timer)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<timeit-src>", line 3, in inner
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv_runner\benchmarks\_base.py", line 644, in redo_setup
    self.do_setup()
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv_runner\benchmarks\time.py", line 80, in do_setup
    result = Benchmark.do_setup(self)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv_runner\benchmarks\_base.py", line 632, in do_setup
    setup(*self._current_params)
  File "D:\GitHub\nwb_benchmarks\src\nwb_benchmarks\benchmarks\time_remote_slicing.py", line 118, in setup
    self.nwbfile, self.io, self.file, self.bytestream, self.tmpdir = read_hdf5_nwbfile_fsspec_with_cache(
                                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\GitHub\nwb_benchmarks\src\nwb_benchmarks\core\_streaming.py", line 74, in read_hdf5_nwbfile_fsspec_with_cache
    (file, byte_stream, tmpdir) = read_hdf5_fsspec_with_cache(s3_url=s3_url)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\GitHub\nwb_benchmarks\src\nwb_benchmarks\core\_streaming.py", line 55, in read_hdf5_fsspec_with_cache
    byte_stream = filesystem.open(path=s3_url, mode="rb")
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\fsspec\implementations\cached.py", line 449, in <lambda>
    return lambda *args, **kw: getattr(type(self), item).__get__(self)(
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\fsspec\spec.py", line 1298, in open
    f = self._open(
        ^^^^^^^^^^^
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\fsspec\implementations\cached.py", line 449, in <lambda>
    return lambda *args, **kw: getattr(type(self), item).__get__(self)(
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\fsspec\implementations\cached.py", line 365, in _open
    f.cache = MMapCache(f.blocksize, f._fetch_range, f.size, fn, blocks)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\fsspec\caching.py", line 129, in __init__
    self.cache = self._makefile()
                 ^^^^^^^^^^^^^^^^
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\fsspec\caching.py", line 147, in _makefile
    fd.flush()
OSError: [Errno 28] No space left on device

For parameters: 'https://dandiarchive.s3.amazonaws.com/blobs/38c/c24/38cc240b-77c5-418a-9040-a7f582ff6541', 'TwoPhotonSeries', (slice(0, 3, None), slice(0, 796, None), slice(0, 512, None))
Traceback (most recent call last):
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv\benchmark.py", line 68, in <module>
    main()
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv\benchmark.py", line 60, in main
    commands[mode](args)
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv_runner\run.py", line 72, in _run
    result = benchmark.do_run()
             ^^^^^^^^^^^^^^^^^^
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv_runner\benchmarks\_base.py", line 661, in do_run
    return self.run(*self._current_params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv_runner\benchmarks\time.py", line 165, in run
    samples, number = self.benchmark_timing(
                      ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv_runner\benchmarks\time.py", line 289, in benchmark_timing
    timing = timer.timeit(number)
             ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\timeit.py", line 180, in timeit
    timing = self.inner(it, self.timer)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<timeit-src>", line 3, in inner
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv_runner\benchmarks\_base.py", line 644, in redo_setup
    self.do_setup()
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv_runner\benchmarks\time.py", line 80, in do_setup
    result = Benchmark.do_setup(self)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv_runner\benchmarks\_base.py", line 632, in do_setup
    setup(*self._current_params)
  File "D:\GitHub\nwb_benchmarks\src\nwb_benchmarks\benchmarks\time_remote_slicing.py", line 118, in setup
    self.nwbfile, self.io, self.file, self.bytestream, self.tmpdir = read_hdf5_nwbfile_fsspec_with_cache(
                                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\GitHub\nwb_benchmarks\src\nwb_benchmarks\core\_streaming.py", line 74, in read_hdf5_nwbfile_fsspec_with_cache
    (file, byte_stream, tmpdir) = read_hdf5_fsspec_with_cache(s3_url=s3_url)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\GitHub\nwb_benchmarks\src\nwb_benchmarks\core\_streaming.py", line 55, in read_hdf5_fsspec_with_cache
    byte_stream = filesystem.open(path=s3_url, mode="rb")
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\fsspec\implementations\cached.py", line 449, in <lambda>
    return lambda *args, **kw: getattr(type(self), item).__get__(self)(
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\fsspec\spec.py", line 1298, in open
    f = self._open(
        ^^^^^^^^^^^
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\fsspec\implementations\cached.py", line 449, in <lambda>
    return lambda *args, **kw: getattr(type(self), item).__get__(self)(
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\fsspec\implementations\cached.py", line 365, in _open
    f.cache = MMapCache(f.blocksize, f._fetch_range, f.size, fn, blocks)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\fsspec\caching.py", line 129, in __init__
    self.cache = self._makefile()
                 ^^^^^^^^^^^^^^^^
  File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\fsspec\caching.py", line 147, in _makefile
    fd.flush()
OSError: [Errno 28] No space left on device
```

This seems related to the caching mode fsspec uses, where it 'reserves' space on disk equivalent to the size of the remote file and fills in the bytes as requests are received. For small files this is intuitive and fine, but for large files it's a pain due to issues like this.
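For anyone unfamiliar with the 'reserve' behavior, here is a minimal standard-library sketch of the pattern. It is my assumption of what the `_makefile` step at the bottom of the traceback roughly does (seek to the final byte, write a placeholder, flush), not a copy of fsspec's code; `reserve_cache_file` and the 10 MiB size are hypothetical stand-ins.

```python
import os
import tempfile


def reserve_cache_file(path: str, size: int) -> None:
    """Create a file whose length is `size` bytes before any real data arrives."""
    with open(path, "wb") as fd:
        # Jump to the last byte and write one placeholder byte, so the file
        # immediately has the full length of the (hypothetical) remote file.
        fd.seek(size - 1)
        fd.write(b"1")
        # On a full disk this is where "No space left on device" can surface.
        fd.flush()


with tempfile.TemporaryDirectory() as tmpdir:
    cache_path = os.path.join(tmpdir, "example_cache_file")
    # 10 MiB stands in for a remote file that may really be hundreds of GB.
    reserve_cache_file(cache_path, size=10 * 1024**2)
    reserved_size = os.path.getsize(cache_path)
    print(f"Reserved {reserved_size} bytes before downloading anything")
```

Whether the reserved length actually consumes disk blocks depends on whether the filesystem supports sparse files, which may be part of why the symptoms differ between machines.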

A user reported a similar pain with this kind of cache when they (accidentally) set their cache inside an automatically syncing Google Drive folder, which overloaded both their disk I/O and WiFi bandwidth, slowing their computer to a crawl (and maxing out their drive storage)

Just something to be aware of in general when assessing the default caching for fsspec, but in the meantime...

I think I expressed doubts about the automatic cleaning functionality of tempfile.TemporaryDirectory.cleanup() before. I highly recommend we follow the pytest strategy of keeping a global folder (possibly also in the local temp directory, but under a reserved name) that we can send repeated shutil.rmtree commands to, at both the beginning and the end of benchmark runs (thereby giving enough leeway for file locks to have been released over time)
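A rough sketch of what that cleanup strategy could look like, using only the standard library. The folder name `nwb_benchmarks_fsspec_cache`, the helper names, and the retry/delay defaults are all hypothetical choices for illustration, not existing settings in this repo.

```python
import shutil
import tempfile
import time
from pathlib import Path

# Hypothetical reserved folder name under the system temp directory.
CACHE_ROOT = Path(tempfile.gettempdir()) / "nwb_benchmarks_fsspec_cache"


def clear_cache_root(cache_root: Path = CACHE_ROOT, attempts: int = 5, delay: float = 1.0) -> bool:
    """Repeatedly try to delete `cache_root`; return True once it no longer exists."""
    for attempt in range(attempts):
        # ignore_errors=True keeps one still-locked file from aborting the sweep.
        shutil.rmtree(cache_root, ignore_errors=True)
        if not cache_root.exists():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # give lingering file locks time to release
    return False


def fresh_cache_root(cache_root: Path = CACHE_ROOT) -> Path:
    """Clear any leftovers from earlier runs, then recreate an empty cache folder."""
    clear_cache_root(cache_root)
    cache_root.mkdir(parents=True, exist_ok=True)
    return cache_root
```

The idea would be to call `fresh_cache_root()` before a benchmark run and `clear_cache_root()` again afterwards, so anything that could not be deleted at the end of one run gets another chance at the start of the next.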

CodyCBakerPhD commented 2 months ago

This problem is actually SO bad on my older computer (which has similar architecture to laptops we've seen students use at user days) that I can't even run the benchmarks once without filling up temp space (~250 GB total in User folder; maybe < 100 GB free)

Also, a lesson to learn here: the location of such a cache really should not be the boot drive. The OS might take up most of that drive, and on remote servers especially it is usually very slim. I have additional mounted volumes that are meant for bulk storage of the kind fsspec is using here
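One possible way to make the cache location configurable, sketched with the standard library only. The environment variable name `NWB_BENCHMARKS_CACHE_DIR`, the subfolder name, and the helper itself are hypothetical; nothing like this exists in the repo as far as I know.

```python
import os
import shutil
import tempfile
from pathlib import Path

# Hypothetical environment variable; lets a user point the cache at a bulk volume.
CACHE_DIR_ENV_VAR = "NWB_BENCHMARKS_CACHE_DIR"


def resolve_cache_directory(required_free_bytes: int = 0) -> Path:
    """Pick the cache folder, preferring a user-configured volume over the system temp dir."""
    configured = os.environ.get(CACHE_DIR_ENV_VAR)
    base = Path(configured) if configured else Path(tempfile.gettempdir())
    cache_directory = base / "nwb_benchmarks_cache"
    cache_directory.mkdir(parents=True, exist_ok=True)

    # Fail early with a readable message instead of deep inside a benchmark setup.
    free_bytes = shutil.disk_usage(cache_directory).free
    if free_bytes < required_free_bytes:
        raise OSError(
            f"Only {free_bytes} bytes free at {cache_directory}; "
            f"{required_free_bytes} requested. Set {CACHE_DIR_ENV_VAR} to a larger volume."
        )
    return cache_directory
```

The returned path could then be handed to fsspec as the `cache_storage` argument of its caching filesystem, so the cache lands on a mounted bulk volume rather than the boot drive.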