eriknw / dask-patternsearch

Scalable pattern search optimization with dask
BSD 3-Clause "New" or "Revised" License

Better memory management of results cache #16

Open eriknw opened 7 years ago

eriknw commented 7 years ago

Currently, our results cache is a dict that stores every calculated point. This can use an excessive amount of memory when there are many points, many dimensions, or both. We should add a cache= keyword that accepts any mutable mapping. It would be convenient to support common use cases and to choose a reasonable default. Options:

  1. LRU cache,
  2. spill to disk cache,
  3. persistent storage.

My preference is option 2: spill to disk and keep the most recently used results in memory. I think this is probably a sane default too.
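To make the spill-to-disk option concrete, here is a minimal sketch using only the standard library. The class name `SpillToDiskCache`, its `maxsize` parameter, and the one-pickle-file-per-key layout are assumptions for illustration; none of this exists in dask-patternsearch, and the `cache=` keyword is only proposed in this issue.

```python
import os
import pickle
import tempfile
from collections import OrderedDict
from collections.abc import MutableMapping
from hashlib import sha1


class SpillToDiskCache(MutableMapping):
    """Hypothetical cache: keep the `maxsize` most recently used items
    in memory and pickle everything older into `directory`."""

    def __init__(self, directory, maxsize=1000):
        self.directory = directory
        self.maxsize = maxsize
        self._memory = OrderedDict()  # key -> value, least recently used first
        self._on_disk = {}            # key -> path of the pickled value
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        # Hash the pickled key to get a filesystem-safe file name
        # (hash collisions are ignored in this sketch).
        return os.path.join(self.directory, sha1(pickle.dumps(key)).hexdigest())

    def _evict(self):
        # Move least recently used items to disk until we fit in memory.
        while len(self._memory) > self.maxsize:
            key, value = self._memory.popitem(last=False)
            path = self._path(key)
            with open(path, "wb") as f:
                pickle.dump(value, f)
            self._on_disk[key] = path

    def __getitem__(self, key):
        if key in self._memory:
            self._memory.move_to_end(key)  # mark as most recently used
            return self._memory[key]
        path = self._on_disk.pop(key)      # raises KeyError for unknown keys
        with open(path, "rb") as f:
            value = pickle.load(f)
        os.remove(path)
        self._memory[key] = value          # promote back into memory
        self._evict()
        return value

    def __setitem__(self, key, value):
        if key in self._on_disk:           # drop the stale on-disk copy
            os.remove(self._on_disk.pop(key))
        self._memory[key] = value
        self._memory.move_to_end(key)
        self._evict()

    def __delitem__(self, key):
        if key in self._memory:
            del self._memory[key]
        else:
            os.remove(self._on_disk.pop(key))

    def __contains__(self, key):
        # Membership test that does not promote the item into memory.
        return key in self._memory or key in self._on_disk

    def __iter__(self):
        yield from list(self._memory)
        yield from list(self._on_disk)

    def __len__(self):
        return len(self._memory) + len(self._on_disk)


# Usage: only two results stay in memory; older ones are pickled into `tmp`.
with tempfile.TemporaryDirectory() as tmp:
    cache = SpillToDiskCache(tmp, maxsize=2)
    for i in range(5):
        cache[(float(i), 0.0)] = i ** 2    # key: a point, value: its result
    in_memory = len(cache._memory)
    total = len(cache)
    from_disk = cache[(1.0, 0.0)]          # transparently read back from disk
```

Because it implements `MutableMapping`, an object like this could be passed wherever a plain dict is accepted today, which is what would let one `cache=` keyword cover all three options.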

mrocklin commented 7 years ago

The mutable mappings in zict may be useful to construct something here. This is what the dask distributed scheduler uses.
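As a rough sketch of what composing zict mappings might look like: `zict.Buffer` keeps recently used items in a fast mapping and moves the rest into a slow one, `zict.File` stores bytes in a directory, and `zict.Func` applies a dump/load pair around another mapping. The helper name `make_cache`, the `max_in_memory` parameter, and the use of string keys (`zict.File` needs keys that work as file names) are assumptions for illustration, not part of dask-patternsearch.

```python
import pickle
import tempfile

import zict


def make_cache(directory, max_in_memory=1000):
    """Hypothetical helper: a mutable mapping that keeps up to
    `max_in_memory` items in a dict and spills the rest to `directory`."""
    # Slow layer: pickle values to bytes and store one file per key.
    slow = zict.Func(pickle.dumps, pickle.loads, zict.File(directory))
    # Fast layer: a plain dict. Buffer evicts least recently used items
    # from `fast` to `slow` once more than `n` items are held
    # (the default weight of every item is 1).
    return zict.Buffer({}, slow, n=max_in_memory)


# Usage: string keys are assumed here, e.g. repr() of a point.
with tempfile.TemporaryDirectory() as tmp:
    cache = make_cache(tmp, max_in_memory=2)
    for i in range(5):
        cache[repr((float(i), 0.0))] = i ** 2
    total = len(cache)
    in_memory = len(cache.fast)
    from_disk = cache[repr((1.0, 0.0))]  # loaded back from the slow layer
```

Swapping `zict.File` for another mapping would be one way to cover the persistent-storage option without changing the calling code.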
