WIPACrepo / iceprod

IceCube dataset management system
MIT License

Automatic Refinement of Memory Allocations #276

Open carlwitt opened 5 years ago

carlwitt commented 5 years ago

In the paper I wrote with @jvansanten [1], I estimated that throughput could be increased by up to 40% in scenarios where memory is the bottleneck. The idea is to measure peak memory usage and, after a while, replace the user estimates with allocations derived from actual resource usage. We looked at the first 5% of the jobs with the same (dataset, task_index) and then computed an optimized allocation size that balances over- and under-sizing wastage.
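
To make the trade-off concrete, here is a minimal sketch of one way to balance over- and under-sizing wastage. It is not the method from [1] or the code in [2]: it assumes a simplified, memory-only cost model in which a task whose peak exceeds its first allocation fails once and is retried at a fallback size (for example, the user estimate). The names `wastage`, `optimized_allocation`, and `fallback_mb` are made up for this illustration.

```python
from typing import Sequence


def wastage(allocation_mb: int, peaks_mb: Sequence[int], fallback_mb: int) -> int:
    """Total memory wasted if every task first runs with `allocation_mb`
    and each failed task is retried once with `fallback_mb` (assumed model)."""
    total = 0
    for peak in peaks_mb:
        if peak <= allocation_mb:
            # Over-sizing: the unused part of the allocation is wasted.
            total += allocation_mb - peak
        else:
            # Under-sizing: the whole failed attempt is wasted,
            # plus the unused part of the retry at the fallback size.
            total += allocation_mb + (fallback_mb - peak)
    return total


def optimized_allocation(peaks_mb: Sequence[int], fallback_mb: int) -> int:
    """Pick the first allocation that minimizes wastage on the observed peaks."""
    if not peaks_mb:
        return fallback_mb  # nothing measured yet: keep the fallback
    # Assumption: the retry size must cover the largest observed peak.
    ceiling = max(fallback_mb, max(peaks_mb))
    # The minimum always lies at an observed peak or at the ceiling.
    candidates = sorted(set(peaks_mb) | {ceiling})
    return min(candidates, key=lambda a: wastage(a, peaks_mb, ceiling))


# Four small tasks and one outlier, with a user estimate of 4000 MB.
print(optimized_allocation([1000, 1100, 1200, 1000, 3900], 4000))
```

With these sample numbers the outlier is cheaper to retry than to provision for, so the sketch settles on the largest of the small peaks rather than on the user estimate.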

@dsschult, is this something you would consider for integration into the production system? There's a simple Python implementation of the optimization method [2]; it needs no special modules and is computationally very quick. I've been playing with extensions of this method, but found that the basic implementation already works quite well.

If yes, I could allocate one or two days to implement and test this. The most complex part would probably be orchestrating the whole process: retrieving past measurements, swapping out estimates for optimized allocations, and handling edge cases (plus finding my way around the code base).
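
The orchestration could look roughly like the sketch below. Everything in it is hypothetical: the record layout, the function name `refine_requests`, and the 5% warm-up check are assumptions for illustration and do not correspond to IceProd's actual data model or API. The optimizer is passed in as a callable; the default simply takes the largest observed peak as a placeholder.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Key = Tuple[str, int]  # (dataset, task_index)


def refine_requests(
    finished: Iterable[Tuple[str, int, int]],  # (dataset, task_index, peak_mb)
    total_jobs: Dict[Key, int],                # number of jobs per group
    user_estimates: Dict[Key, int],            # user-provided request in MB
    optimize: Callable[[List[int], int], int] = lambda peaks, estimate: max(peaks),
    warmup_fraction: float = 0.05,
) -> Dict[Key, int]:
    """Return the memory request to use per (dataset, task_index)."""
    # Step 1: gather past measurements per group.
    peaks_by_key: Dict[Key, List[int]] = defaultdict(list)
    for dataset, task_index, peak_mb in finished:
        peaks_by_key[(dataset, task_index)].append(peak_mb)

    # Step 2: start from the user estimates and swap in optimized values.
    requests = dict(user_estimates)
    for key, peaks in peaks_by_key.items():
        # Edge case: measurements for a group we know nothing about.
        if key not in total_jobs or key not in user_estimates:
            continue
        # Only trust the measurements once enough jobs have finished.
        if len(peaks) >= warmup_fraction * total_jobs[key]:
            requests[key] = optimize(peaks, user_estimates[key])
    return requests


finished = [("ds1", 0, 1000), ("ds1", 0, 1200), ("ds1", 0, 1100), ("ds1", 1, 500)]
total_jobs = {("ds1", 0): 40, ("ds1", 1): 100}
user_estimates = {("ds1", 0): 4000, ("ds1", 1): 2000}
print(refine_requests(finished, total_jobs, user_estimates))
```

Groups that have not reached the warm-up threshold keep their user estimate, so the refinement can be run repeatedly as more jobs finish.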

[1] @carlwitt, @jvansanten, Ulf Leser: "Learning Low-Wastage Memory Allocations for Scientific Workflows at IceCube", submitted to HPCS 2019

[2] https://github.com/cooperative-computing-lab/efficient-resource-allocations

dsschult commented 5 years ago

@carlwitt, yes, this is something I'd be interested in integrating. I would need to know more about how it works to tell you the right spot, though my guess is as part of iceprod/server/scheduled_tasks.

The only big problem I see is licensing. The license in [2] is GPL, so it cannot be incorporated into IceProd.

jvansanten commented 5 years ago

Just my 2 cents: I don’t know if there’s a huge point in actually using the Tovar code, given the infectious license terms. The algorithm is well described in the paper, and all the complexity is in gathering the input data.

carlwitt commented 5 years ago

@dsschult: Great to hear! I'll get back to you end of March/early April as soon as I'm back from my next business trip.

@jvansanten: I agree, writing a new implementation is not a big deal, and I already have some code in place from my experiments.

Cheers, Carl