agronholm / pythonfutures

Backport of the concurrent.futures package to Python 2.6 and 2.7
Other
232 stars 51 forks source link

OPT: add check before submit incase of memory exhausted; #57

Closed NoneGG closed 7 years ago

NoneGG commented 7 years ago

problem

I found memory exhausted when i use concurrent.futures.ProcessPoolExecutor in prod env. The code is similar to code below:

#!/usr/bin/python
# -*- coding: utf-8 -*-
import concurrent.futures
import time

__author__ = 'chenming@bilibili.com'

def worker(array):
    # io takes time to finish
    time.sleep(10)

def main():
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
        for i in range(10):
            # a normal size dict, will be copied to subprocess but should be free when the subprocess finish its work
            executor.submit(worker, {k: 1 for k in range(1000000)})
        time.sleep(10)

if __name__ == '__main__':
    main()

reason

I found that if subprocess takes longer time to finish than submit (which is almost determined), the ProcessPoolExecutor._pending_work_items will be full of param submitted, and the dict seems to be too large, then the memory is exhausted. By the way, i doubt if del in Python2 can really release memory to system (seems to be optimized in Python3).

how to avoid

I add a check of _pending_work_items in ProcessPoolExecutor.submit before actually add new workItem to _pending_work_items. If len(self._pending_work_items) > self._max_workers + EXTRA_QUEUED_CALLS, it will block and wait for workItems in _pending_work_items be deleted.

agronholm commented 7 years ago

Does upstream code have this? If not, I suggest you take it up with them first.

dalcinl commented 7 years ago

If this behavior is ever proposed upstream, my bet is that core Python developers will reject it in limine.

@agronholm I would reject and close this PR.

agronholm commented 7 years ago

The whole point of an asynchronous framework like concurrent.futures is to never block.

concurrent.futures is not an asynchronous framework. Future.result() already blocks, by design of course. I do concur with the rest of the comments though.

dalcinl commented 7 years ago

@agronholm Sorry for my bad wording, I meant asynchronous execution framework, and the never block of course does not apply to Future.result().

NoneGG commented 7 years ago

Sure, thank you for your feedback~