aiidateam / aiida-core

The official repository for the AiiDA code
https://aiida-core.readthedocs.io

Submission of processes slows down when number of daemon workers is increased #3540

Open sphuber opened 5 years ago

sphuber commented 5 years ago

The time taken to submit a process seems to increase as the number of active daemon workers increases. Below is a quick profile of the aiida.engine.launch.submit method, courtesy of @giovannipizzi:

import aiida
import time
aiida.load_profile()
from aiida.engine import submit
from aiida.plugins import CalculationFactory
from aiida.orm import Int, load_code

Add = CalculationFactory('arithmetic.add')
builder = Add.get_builder()
builder.metadata.options.resources = {'num_machines': 1}
builder.x = Int(1)
builder.y = Int(2)
builder.code = load_code('add@localhost_direct')

%load_ext line_profiler

def submit_many(builder, n=100):
    t = time.time()
    for i in range(n):
        submit(builder)
        new_t = time.time()
        print(i, new_t - t)
        t = new_t

%lprun -f submit submit_many(builder)

The results show that the culprits are mostly instantiate_process and save_checkpoint, which account for 80.7% and 15.8% of the total time, respectively:

Timer unit: 1e-06 s

Total time: 71.7569 s
File: /Users/pizzi/git/aiida-core/aiida/engine/launch.py
Function: submit at line 85

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    85                                           def submit(process, **inputs):
    86                                               """Submit the process with the supplied inputs to the daemon immediately returning control to the interpreter.
    87                                           
    88                                               .. warning: this should not be used within another process. Instead, there one should use the `submit` method of
    89                                                   the wrapping process itself, i.e. use `self.submit`.
    90                                           
    91                                               .. warning: submission of processes requires `store_provenance=True`
    92                                           
    93                                               :param process: the process class to submit
    94                                               :type process: :class:`aiida.engine.Process`
    95                                           
    96                                               :param inputs: the inputs to be passed to the process
    97                                               :type inputs: dict
    98                                           
    99                                               :return: the calculation node of the process
   100                                               :rtype: :class:`aiida.orm.ProcessNode`
   101                                               """
   102       100       1241.0     12.4      0.0      assert not is_process_function(process), 'Cannot submit a process function'
   103                                           
   104                                               # Submitting from within another process requires `self.submit` unless it is a work function, in which case the
   105                                               # current process in the scope should be an instance of `FunctionProcess`
   106       100       2891.0     28.9      0.0      if is_process_scoped() and not isinstance(Process.current(), FunctionProcess):
   107                                                   raise InvalidOperation('Cannot use top-level `submit` from within another process, use `self.submit` instead')
   108                                           
   109       100        656.0      6.6      0.0      runner = manager.get_manager().get_runner()
   110       100        642.0      6.4      0.0      controller = manager.get_manager().get_process_controller()
   111                                           
   112       100   57928707.0 579287.1     80.7      process = instantiate_process(runner, process, **inputs)
   113                                           
   114                                               # If a dry run is requested, simply forward to `run`, because it is not compatible with `submit`. We choose for this
   115                                               # instead of raising, because in this way the user does not have to change the launcher when testing.
   116       100       2192.0     21.9      0.0      if process.metadata.get('dry_run', False):
   117                                                   _, node = run_get_node(process)
   118                                                   return node
   119                                           
   120       100        945.0      9.4      0.0      if not process.metadata.store_provenance:
   121                                                   raise InvalidOperation('cannot submit a process with `store_provenance=False`')
   122                                           
   123       100   11340894.0 113408.9     15.8      runner.persister.save_checkpoint(process)
   124       100     607970.0   6079.7      0.8      process.close()
   125                                           
   126                                               # Do not wait for the future's result, because in the case of a single worker this would cock-block itself
   127       100    1869713.0  18697.1      2.6      controller.continue_process(process.pid, nowait=False, no_reply=True)
   128                                           
   129       100       1057.0     10.6      0.0      return process.node
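
To put the per-line totals above in absolute terms, the following sketch recomputes the mean cost per submit call and the share of the total time from the numbers reported by the profiler. The dictionary labels and the summarize helper are chosen for this illustration only and are not part of aiida-core.

```python
# Totals copied from the line_profiler output above (microseconds, 100 hits each).
TOTAL_US = 71.7569e6
HITS = 100
line_totals_us = {
    'instantiate_process': 57928707.0,
    'save_checkpoint': 11340894.0,
    'process.close': 607970.0,
    'continue_process': 1869713.0,
}


def summarize(totals_us, total_us, hits):
    """Return {label: (mean seconds per call, percent of total time)}."""
    return {
        label: (t / hits / 1e6, 100.0 * t / total_us)
        for label, t in totals_us.items()
    }


summary = summarize(line_totals_us, TOTAL_US, HITS)
for label, (mean_s, pct) in summary.items():
    print(f'{label:22s} {mean_s:8.4f} s/call {pct:5.1f} %')
```

Read this way, each submit call spends roughly 0.58 s in instantiate_process and 0.11 s in save_checkpoint, while continue_process costs about 0.02 s per call.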

unkcpz commented 2 weeks ago

The results show that the culprits are mostly instantiate_process and save_checkpoint

These are the bottlenecks in the test above, but they seem to be slow because they are DB operations. The only step that appears to be related to the number of daemon workers is controller.continue_process. A closer benchmark is needed to see whether the performance is influenced by the number of processes in the list.
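
One possible shape for such a benchmark is sketched below: it times each stage of a submission separately, so that the run can be repeated with different numbers of daemon workers and the per-stage means compared. This is only a sketch under assumptions: benchmark_stages is a hypothetical helper, and the stage callables here are stand-ins (sleeps) so the example runs without AiiDA. In a real run they would wrap the calls seen in the profiled source, i.e. instantiate_process, runner.persister.save_checkpoint and controller.continue_process.

```python
import time
from collections import defaultdict
from statistics import mean


def benchmark_stages(stages, n=100):
    """Run each (label, callable) stage n times and return mean seconds per label."""
    samples = defaultdict(list)
    for _ in range(n):
        for label, func in stages:
            start = time.perf_counter()
            func()
            samples[label].append(time.perf_counter() - start)
    return {label: mean(values) for label, values in samples.items()}


# Hypothetical stand-ins for the real calls; replace them with closures around
# the actual AiiDA calls when running against a profile.
stages = [
    ('instantiate_process', lambda: time.sleep(0.002)),
    ('save_checkpoint', lambda: time.sleep(0.001)),
    ('continue_process', lambda: None),
]

results = benchmark_stages(stages, n=5)
for label, seconds in results.items():
    print(f'{label:22s} {seconds * 1e3:8.3f} ms')
```

If only the continue_process mean grows between runs with more workers while the other two stay flat, that would support the reading that the DB-bound steps are slow regardless of the daemon configuration.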