BCDA-APS / bdp_controls

APS-U Beam line Data Pipelines - experiment controls with EPICS and Bluesky

characterize latencies in workflow 3 demonstrated at M4 #46

Closed prjemian closed 2 years ago

prjemian commented 2 years ago

At the M4 demonstration of workflow 3 (fully autonomous peak centering), the demo was slow. Describe the sources of latency and what can be done to reduce each. Demonstrate if possible or practical.

prjemian commented 2 years ago

A coarse discussion concluded that latencies might be reduced by factors of 2-5, but BDP is interested in workflows that execute orders of magnitude faster than workflow 3. A workflow in which each acquisition yields one image file and one analysis will not scale to those speeds. We need to switch to a streaming workflow model that avoids the file-writing bottleneck (one of the biggest latencies).
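To illustrate the streaming idea only, here is a minimal sketch, assuming frames can be handed from acquisition to analysis in memory. All names (`acquire`, `analyze`, `run_stream`) are hypothetical, the "analysis" is a stand-in that reports the index of the brightest point in a 1-D frame, and a real pipeline would presumably use a network stream between processes rather than an in-process queue.

```python
import queue
import threading


def acquire(frames, out_q):
    """Stand-in for the detector: push each frame to the queue as it is 'acquired'."""
    for frame in frames:
        out_q.put(frame)
    out_q.put(None)  # sentinel: no more frames


def analyze(in_q, results):
    """Consume frames as they arrive; no intermediate file is written."""
    while True:
        frame = in_q.get()
        if frame is None:
            break
        # Hypothetical analysis: index of the maximum value (peak position).
        results.append(max(range(len(frame)), key=frame.__getitem__))


def run_stream(frames):
    """Run acquisition and analysis concurrently, connected by a bounded queue."""
    q = queue.Queue(maxsize=8)  # bounded, so a slow consumer applies back-pressure
    results = []
    consumer = threading.Thread(target=analyze, args=(q, results))
    consumer.start()
    acquire(frames, q)
    consumer.join()
    return results


peaks = run_stream([[0, 1, 5, 2], [9, 1, 0, 0], [0, 0, 3, 7]])
```

The point of the sketch is that analysis of one frame can start while the next is being acquired, and that no per-frame file write or file-idle wait sits between the two stages.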

prjemian commented 2 years ago

@sveseli Can you identify the delays added in the DM workflow for the M4 demonstration? Can we control (reduce) any of them?

sveseli commented 2 years ago

DM Workflow

All of the items below relate to file-based processing:

  1. Workflow start. The DM DAQ service has a configuration parameter that determines how long to wait after a file was last written before processing it (starting the workflow). This parameter typically differs from beamline to beamline. For BDP, it is currently set to 5 seconds and could be reduced (probably to as low as 1 second).
  2. Delays in workflow tasks. Although the DM workflow engine runs tasks as soon as possible, task scripts and workflow steps may introduce artificial delays. In the example-03 workflow, step 03-MONITOR-JOB polls SGE every 10 seconds. This could be reduced to something like 1 second. In a production environment with hundreds or thousands of files processed simultaneously, one must balance workflow execution speed against needlessly overloading the batch system.
  3. Batch system delays. These depend on how busy the system is, whether sufficient resources are available to run immediately, how long it takes to schedule and run a job, etc. For BDP, this is not significant. In a production environment, it can be a problem.
  4. Analysis scripts themselves.
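As a rough illustration of items 1 and 2, the sketch below adds up the two configurable waits and shows a generic polling loop with an adjustable interval. The class and function names are hypothetical and are not part of DM or SGE; the only numbers taken from the discussion above are the 5 s and 10 s current settings and the suggested 1 s values. The mean estimate assumes the job finish time is uniformly distributed within a poll interval.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class WorkflowDelays:
    """Fixed waits (seconds) from items 1 and 2. Hypothetical names."""
    file_idle_wait: float  # wait after the last write before the workflow starts
    poll_interval: float   # how often the monitor step polls the batch system

    def worst_case_overhead(self) -> float:
        # The idle wait is always paid in full; a job that finishes just after
        # a poll is not noticed until one full interval later.
        return self.file_idle_wait + self.poll_interval

    def mean_overhead(self) -> float:
        # On average a finished job waits half a poll interval to be noticed.
        return self.file_idle_wait + self.poll_interval / 2


def wait_for_job(is_done: Callable[[], bool],
                 poll_interval: float = 1.0,
                 timeout: float = 60.0) -> int:
    """Call is_done() every poll_interval seconds; return the number of polls."""
    deadline = time.monotonic() + timeout
    polls = 0
    while True:
        polls += 1
        if is_done():
            return polls
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        time.sleep(poll_interval)


current = WorkflowDelays(file_idle_wait=5.0, poll_interval=10.0)
proposed = WorkflowDelays(file_idle_wait=1.0, poll_interval=1.0)
```

Under these assumptions the two fixed waits add at most 15 s per file with the current settings and at most 2 s with the proposed ones; batch system delays and analysis time (items 3 and 4) come on top of that. `is_done` is left as a caller-supplied callable so that the actual batch system query stays outside the sketch.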

prjemian commented 2 years ago

Bluesky data acquisition

Factors related to the 03-workflow (the workflow presented at the M4 meeting)