globus-gladier / gladier-xpcs

MIT License

Rapid Processing of XPCS Data

This repository contains code for processing data from X-ray Photon Correlation Spectroscopy (XPCS) experiments. The code is used extensively at the 8-ID beamline of the Advanced Photon Source (APS), but is broadly applicable.

The code leverages the Gladier Toolkit to drive the Globus Flows service, enabling rapid data processing on high-performance computing (HPC) systems and publication of processed results to a Globus Search catalog for subsequent search, browsing, and download.

Online Processing

Online processing consists of a Gladier flow run on the talc machine. The core flow is located at gladier_xpcs/flow_boost.py. A script for running the flow with input can be found in scripts/xpcs_online_boost_client.py. To run that script, a user needs access to ALCF HPC resources with a running globus-compute-endpoint. We track user globus-compute-endpoints through "deployments", which are defined in gladier_xpcs/deployments.py.

The gladier_xpcs/flows/flow_eigen.py program uses the Gladier Toolkit to define a flow with the following sequence of Transfer, Compute, and Search actions:

  1. Transfer experiment data file from instrument to HPC (tool gladier_xpcs/tools/transfer_from_clutch_to_theta.py)
  2. Compute task to extract metadata from experiment data file (tool gladier_xpcs/tools/pre_publish.py)
  3. Transfer metadata to persistent storage (also tool gladier_xpcs/tools/pre_publish.py)
  4. Search task to load metadata into catalog (also tool gladier_xpcs/tools/pre_publish.py)
  5. Compute task to preallocate nodes on HPC resource (tool gladier_xpcs/tools/acquire_nodes.py)
  6. Compute task to run XPCS Boost correlation analysis function on data (tool gladier_xpcs/tools/eigen_corr.py)
  7. Compute task to create correlation plots (tool gladier_xpcs/tools/plot.py)
  8. Compute task to extract metadata from correlation plots (tool gladier_xpcs/tools/gather_xpcs_metadata.py)
  9. Compute task to aggregate new data and metadata for publication (tool gladier_xpcs/tools/publish.py)
  10. Transfer data+metadata to repository (also tool gladier_xpcs/tools/publish.py)
  11. Search task to add metadata+data references to catalog (also tool gladier_xpcs/tools/publish.py)

A script scripts/xpcs_corr_client.py can be used to run the flow with specified inputs.
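
Each numbered step above maps onto a Gladier tool. The following is a minimal sketch (not the actual flow_eigen.py; the tool class names are assumptions) of how the Gladier Toolkit chains such tools into a single flow definition:

  # Sketch only: illustrates the Gladier Toolkit pattern used by flow_eigen.py.
  # The tool import paths below are assumptions; the real tools live under
  # gladier_xpcs/tools/ and the real chain is defined in gladier_xpcs/flows/flow_eigen.py.
  from gladier import GladierBaseClient, generate_flow_definition

  @generate_flow_definition
  class XPCSEigenSketch(GladierBaseClient):
      """Chains the Transfer, Compute, and Search steps above into one Globus Flow."""
      gladier_tools = [
          'gladier_xpcs.tools.TransferFromClutchToTheta',  # assumed class name (step 1)
          'gladier_xpcs.tools.PrePublish',                 # assumed class name (steps 2-4)
          'gladier_xpcs.tools.AcquireNodes',               # assumed class name (step 5)
          'gladier_xpcs.tools.EigenCorr',                  # assumed class name (step 6)
          'gladier_xpcs.tools.MakeCorrPlots',              # assumed class name (step 7)
          'gladier_xpcs.tools.GatherXPCSMetadata',         # assumed class name (step 8)
          'gladier_xpcs.tools.Publish',                    # assumed class name (steps 9-11)
      ]

The generate_flow_definition decorator composes the flow's Transfer, Compute, and Search states from the listed tools; the real client supplies the XPCS-specific flow input.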

The flow's Compute tasks involve both simple data manipulations (e.g., metadata extraction) and compute-intensive work (XPCS Boost). On an HPC system, the former may be run on a "non-compute" (front-end) node, while the latter must be submitted via a scheduler to run on a "compute" node (ideally GPU-enabled). To this end, the flow dispatches each task to the compute_endpoint_non_compute or compute_endpoint_compute Globus Compute endpoint, respectively, as defined in gladier_xpcs/deployments.py.
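
As a hedged illustration, the flow input for a run carries both endpoint IDs; a deployment from gladier_xpcs/deployments.py would normally fill these in (the UUIDs below are placeholders):

  # Hedged example: supplying the two Globus Compute endpoint IDs as flow input.
  # The UUIDs are placeholders; real values come from a deployment defined in
  # gladier_xpcs/deployments.py.
  flow_input = {
      'input': {
          # front-end endpoint for light tasks (metadata extraction, publication prep)
          'compute_endpoint_non_compute': '00000000-0000-0000-0000-000000000000',
          # scheduler-backed, ideally GPU-enabled endpoint for the XPCS Boost correlation
          'compute_endpoint_compute': '11111111-1111-1111-1111-111111111111',
      }
  }

  # client.run_flow(flow_input=flow_input)  # with a client like the sketch above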

Details on how to run the online processing script on an APS beamline computer, talc, are provided on a separate page.

Reprocessing

Note: Reprocessing is a development feature and is not enabled for production use.

XPCS reprocessing takes data already published in the portal and re-runs the correlation (corr) analysis with an HDF file that has been customized with a new qmap file. Reprocessing also includes an extra step to rename the dataset so that it is published under a different title in the portal.

Although scripts exist here to test the reprocessing flow, the actual production flow is deployed separately on the portal. The portal installs the gladier_xpcs package and imports the Gladier Client.

The main reprocessing client is at gladier_xpcs/client_reprocess.py. A script for testing reprocessing is located at scripts/xpcs_reproc_client.py. Reprocessing shares some tools with the online processing flow, but contains a handful of custom tools under gladier_xpcs/reprocessing_tools.
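
A hedged sketch of what a reprocessing run might look like from Python follows; the class and input key names are assumptions, not the actual client_reprocess.py API (see scripts/xpcs_reproc_client.py for the real invocation):

  # Illustrative only: names below are assumed, not taken from gladier_xpcs/client_reprocess.py.
  from gladier_xpcs.client_reprocess import XPCSReprocessingClient  # assumed class name

  client = XPCSReprocessingClient()
  client.run_flow(flow_input={
      'input': {
          'hdf_file': '/path/to/published_dataset.hdf',  # dataset already published in the portal
          'qmap_file': '/path/to/custom_qmap.h5',        # custom qmap used to re-run corr
          'new_dataset_name': 'my-reprocessed-title',    # rename step before republication
      }
  })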

Running The Reprocessing Flow

You need to set up your deployment on Theta before you can run reprocessing. This includes setting up:

Make sure you are also in the XPCS Developers Globus group to access XPCS datasets that have already been published.

To test a reprocessing flow, run the following:

cd scripts/
python xpcs_reproc_client.py

ALCF Configuration

Hopefully, this document is a little outdated and you're executing on Polaris! Please add, update, or correct information as things change.

Environment Setup

  conda create -n gladier-xpcs
  conda activate gladier-xpcs

  # Used for running Boost Corr
  conda install pytorch==1.12.1 cudatoolkit=11.6 -c pytorch -c conda-forge
  pip install -e git+https://github.com/AZjk/boost_corr#egg=boost_corr

  # Used for managing compute nodes
  pip install globus-compute-endpoint
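
Before configuring the endpoint, a quick sanity check that the environment sees a GPU and can import the correlation package may save a failed flow run. A minimal sketch, assuming the package installed above imports as boost_corr:

  # sanity_check.py -- illustrative only, not part of the repository.
  import torch
  import boost_corr  # assumes the package installed above imports under this name

  print('torch', torch.__version__)
  print('CUDA available:', torch.cuda.is_available())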

Example Config

engine:
    type: HighThroughputEngine
    max_workers_per_node: 1
    worker_debug: False

    address:
        type: address_by_interface
        ifname: vlan2360

    provider:
        type: CobaltProvider
        queue: debug-flat-quad

        # Specify the account/allocation to which jobs should be charged
        account: {{ YOUR_THETA_ALLOCATION }}

        launcher:
            type: AprunLauncher
            overrides: -d 64

        # string to prepend to #COBALT blocks in the submit
        # script to the scheduler
        # eg: "#COBALT -t 50"
        scheduler_options: {{ OPTIONS }}

        # Command to be run before starting a worker
        # e.g., "module load Anaconda; source activate compute_env"
        worker_init: {{ COMMAND }}

        # Scale between 0-1 blocks with 2 nodes per block
        nodes_per_block: 2
        init_blocks: 0
        min_blocks: 0
        max_blocks: 1

        # Hold blocks for 30 minutes
        walltime: 00:30:00
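
Once this configuration is in place (for globus-compute-endpoint, typically under ~/.globus_compute/<endpoint-name>/config.yaml), the endpoint can be started with globus-compute-endpoint start <endpoint-name>; the resulting endpoint ID is what the deployments in gladier_xpcs/deployments.py point at.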