This repository contains code for processing data from X-ray Photon Correlation Spectroscopy (XPCS) experiments. The code is used extensively at the 8-ID beamline of the Advanced Photon Source (APS), but is broadly applicable.
The code leverages the Gladier Toolkit to use the Globus Flows service for rapid data processing on high-performance computing (HPC) systems and for publication of processed results to a Globus Search catalog to permit subsequent search, browsing, and download:
- The `gladier_xpcs/` directory contains files related to both online processing of data as it is generated by the XPCS instrument and offline reprocessing of previously generated data. In particular:
  - The `gladier_xpcs/flows/flow_eigen.py` program implements an Online Processing flow, designed to be invoked (e.g., on a machine at an XPCS beamline, when new data are generated) for each new batch of XPCS data.
  - The `gladier_xpcs/flows/flow_reprocess.py` program implements a Reprocessing flow, for reprocessing data after it has been generated.
- The `xpcs_portal/` directory contains code relating to the interactive, Globus Search-based portal, which provides for visualizing the results from successful XPCS flows and for starting reprocessing flows for datasets published to the portal. See the Portal README for more information on running the portal.
Online processing consists of a Gladier flow run on the talc machine. The core flow is located at `gladier_xpcs/flow_boost.py`. A script for running the flow with input can be found in `scripts/xpcs_online_boost_client.py`. In order to run that script, a user needs access to ALCF HPC resources with a running globus-compute-endpoint. We track user globus-compute-endpoints through "deployments", which are defined in `gladier_xpcs/deployments.py`.
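For illustration, the sketch below shows roughly what a deployment entry bundles together: Globus collection UUIDs for transfers and Globus Compute endpoint UUIDs for processing. The class and field names here are hypothetical; the real definitions live in `gladier_xpcs/deployments.py`.

```python
# Hypothetical sketch of a deployment entry; the real definitions are in
# gladier_xpcs/deployments.py. All UUIDs below are placeholders.
class BaseDeployment:
    globus_endpoints: dict = {}
    compute_endpoints: dict = {}

    def get_input(self):
        """Return flow input common to every run on this deployment."""
        return {"input": {**self.globus_endpoints, **self.compute_endpoints}}


class TalcExampleDeployment(BaseDeployment):
    globus_endpoints = {
        "globus_endpoint_source": "UUID-of-beamline-collection",
        "globus_endpoint_proc": "UUID-of-ALCF-collection",
    }
    compute_endpoints = {
        "compute_endpoint_non_compute": "UUID-of-login-node-endpoint",
        "compute_endpoint_compute": "UUID-of-scheduled-gpu-endpoint",
    }
```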
The `gladier_xpcs/flows/flow_eigen.py` program uses the Gladier Toolkit to define a flow with the following sequence of Transfer, Compute, and Search actions, each defined by one of the tools below:

1. `gladier_xpcs/tools/transfer_from_clutch_to_theta.py`
2. `gladier_xpcs/tools/pre_publish.py`
3. `gladier_xpcs/tools/pre_publish.py`
4. `gladier_xpcs/tools/pre_publish.py`
5. `gladier_xpcs/tools/acquire_nodes.py`
6. `gladier_xpcs/tools/eigen_corr.py`
7. `gladier_xpcs/tools/plot.py`
8. `gladier_xpcs/tools/gather_xpcs_metadata.py`
9. `gladier_xpcs/tools/publish.py`
10. `gladier_xpcs/tools/publish.py`
11. `gladier_xpcs/tools/publish.py`

A script, `scripts/xpcs_corr_client.py`, can be used to run the flow with specified inputs.
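As a rough illustration (not the repository's exact code), a Gladier client chains such tools into a single flow definition along the following lines; the tool import paths and input keys below are assumptions for the sketch.

```python
# Minimal sketch of a Gladier client that chains XPCS tools into one flow.
# Tool names/paths here are illustrative, not the repository's exact classes.
from gladier import GladierBaseClient, generate_flow_definition


@generate_flow_definition
class ExampleXPCSClient(GladierBaseClient):
    gladier_tools = [
        "gladier_xpcs.tools.TransferFromClutchToTheta",
        "gladier_xpcs.tools.PrePublish",
        "gladier_xpcs.tools.AcquireNodes",
        "gladier_xpcs.tools.EigenCorr",
        "gladier_xpcs.tools.MakeCorrPlots",
        "gladier_xpcs.tools.GatherXPCSMetadata",
        "gladier_xpcs.tools.Publish",
    ]


if __name__ == "__main__":
    client = ExampleXPCSClient()
    flow_input = {"input": {"hdf_file": "/path/to/raw.hdf"}}  # placeholder input
    run = client.run_flow(flow_input=flow_input, label="xpcs-online-example")
    print("Started run:", run.get("run_id") or run.get("action_id"))
```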
The flow's Compute tasks involve both simple data manipulations (e.g., metadata extraction) and compute-intensive computations (XPCS Boost). On an HPC system, the former may be run on a "non-compute" (front-end) node, while the latter must be submitted via a scheduler to run on a "compute" node (ideally GPU-enabled). To this end, the flow dispatches each task to the `compute_endpoint_non_compute` or `compute_endpoint_compute` Globus Compute endpoint, respectively, as defined in `gladier_xpcs/deployments.py`.
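For example, the flow input for a run might select those two endpoints as shown below; the UUIDs are placeholders, with real values coming from the chosen deployment.

```python
# Placeholder UUIDs; real values come from gladier_xpcs/deployments.py.
flow_input = {
    "input": {
        # Lightweight tasks (metadata gathering, plotting) run on a front-end node.
        "compute_endpoint_non_compute": "11111111-1111-1111-1111-111111111111",
        # XPCS Boost correlation is scheduled onto a GPU-enabled compute node.
        "compute_endpoint_compute": "22222222-2222-2222-2222-222222222222",
    }
}
```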
Details on how to run the online processing script on an APS beamline computer, talc, are provided on a separate page.
Note: Reprocessing is a development feature and is not enabled for production use.
XPCS Reprocessing takes data already published in the portal and re-runs corr on them, using an HDF file customized with a user-supplied qmap file. Reprocessing also has an extra step that renames the dataset so it can be published under a different title in the portal.
Although scripts exist here to test the reprocessing flow, the actual production flow is deployed separately on the portal. The portal installs the `gladier_xpcs` package and imports the Gladier Client.

The main reprocessing client is at `gladier_xpcs/client_reprocess.py`. A script for testing reprocessing is located at `scripts/xpcs_reproc_client.py`. Reprocessing shares some tools with the online processing flow, but contains a handful of custom tools under `gladier_xpcs/reprocessing_tools`.
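As a hedged illustration of how an external service such as the portal might drive reprocessing after installing `gladier_xpcs` (the class name and input keys below are assumptions, not the exact API):

```python
# Hypothetical usage sketch; the real client lives in gladier_xpcs/client_reprocess.py
# and its class name and expected inputs may differ.
from gladier_xpcs.client_reprocess import XPCSReprocessingClient  # name is an assumption

flow_input = {
    "input": {
        "hdf_file": "/path/to/published/dataset.hdf",  # dataset already in the portal
        "qmap_file": "/path/to/custom_qmap.h5",        # user-supplied qmap customization
        "dataset_name": "my-reprocessed-title",        # new title for re-publication
    }
}

client = XPCSReprocessingClient()
run = client.run_flow(flow_input=flow_input, label="xpcs-reprocess-example")
print("Started run:", run.get("run_id") or run.get("action_id"))
```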
You need to set up your deployment on Theta before you can run reprocessing. This includes setting up a Globus Compute endpoint and a corresponding deployment entry in `gladier_xpcs/deployments.py`.
Make sure you are also in the XPCS Developers Globus group to access XPCS datasets which have already been published.
To test a reprocessing flow, run the following:
```bash
cd scripts/
python xpcs_reproc_client.py
```
Hopefully, this document is a little outdated and you're executing on Polaris! Please add, update, or correct information as things change.
```bash
conda create -n gladier-xpcs
conda activate gladier-xpcs

# Used for running Boost Corr
conda install pytorch==1.12.1 cudatoolkit=11.6 -c pytorch -c conda-forge
pip install -e git+https://github.com/AZjk/boost_corr#egg=boost_corr

# Used for managing compute nodes
pip install globus-compute-endpoint
```
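After the install, a quick check (a minimal sketch, assuming it is run inside the `gladier-xpcs` environment on a GPU node) can confirm that the CUDA-enabled PyTorch build used by Boost Corr is actually usable:

```python
# Sanity check for the PyTorch/CUDA install used by Boost Corr.
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```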
The globus-compute-endpoint configuration below (targeting Theta's Cobalt scheduler) can be used as a starting point:

```yaml
engine:
    type: HighThroughputEngine
    max_workers_per_node: 1
    worker_debug: False
    address:
        type: address_by_interface
        ifname: vlan2360
    provider:
        type: CobaltProvider
        queue: debug-flat-quad

        # Specify the account/allocation to which jobs should be charged
        account: {{ YOUR_THETA_ALLOCATION }}

        launcher:
            type: AprunLauncher
            overrides: -d 64

        # String to prepend to #COBALT blocks in the submit
        # script to the scheduler
        # e.g., "#COBALT -t 50"
        scheduler_options: {{ OPTIONS }}

        # Command to be run before starting a worker
        # e.g., "module load Anaconda; source activate compute_env"
        worker_init: {{ COMMAND }}

        # Scale between 0-1 blocks with 2 nodes per block
        nodes_per_block: 2
        init_blocks: 0
        min_blocks: 0
        max_blocks: 1

        # Hold blocks for 30 minutes
        walltime: 00:30:00
```
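Once the endpoint is configured and started, a short smoke test (a sketch; the endpoint UUID is a placeholder for the ID reported by `globus-compute-endpoint list`) can confirm that tasks actually reach the scheduled nodes:

```python
# Smoke test for the configured endpoint using the Globus Compute SDK.
# Replace the placeholder UUID with the ID shown by `globus-compute-endpoint list`.
from globus_compute_sdk import Executor


def where_am_i():
    import platform
    return f"Running on {platform.node()}"


endpoint_id = "00000000-0000-0000-0000-000000000000"  # placeholder endpoint UUID

with Executor(endpoint_id=endpoint_id) as executor:
    future = executor.submit(where_am_i)
    print(future.result())
```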