Understanding SLURM clusters and R job submission test

sigmafelix commented 10 months ago

Objective

The NIEHS HPC manages computational workloads with SLURM. I will explore SLURM in general and prepare internal materials for submitting jobs to the NIEHS HPC from R.
Learn the existing package: rslurm
Design an efficient workflow to distribute spatiotemporal (covariate) computation tasks across the assigned computational assets using SLURM

List of tasks and timeline

[x] As per #3, an array of essential functions will be prepared (around 09/05)
[x] If possible, set up SLURM locally and practice submitting jobs to the management system
[x] ~Prepare short hands-on examples for team members~
[x] Run an actual test on the NIEHS HPC and organize a practice session

sigmafelix commented 10 months ago

future.batchtools seems to provide a hassle-free way to submit jobs through the future framework on an HPC. Although learning the low-level controls in SLURM would certainly be helpful, I am switching the short-term development target to future.batchtools to streamline the development process. #7 is also affected.
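
A minimal sketch of what this switch could look like, not the code actually used in this project. The template path "slurm.tmpl" and the resource names (walltime, memory, ncpus) are assumptions; the valid resource names depend on the batchtools template in use. The script falls back to sequential processing when sbatch is not on the PATH, so it can be tried locally first.

```r
library(future)
library(future.apply)

# Use SLURM through future.batchtools only when sbatch is available;
# otherwise fall back to sequential evaluation for local testing.
has_slurm <- nzchar(Sys.which("sbatch")) &&
    requireNamespace("future.batchtools", quietly = TRUE)

if (has_slurm) {
    plan(
        future.batchtools::batchtools_slurm,
        template = "slurm.tmpl",  # hypothetical template file
        # assumed resource names; they must match the template
        resources = list(walltime = 3600, memory = 4096, ncpus = 1)
    )
} else {
    plan(sequential)
}

# Each list element becomes one future (one SLURM job under batchtools_slurm)
squares <- future_lapply(1:4, function(i) i^2)
```

The calling code (future_lapply) stays the same regardless of the backend, which is the main appeal over writing sbatch scripts by hand.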

sigmafelix commented 10 months ago

09/15/2023 update

Tried a simple script that extracts values within 20-kilometer buffers from a cropped MERRA2 file:

library(terra)
library(future.apply)
library(scomps)
library(dplyr)
us_extent = 
    terra::ext(c(xmin = -129.5, xmax = -61.1, ymin = 19.5, ymax = 51.8))

extract_with_buffer.flat <- function(
        points, surf, radius, id, qsegs, func = mean, kernel = NULL, bandwidth = NULL
    ) {
    # NOTE: id, kernel, and bandwidth are accepted but not used yet
    # generate buffers
    bufs = terra::buffer(points, width = radius, quadsegs = qsegs)
    # crop raster to the extent of the buffers
    bufs_extent = terra::ext(bufs)
    surf_cropped = terra::crop(surf, bufs_extent)
    name_surf_val = names(surf)
    # extract raster values (one row per cell, keyed by buffer ID)
    surf_at_bufs = terra::extract(surf_cropped, bufs)
    # summarize cell values per buffer with func, ignoring missing cells
    surf_at_bufs_summary = 
        surf_at_bufs |> 
            group_by(ID) |> 
            summarize(across(all_of(name_surf_val), ~func(.x, na.rm = TRUE))) |> 
            ungroup()
    return(surf_at_bufs_summary)
}

pathappend = "/ddn/gs1/home/songi2/projects/Scalable_GIS/largedata/"
merraname = "MERRA2_400.tavg1_2d_aer_Nx.20220820.nc4"
pointname = "aqs-test-data.gpkg"
merra = terra::rast(paste(pathappend, merraname, sep= ""), win = us_extent)
point = terra::vect(paste(pathappend, pointname, sep = ""))

timeindex = rep(seq_along(varnames(merra)), each = 24)
merra_daily = terra::tapp(merra, index = timeindex, fun = mean)
merra_daily
names(merra_daily) = varnames(merra)

merra_daily

targ_cols = c("BCCMASS",
"BCSMASS",
"DMSCMASS",
"DMSSMASS",
"DUCMASS",
"DUSMASS",
"DUCMASS25",
"DUSMASS25",
"OCCMASS",
"OCSMASS",
"SO2CMASS",
"SO2SMASS",
"SO4CMASS",
"SO4SMASS",
"SSCMASS",
"SSSMASS",
"SSCMASS25",
"SSSMASS25")

merra_daily_t = merra_daily[[targ_cols]]

extracted = extract_with_buffer.flat(point, merra_daily_t, id = "ID.Code", 
    radius = 2e4L, qsegs = 90L)
write.csv(extracted, "res_merra_09152023.csv")

Over SSH, sbatch terra_runs_Rcode_file.sh returned the following error message:

sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified
I will ask OSC about HPC job submission authorization.

sigmafelix commented 10 months ago

As we discussed in the email chain, terra is unavailable on the HPC for the time being. The next submission is TBD. In the meantime, I will test how a full sf-stars implementation would perform on the HPC, even though the local test was not very promising.

sigmafelix commented 10 months ago

After a couple of days of experience, I learned that parallelization on the HPC needs a deliberate strategy. The basic notion is to run the calculation code in containers that include all required packages. The level at which I parallelize should be determined by 1) data size, 2) memory pressure, and 3) the number of features/cells used per loop iteration or list element. The test run is complete, so I am closing this issue now. I will open a new issue if anything HPC-related comes up.
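
One way to make these three criteria concrete is to derive the chunk size from a per-worker memory budget. The sketch below is illustrative only: chunk_indices is a hypothetical helper, and the bytes-per-cell and budget figures are assumptions rather than measurements from the NIEHS HPC.

```r
# Hypothetical helper: split n_features into chunks so that the cells
# touched by one chunk stay under a per-worker memory budget.
chunk_indices <- function(n_features, cells_per_feature,
                          bytes_per_cell = 8, mem_budget_bytes = 1e9) {
    bytes_per_feature <- cells_per_feature * bytes_per_cell
    # at least one feature per chunk, even if a single feature exceeds the budget
    chunk_size <- max(1, floor(mem_budget_bytes / bytes_per_feature))
    idx <- seq_len(n_features)
    split(idx, ceiling(idx / chunk_size))
}

# Toy numbers: 10 features, 1000 cells each, 32 kB budget per worker
chunks <- chunk_indices(n_features = 10, cells_per_feature = 1000,
                        mem_budget_bytes = 32000)

# Each chunk is one list element; lapply() is a stand-in for a parallel
# apply, and sum() stands in for the real per-chunk computation.
results <- lapply(chunks, function(idx) sum(idx))
```

Larger chunks reduce scheduling overhead but raise memory pressure per worker, so the budget is the knob to tune when jobs are killed for exceeding their memory allocation.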
