NIEHS / chopin

Scalable GIS methods for environmental and climate data analysis
https://niehs.github.io/chopin/

Understanding SLURM clusters and R job submission test #6

Closed sigmafelix closed 10 months ago

sigmafelix commented 10 months ago

Objective

List of tasks and timeline

sigmafelix commented 10 months ago

future.batchtools appears to offer a low-friction way to submit r-future jobs on an HPC cluster. Learning the low-level SLURM controls would still be helpful, but I am switching the short-term development target to future.batchtools to streamline development. This also affects #7.
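
For reference, a minimal sketch of what submitting through future.batchtools could look like. The template file name, the resource names, and the `set_plan()` helper are assumptions for illustration; the resource names have to match whatever placeholders the cluster's batchtools template actually uses. When no template is found, the sketch falls back to sequential execution so it can be tried locally.

```r
library(future)
library(future.apply)

# Hypothetical helper: use the SLURM backend when a batchtools template file
# is present, otherwise fall back to sequential execution for local testing.
set_plan <- function(template = "batchtools.slurm.tmpl") {
    has_backend = requireNamespace("future.batchtools", quietly = TRUE)
    if (file.exists(template) && has_backend) {
        future::plan(
            future.batchtools::batchtools_slurm,
            template = template,
            # assumed resource names; they must match the template placeholders
            resources = list(ncpus = 1, memory = "4g", walltime = "01:00:00")
        )
    } else {
        future::plan(future::sequential)
    }
}

set_plan()

# The same future_lapply() call runs locally or as SLURM jobs,
# depending only on the plan that was set above.
res = future.apply::future_lapply(1:4, function(i) i^2)
```

The point of this design is that the analysis code stays unchanged between a laptop and the cluster; only the `plan()` call differs.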

sigmafelix commented 10 months ago

09/15/2023 update

library(terra)
library(future.apply)
library(scomps)
library(dplyr)
us_extent = 
    terra::ext(c(xmin = -129.5, xmax = -61.1, ymin = 19.5, ymax = 51.8))

extract_with_buffer.flat <- function(
        points, surf, radius, id, qsegs, func = mean, kernel = NULL, bandwidth = NULL
    ) {
    # note: id, kernel, and bandwidth are accepted but not used in this flat version
    # generate circular buffers around each point
    bufs = terra::buffer(points, width = radius, quadsegs = qsegs)
    # crop the raster to the buffer extent to limit the cells that are read
    bufs_extent = terra::ext(bufs)
    surf_cropped = terra::crop(surf, bufs_extent)
    name_surf_val = names(surf)
    # extract raster values; the ID column is the row index of bufs
    surf_at_bufs = terra::extract(surf_cropped, bufs)
    # summarize the extracted cell values per buffer with func
    surf_at_bufs_summary = 
        surf_at_bufs |> 
            group_by(ID) |> 
            summarize(across(all_of(name_surf_val), ~func(.x, na.rm = TRUE))) |> 
            ungroup()
    return(surf_at_bufs_summary)
}

pathappend = "/ddn/gs1/home/songi2/projects/Scalable_GIS/largedata/"
merraname = "MERRA2_400.tavg1_2d_aer_Nx.20220820.nc4"
pointname = "aqs-test-data.gpkg"
merra = terra::rast(paste(pathappend, merraname, sep= ""), win = us_extent)
point = terra::vect(paste(pathappend, pointname, sep = ""))

timeindex = rep(seq_along(varnames(merra)), each = 24)
merra_daily = terra::tapp(merra, index = timeindex, fun = mean)
merra_daily
names(merra_daily) = varnames(merra)

merra_daily

targ_cols = c("BCCMASS",
"BCSMASS",
"DMSCMASS",
"DMSSMASS",
"DUCMASS",
"DUSMASS",
"DUCMASS25",
"DUSMASS25",
"OCCMASS",
"OCSMASS",
"SO2CMASS",
"SO2SMASS",
"SO4CMASS",
"SO4SMASS",
"SSCMASS",
"SSSMASS",
"SSCMASS25",
"SSSMASS25")

merra_daily_t = merra_daily[[targ_cols]]

extracted = extract_with_buffer.flat(point, merra_daily_t, "ID.Code", 
    radius = 2e4L, qsegs = 90L)
write.csv(extracted, "res_merra_09152023.csv")

sigmafelix commented 10 months ago

As discussed in the email chain, terra is unavailable on the HPC for the time being, so the next submission is TBD. In the meantime, I will test how a full sf-stars implementation performs on the HPC, even though the local test was not very promising.

sigmafelix commented 10 months ago

After a couple of days of experience, I learned that parallelization on the HPC needs a deliberate strategy. The basic idea is to run the computation in containers that include all required packages. The level at which to parallelize should be determined by 1) data size, 2) memory pressure, and 3) the number of features/cells used per loop/list element. The test run is complete, so I am closing this issue now. I will open a new issue if anything HPC-related comes up.
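
To make the three criteria concrete, here is a rough sketch of sizing list elements from a memory budget. The `chunk_features()` helper, its parameters, and the numbers are hypothetical, and the 8 bytes per cell is only an assumption for double-precision values; it is not what was run in the test.

```r
# Hypothetical helper: split feature indices into chunks so that the cells
# touched by one chunk stay under a per-worker memory budget.
chunk_features <- function(n_features, cells_per_feature,
                           bytes_per_cell = 8, mem_budget_bytes = 1e9) {
    bytes_per_feature = cells_per_feature * bytes_per_cell
    # at least one feature per chunk, even if a single feature exceeds the budget
    chunk_size = max(1, floor(mem_budget_bytes / bytes_per_feature))
    chunk_id = ceiling(seq_len(n_features) / chunk_size)
    split(seq_len(n_features), chunk_id)
}

# 10 features, about 1e6 cells each, 30 MB budget per worker (made-up numbers)
chunks = chunk_features(n_features = 10, cells_per_feature = 1e6,
                        mem_budget_bytes = 3e7)

# each chunk would become one loop/list element to parallelize over
chunk_sizes = lengths(chunks)
```

Each element of `chunks` could then be passed to `lapply()` or a parallel equivalent, with the extraction function applied to the subset of points in that chunk.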