ESMValGroup / ESMValCore

ESMValCore: A community tool for pre-processing data from Earth system models in CMIP and running analysis scripts.
https://www.esmvaltool.org
Apache License 2.0

Preprocessor taking very long time processing large data files like ocean data #774

Closed rswamina closed 4 years ago

rswamina commented 4 years ago

I encountered a problem when processing large data sets (e.g. ocean variables like thetao, and pi-ctrl data running to several hundred years). Despite ordering the preprocessors in what I thought would be an efficient way to process the data, the program either hangs or runs out of memory. When I reduced the number of years to a very small number, the preprocessing did complete. I ran these on the JASMIN sci servers (anything other than sci3/cems-sci2/sci6 runs out of memory while loading the data). I am attaching my recipe here.

#recipe_thetao_pictrl_process.yml

---
documentation:
  description: |
    Process pi-control data for ocean vars such as thetao

  authors:
    - swaminathan_ranjini

  references:
    - collins13ipcc

  projects:
    - crescendo 

preprocessors:
  preproc_zonal_ocean:
    custom_order: True
    extract_levels:
      levels: [0., 1000., 2000., 3000., 4000., 5000., 6000.]
      scheme: linear_horizontal_extrapolate_vertical
    annual_statistics:
      operator: mean
    regrid:
      target_grid: 2x2
      scheme: linear
    zonal_statistics:
      operator: mean

diagnostics:
  thetao_change_mmm:
    description: Air temperature change for RCPs, periods,
                  as zonal plots.
    themes:
      - phys
    realms:
      - ocean
    variables:
      thetao:
        preprocessor: preproc_zonal_ocean
        project: CMIP6
        mip: Omon

    additional_datasets:
      - {dataset: UKESM1-0-LL, exp: piControl, start_year: 1960, end_year: 1970, grid: gn, ensemble: r1i1p1f2 }
      - {dataset: IPSL-CM6A-LR, exp: piControl, start_year: 1850, end_year: 1860, grid: gn, ensemble: r1i1p1f1 }
      - {dataset: MRI-ESM2-0, exp: piControl,  ensemble: r1i1p1f1,  start_year: 1850,  end_year: 1860, grid: gn}

#      - {dataset: UKESM1-0-LL, exp: piControl, start_year: 1960, end_year: 2459, grid: gn, ensemble: r1i1p1f2 }
#      - {dataset: IPSL-CM6A-LR, exp: piControl, start_year: 1850, end_year: 2349, grid: gn, ensemble: r1i1p1f1 }
#      - {dataset: MRI-ESM2-0, exp: piControl,  ensemble: r1i1p1f1,  start_year: 1850,  end_year: 2349, grid: gn}

    scripts: null
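
For a rough sense of scale, the difference between the short test periods and the commented-out 500-year periods in the recipe can be sized with a back-of-envelope calculation. The sketch below is only an estimate: the grid dimensions (75 levels on a 330 x 360 horizontal grid) and the 4-byte float storage are assumptions chosen for illustration, not values read from the actual files, and `cube_size_gib` is a hypothetical helper, not part of ESMValCore.

```python
def cube_size_gib(years, levels, ny, nx, steps_per_year=12, bytes_per_value=4):
    """Return the uncompressed size of a (time, lev, y, x) array in GiB."""
    n_values = years * steps_per_year * levels * ny * nx
    return n_values * bytes_per_value / 1024**3


# Assumed native ocean grid (illustrative only; check the real file headers).
LEVELS, NY, NX = 75, 330, 360

# Monthly (Omon) data for the short test period and for the full period.
short_run = cube_size_gib(years=11, levels=LEVELS, ny=NY, nx=NX)   # 1960-1970
long_run = cube_size_gib(years=500, levels=LEVELS, ny=NY, nx=NX)   # 1960-2459

print(f"11 years:  {short_run:.1f} GiB")
print(f"500 years: {long_run:.1f} GiB")
```

Under these assumptions the 11-year test comes to roughly 4 GiB per dataset, while 500 years comes to roughly 200 GiB, before counting any intermediate copies made by the preprocessor steps.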

valeriupredoi commented 4 years ago

moved to #775