Thomas-Moore-Creative / Climatology-generator-demo

A demonstration / MVP to show how one could build an "interactive" climatology & compositing tool on Gadi HPC.
MIT License

Generate the following climatological & composited statistics #2

Open · Thomas-Moore-Creative opened 9 months ago

Thomas-Moore-Creative commented 9 months ago

Monthly climatologies for:

Over BRAN2020 variables:

Generating these statistics:

Workflow:

Delivery:

Thomas-Moore-Creative commented 9 months ago

Climate Mode definition:

There is detail and nuance in this choice, but for this task we'll adopt the following definition for now:

Using ONI:

ONI (5N-5S, 170W-120W): The ONI uses the same region as the Niño 3.4 index. The ONI uses a 3-month running mean, and to be classified as a full-fledged El Niño or La Niña, the anomalies must exceed +0.5C or -0.5C for at least five consecutive months. This is the operational definition used by NOAA.
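As a rough illustration of that definition, here is a minimal sketch of classifying El Niño / La Niña months from the ONI. It assumes a monthly Niño 3.4 SST anomaly series nino34_anom is already in hand; the names here are hypothetical, not part of the repo.

```python
import pandas as pd

# nino34_anom: hypothetical monthly Nino 3.4 SST anomaly series (deg C),
# indexed by a monthly DatetimeIndex.
oni = nino34_anom.rolling(window=3, center=True).mean()  # 3-month running mean

def runs_of_at_least(mask: pd.Series, n: int) -> pd.Series:
    """Keep True only where it belongs to a run of >= n consecutive Trues."""
    groups = (mask != mask.shift()).cumsum()      # label each run of equal values
    run_len = mask.groupby(groups).transform("size")
    return mask & (run_len >= n)

# NOAA operational thresholds: anomalies beyond +/-0.5 C for >= 5 consecutive months
el_nino_months = runs_of_at_least(oni >= 0.5, 5)
la_nina_months = runs_of_at_least(oni <= -0.5, 5)
```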

References:

Repos: make NCAR's ONI index analysis ready locally #1

NCAR

Nino SST Indices (Nino 1+2, 3, 3.4, 4; ONI and TNI)

NOAA Climate explainers:

El Niño Southern Oscillation (ENSO)

Multivariate ENSO Index Version 2 (MEI.v2)

Why are there so many ENSO indexes, instead of just one?

Exactly the same, but completely different: why we have so many different ways of looking at sea surface temperature

The Conversation: El Niño combined with global warming means big changes for New Zealand’s weather

What North America can expect from El Niño

Peer-reviewed publications:

Huang, B., M. L. L’Heureux, J. Lawrimore, C.-L. Liu, H.-M. Zhang, V. Banzon, Z.-Z. Hu, and A. Kumar, 2013: Why did large differences arise in the sea surface temperature datasets across the tropical Pacific during 2012? J. Atmos. Oceanic Technol., 30, 2944–2953.

Observing and Predicting the 2015/16 El Niño

The Role of Buoy and Argo Observations in Two SST Analyses in the Global and Tropical Pacific Oceans

Thomas-Moore-Creative commented 9 months ago

[screenshot: CleanShot 2023-11-30 at 12 08 05]

Thomas-Moore-Creative commented 9 months ago

Todo:

[screenshot: CleanShot 2023-11-30 at 15 50 23]

Thomas-Moore-Creative commented 9 months ago

[screenshot: CleanShot 2023-12-01 at 11 42 08]

Starting basic composites - means over all event-months.
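For reference, a basic composite like this is compact in xarray. A sketch, assuming temp is a DataArray with a 'Time' dimension and El_Nino_mask is a boolean DataArray along 'Time' (True in El Niño event-months):

```python
# Composite mean over all El Nino event-months
el_nino_composite = temp.where(El_Nino_mask).mean(dim='Time')

# Composite monthly climatology: same masking, then group by calendar month
el_nino_climatology = (
    temp.where(El_Nino_mask)
    .groupby('Time.month')
    .mean(dim='Time')
)
```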

Thomas-Moore-Creative commented 9 months ago

Count of events in the BRAN2020 period (360 months):

- El Niño = 8 events (weak to strong) as defined by ONI
- La Niña = 10 events (weak to strong) as defined by ONI
- El Niño months total = 70 = 19%
- La Niña months total = 119 = 33%
- Neutral months total = 171 = 48%

Thomas-Moore-Creative commented 9 months ago

[screenshot: CleanShot 2023-12-04 at 16 48 37]

Under 8 minutes to write the 3 climatologies.

Thomas-Moore-Creative commented 9 months ago

Thanks to an idea from @ChrisC28 I'm going to compare the performance of using .sel([list of times]) vs .where(Bool_mask) for the compositing masking.

Thomas-Moore-Creative commented 9 months ago

A single quantile calculation using:

El_Nino_mask_TIMES = El_Nino_mask01['Time'].where(El_Nino_mask, drop=True)
result = temp_chunked_time.sel({'Time': El_Nino_mask_TIMES.values}).quantile([0.05], skipna=True, dim='Time').compute()

takes:

CPU times: user 12min 9s, sys: 1min 43s, total: 13min 53s
Wall time: 2h 18min 6s
Thomas-Moore-Creative commented 9 months ago

.quantile([0.05], skipna=False, dim='Time') fixed the unresponsive calculation. Apparently numpy's .nanquantile is ~100x slower?!?

Also, using .sel on a list of masked-and-dropped times is 2x faster than a pure .where.
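Side by side, the two approaches look like this (a sketch reusing the names above; the skipna choice is the part that matters):

```python
# 1) Precompute the event times once, then select them -- no NaNs are
#    introduced, so skipna=False is safe and avoids the slow nanquantile path.
event_times = El_Nino_mask01['Time'].where(El_Nino_mask, drop=True)
q_sel = temp_chunked_time.sel(Time=event_times.values).quantile(
    [0.05], skipna=False, dim='Time')

# 2) Mask in place -- .where() fills non-event months with NaN, which forces
#    skipna=True and the nanquantile code path.
q_where = temp_chunked_time.where(El_Nino_mask).quantile(
    [0.05], skipna=True, dim='Time')
```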

Thomas-Moore-Creative commented 9 months ago

30GB "stats" ncfile writing out with a rechunk very effectively . . . max memory at about 50GB CleanShot 2023-12-11 at 11 12 27

Thomas-Moore-Creative commented 9 months ago

[screenshot: CleanShot 2023-12-11 at 11 27 28]

6.5 minutes for climatologies and 21.5 minutes for stats for a single variable.

If we plan on "less than 45 minutes" per batched variable, then the total time, once resources are available, should be less than 5 hours (6 variables × 45 minutes ≈ 4.5 hours).

Thomas-Moore-Creative commented 9 months ago

How can I ensure I control chunk sizes in the output NetCDF files? Via encoding?

ds.to_netcdf("/tmp/test1.nc", encoding={"z": {"chunksizes": (5, 5)}})

Thomas-Moore-Creative commented 9 months ago

[screenshot: CleanShot 2023-12-11 at 14 42 14]

BRAN2020_temperature_climatology.to_netcdf(write_path+'BRAN2020_temperature_climatology.nc', format='netCDF4')

yields an NC file with "contiguous" storage:

float La_Nina_climatological_temp(month, st_ocean, yt_ocean, xt_ocean) ;
        La_Nina_climatological_temp:_FillValue = NaNf ;
        La_Nina_climatological_temp:cell_methods = "time: mean Time: mean" ;
        La_Nina_climatological_temp:coordinates = "geolon_t geolat_t" ;
        La_Nina_climatological_temp:long_name = "Potential temperature" ;
        La_Nina_climatological_temp:packing = 4LL ;
        La_Nina_climatological_temp:standard_name = "sea_water_potential_temperature" ;
        La_Nina_climatological_temp:time_avg_info = "average_T1,average_T2,average_DT" ;
        La_Nina_climatological_temp:units = "degrees C" ;
        La_Nina_climatological_temp:valid_range = -32767LL, 32767LL ;
        La_Nina_climatological_temp:_Storage = "contiguous" ;
        La_Nina_climatological_temp:_Endianness = "little" ;
Thomas-Moore-Creative commented 9 months ago

hurdle - encoding the chunks for writing to NetCDF

settings = dict(chunksizes={'month':1,'st_ocean':10, 'yt_ocean':1500, 'xt_ocean':3600})
encoding = {var: settings for var in BRAN2020_temperature_climatology.data_vars}
BRAN2020_temperature_climatology.to_netcdf(write_path+'BRAN2020_temperature_climatology.nc',encoding = encoding)

is yielding error:

File /g/data/v14/tm4888/miniconda3/envs/busecke_etal_grl_2019_omz_euc/lib/python3.11/site-packages/xarray/backends/netCDF4_.py:495, in NetCDF4DataStore.prepare_variable(self, name, variable, check_encoding, unlimited_dims)
    493     nc4_var = self.ds.variables[name]
    494 else:
--> 495     nc4_var = self.ds.createVariable(
    496         varname=name,
    497         datatype=datatype,
    498         dimensions=variable.dims,
    499         zlib=encoding.get("zlib", False),
    500         complevel=encoding.get("complevel", 4),
    501         shuffle=encoding.get("shuffle", True),
    502         fletcher32=encoding.get("fletcher32", False),
    503         contiguous=encoding.get("contiguous", False),
    504         chunksizes=encoding.get("chunksizes"),
    505         endian="native",
    506         least_significant_digit=encoding.get("least_significant_digit"),
    507         fill_value=fill_value,
    508     )
    510 nc4_var.setncatts(attrs)
    512 target = NetCDF4ArrayWrapper(name, self)

File src/netCDF4/_netCDF4.pyx:2838, in netCDF4._netCDF4.Dataset.createVariable()

File src/netCDF4/_netCDF4.pyx:4085, in netCDF4._netCDF4.Variable.__init__()

KeyError: 0

where: encoding =

{'climatological_temp': {'chunksizes': {'month': 1,
   'st_ocean': 10,
   'yt_ocean': 1500,
   'xt_ocean': 3600}},
 'El_Nino_climatological_temp': {'chunksizes': {'month': 1,
   'st_ocean': 10,
   'yt_ocean': 1500,
   'xt_ocean': 3600}},
 'La_Nina_climatological_temp': {'chunksizes': {'month': 1,
   'st_ocean': 10,
   'yt_ocean': 1500,
   'xt_ocean': 3600}}}
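The traceback makes sense once you notice that netCDF4's createVariable expects chunksizes as a sequence of ints in the variable's dimension order, not a dict keyed by dimension name; indexing that dict positionally is presumably what raises KeyError: 0. A likely fix, sketched with the same names as above:

```python
# Build per-variable chunksize tuples ordered like each variable's dims
chunks = {'month': 1, 'st_ocean': 10, 'yt_ocean': 1500, 'xt_ocean': 3600}
encoding = {
    var: {'chunksizes': tuple(chunks[d] for d in BRAN2020_temperature_climatology[var].dims)}
    for var in BRAN2020_temperature_climatology.data_vars
}
BRAN2020_temperature_climatology.to_netcdf(
    write_path + 'BRAN2020_temperature_climatology.nc', encoding=encoding)
```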
Thomas-Moore-Creative commented 9 months ago

Ideas:

Thomas-Moore-Creative commented 8 months ago

[screenshot: CleanShot 2023-12-14 at 09 51 48]

Thomas-Moore-Creative commented 8 months ago

attempt daily temperature data
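The starting point for the daily data is presumably something like the following sketch; the path and chunk sizes are hypothetical placeholders, not the actual BRAN2020 locations or the repo's settings.

```python
import xarray as xr

# Open the daily files lazily, then rechunk so 'Time' is contiguous,
# which the per-gridpoint statistics require.
ds = xr.open_mfdataset('/g/data/example/BRAN2020/daily/ocean_temp_*.nc',
                       combine='by_coords', parallel=True)
temp_chunked_time = ds['temp'].chunk(
    {'Time': -1, 'st_ocean': 1, 'yt_ocean': 300, 'xt_ocean': 300})
```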

Thomas-Moore-Creative commented 7 months ago

Approximate volume mean of daily BRAN2020 temperature

[screenshot: CleanShot 2024-02-02 at 10 25 04@2x]

Thomas-Moore-Creative commented 7 months ago

ARE memory issues for daily rechunking operation

[screenshot: CleanShot 2024-02-02 at 10 51 12@2x]

Thomas-Moore-Creative commented 7 months ago

one hour later

[screenshot: CleanShot 2024-02-02 at 12 03 46@2x]

Thomas-Moore-Creative commented 7 months ago

Progress includes:

Thomas-Moore-Creative commented 7 months ago

The workflow appears practical for daily BRAN2020 3D data.

Tests using the 3D daily temp variable show that, for all 6 variables:

Thomas-Moore-Creative commented 6 months ago

For some reason 'salt' failed, killed for exceeding its JobFS quota. But why is the quota 400GB if I requested 800GB? (Presumably because jobfs is divided across nodes: an 800GB request over two nodes gives 400GB per node.)

Job 110281744.gadi-pbs killed due to exceeding jobfs quota. Quota: 400.0GB, Used: 418.91GB, Host: gadi-mmem-clx-0001

/g/data/es60/users/thomas_moore/miniconda3/envs/pangeo_regionmask/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 50 leaked semaphore objects to clean up at shutdown warnings.warn('resource_tracker: There appear to be %d '

======================================================================================
                  Resource Usage on 2024-03-08 01:31:14:
   Job Id:             110281744.gadi-pbs
   Project:            es60
   Exit Status:        271 (Linux Signal 15 SIGTERM Termination)
   Service Units:      2288.53
   NCPUs Requested:    96                     NCPUs Used: 96
                                           CPU Time Used: 85:05:00
   Memory Requested:   5.86TB                Memory Used: 2.66TB
   Walltime requested: 10:00:00            Walltime Used: 04:46:04
   JobFS requested:    800.0GB                JobFS used: 418.91GB
======================================================================================
Thomas-Moore-Creative commented 5 months ago

I may need to split the tasks up across more jobs

Thomas-Moore-Creative commented 5 months ago

need to add logic to deal with the dimensions for U & V
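A minimal sketch of that logic, assuming (as in MOM-based BRAN2020) that U and V sit on the velocity grid yu_ocean/xu_ocean rather than the tracer grid, and that the variables are named 'u' and 'v':

```python
def horizontal_dims(da):
    """Return the (y, x) dimension names for tracer- or velocity-grid variables."""
    if 'yu_ocean' in da.dims:            # u, v
        return 'yu_ocean', 'xu_ocean'
    return 'yt_ocean', 'xt_ocean'        # temp, salt, etc.

ydim, xdim = horizontal_dims(ds['u'])
chunks = {'Time': -1, 'st_ocean': 1, ydim: 300, xdim: 300}
```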

Thomas-Moore-Creative commented 5 months ago
Job 111169419.gadi-pbs killed due to exceeding jobfs quota. Quota: 1.37TB, Used: 1.38TB, Host: gadi-mmem-clx-0002

======================================================================================
                  Resource Usage on 2024-03-18 22:25:36:
   Job Id:             111169419.gadi-pbs
   Project:            es60
   Exit Status:        271 (Linux Signal 15 SIGTERM Termination)
   Service Units:      1233.33
   NCPUs Requested:    96                     NCPUs Used: 96
                                           CPU Time Used: 50:36:38
   Memory Requested:   5.84TB                Memory Used: 2.71TB
   Walltime requested: 24:00:00            Walltime Used: 02:34:10
   JobFS requested:    2.73TB                 JobFS used: 1.38TB
======================================================================================

Intermittent networking issues between the two nodes appear to cause challenges?

Thomas-Moore-Creative commented 5 months ago

JobFS spill problem continues

======================================================================================
                  Resource Usage on 2024-03-20 12:39:53:
   Job Id:             111301310.gadi-pbs
   Project:            es60
   Exit Status:        271 (Linux Signal 15 SIGTERM Termination)
   Service Units:      582.47
   NCPUs Requested:    48                     NCPUs Used: 48              
                                           CPU Time Used: 48:11:05        
   Memory Requested:   2.92TB                Memory Used: 2.67TB          
   Walltime requested: 24:00:00            Walltime Used: 02:25:37        
   JobFS requested:    1.37TB                 JobFS used: 1.38TB          
======================================================================================
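One way to get ahead of the spill problem is to point dask worker spill at $PBS_JOBFS explicitly and make workers pause before they can overrun the quota. A sketch, assuming a dask.distributed LocalCluster inside the PBS job; the thresholds are illustrative:

```python
import os
import dask
from dask.distributed import Client, LocalCluster

# Spill to node-local jobfs and throttle workers before the quota is hit
dask.config.set({
    'temporary-directory': os.environ.get('PBS_JOBFS', '/tmp'),
    'distributed.worker.memory.target': 0.60,  # start spilling at 60% of limit
    'distributed.worker.memory.spill': 0.70,
    'distributed.worker.memory.pause': 0.80,   # stop accepting new work at 80%
})
cluster = LocalCluster(n_workers=48, threads_per_worker=1)
client = Client(cluster)
```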