Thomas-Moore-Creative / Climatology-generator-demo

A demonstration / MVP to show how one could build an "interactive" climatology & compositing tool on Gadi HPC.
MIT License

fix bug and re-run all 'neutral' calculations #23

Closed Thomas-Moore-Creative closed 3 months ago

Thomas-Moore-Creative commented 3 months ago
# bug found!

[screenshot: CleanShot 2024-05-21 at 17 49 01@2x]

Action: fix bug and re-run all 'neutral' calculations

Originally posted by @Thomas-Moore-Creative in https://github.com/Thomas-Moore-Creative/Climatology-generator-demo/issues/22#issuecomment-2121984961

Thomas-Moore-Creative commented 3 months ago

[screenshot: CleanShot 2024-05-21 at 19 32 38@2x]

Thomas-Moore-Creative commented 3 months ago

re-run neutral mask for base & quant across variables:

as of 10:16pm 21 May 2024

v:

```
(base) tm4888@gadi-login-04 /g/data/es60/users/thomas_moore/code/Climatology-generator-demo/src/scripts (main) qstat
Job id                 Name             User              Time Use S Queue
---------------------  ---------------- ----------------  -------- - -----
116202890.gadi-pbs     run_bran_stats.* tm4888            00:00:00 R megamem-exec
```

u & v:

```
(base) tm4888@gadi-login-04 /g/data/es60/users/thomas_moore/code/Climatology-generator-demo/src/scripts (main) qstat
Job id                 Name             User              Time Use S Queue
---------------------  ---------------- ----------------  -------- - -----
116202890.gadi-pbs     run_bran_stats.* tm4888            01:13:57 R megamem-exec
116202981.gadi-pbs     run_bran_stats.* tm4888            00:00:00 R megamem-exec
```

u, v, & salt:

```
(base) tm4888@gadi-login-04 /g/data/es60/users/thomas_moore/code/Climatology-generator-demo/src/scripts (main) qstat
Job id                 Name             User              Time Use S Queue
---------------------  ---------------- ----------------  -------- - -----
116202890.gadi-pbs     run_bran_stats.* tm4888            02:49:34 R megamem-exec
116202981.gadi-pbs     run_bran_stats.* tm4888            00:57:55 R megamem-exec
116203106.gadi-pbs     run_bran_stats.* tm4888                   0 Q megamem-exec
```

6:30 am: adding temp, with u now finished

```
(base) tm4888@gadi-login-08 /g/data/es60/users/thomas_moore/code/Climatology-generator-demo/src/scripts (main) qstat
Job id                 Name             User              Time Use S Queue
---------------------  ---------------- ----------------  -------- - -----
116202890.gadi-pbs     run_bran_stats.* tm4888            380:45:* R megamem-exec
116203106.gadi-pbs     run_bran_stats.* tm4888            159:31:* R megamem-exec
116218306.gadi-pbs     run_bran_stats.* tm4888            02:08:43 R megamem-exec
```

mld = 116218548.gadi-pbs

Thomas-Moore-Creative commented 3 months ago
PBS Job Id: 116202890.gadi-pbs
Job Name:   run_bran_stats.sh
**Aborted by PBS Server**
Job exceeded resource walltime
See job standard error file

This was "v" . . . so the base stats are likely done, but the quant output needs to be re-written?

EDIT (10:59 am): v somehow didn't run quickly enough? A re-run of all v calculations is needed.

run_bran_stats.sh.o116202890:

```
======================================================================================
                  Resource Usage on 2024-05-22 10:20:32:
   Job Id:             116202890.gadi-pbs
   Project:            es60
   Exit Status:        -29 (Job failed due to exceeding walltime)
   Service Units:      2887.93
   NCPUs Requested:    48                     NCPUs Used: 48
                                           CPU Time Used: 562:24:43
   Memory Requested:   2.92TB                Memory Used: 868.33GB
   Walltime requested: 12:00:00            Walltime Used: 12:01:59
   JobFS requested:    1.37TB                 JobFS used: 0B
======================================================================================
```
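As a sanity check, the Service Units figure in that report is consistent with a charge rate of roughly 5 SU per core-hour. The rate is an assumption about the megamem queue, not something stated in the log:

```python
# Rough cross-check of the service-unit charge reported above.
# Assumption (not from the log): the megamem queue charges ~5 SU
# per CPU core-hour.
ncpus = 48
walltime_hours = 12 + 1 / 60 + 59 / 3600  # walltime used: 12:01:59
su_per_core_hour = 5                      # assumed megamem charge rate
service_units = ncpus * walltime_hours * su_per_core_hour
print(round(service_units, 2))            # close to the reported 2887.93
```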
(base) tm4888@gadi-login-01 /g/data/es60/users/thomas_moore/code/Climatology-generator-demo/src/scripts/logs (main) more 116202890.gadi-pbs-run-bran-v-neutral.log
importing functions ...
>>> config: {'variable': 'v', 'zarr_path_dict': {'temp': '/scratch/es60/ard/reanalysis/BRAN2020/ARD/looped_rechunk_output/temp/temp_combined_output.zarr', 'salt': '/scratch/es60/ard/reanalysis/BRAN2020/ARD/looped_rechunk_output/salt/salt_combined_output.zarr', 'u': '/scratch/es60/ard/reanalysis/BRAN2020/ARD/looped_rechunk_output/u/u_combined_output.zarr', 'v': '/scratch/es60/ard/reanalysis/BRAN2020/ARD/looped_rechunk_output/v/v_combined_output.zarr', 'mld': '/scratch/es60/ard/reanalysis/BRAN2020/ARD/BRAN2020-daily-mld.chunks.Time-1.xt_ocean30.yt_ocean30.2024.05.20.11.39.46.zarr', 'eta_t': '/scratch/es60/ard/reanalysis/BRAN2020/ARD/BRAN2020-daily-eta_t.chunks.Time-1.xt_ocean30.yt_ocean30.2024.05.20.11.49.05.zarr'}, 'write_results_base_dir': '/g/data/es60/users/thomas_moore/clim_demo_results/daily/bran2020_intermediate_results/new_neutral_files/', 'n_workers': 48, 'threads_per_worker': 1, 'memory_limit': '60GB', 'run_base_stats': True, 'run_quant': True, 'run_all_time': False, 'run_neutral': True, 'run_la_nina': False, 'run_el_nino': False, 'lat_name_dict': {'temp': 'yt_ocean', 'salt': 'yt_ocean', 'u': 'yu_ocean', 'v': 'yu_ocean', 'mld': 'yt_ocean', 'eta_t': 'yt_ocean'}, 'lon_name_dict': {'temp': 'xt_ocean', 'salt': 'xt_ocean', 'u': 'xu_ocean', 'v': 'xu_ocean', 'mld': 'xt_ocean', 'eta_t': 'xt_ocean'}, 'time_name': 'Time'}
variable requested: v
>>> zarr_path_dict: {'temp': '/scratch/es60/ard/reanalysis/BRAN2020/ARD/looped_rechunk_output/temp/temp_combined_output.zarr', 'salt': '/scratch/es60/ard/reanalysis/BRAN2020/ARD/looped_rechunk_output/salt/salt_combined_output.zarr', 'u': '/scratch/es60/ard/reanalysis/BRAN2020/ARD/looped_rechunk_output/u/u_combined_output.zarr', 'v': '/scratch/es60/ard/reanalysis/BRAN2020/ARD/looped_rechunk_output/v/v_combined_output.zarr', 'mld': '/scratch/es60/ard/reanalysis/BRAN2020/ARD/BRAN2020-daily-mld.chunks.Time-1.xt_ocean30.yt_ocean30.2024.05.20.11.39.46.zarr', 'eta_t': '/scratch/es60/ard/reanalysis/BRAN2020/ARD/BRAN2020-daily-eta_t.chunks.Time-1.xt_ocean30.yt_ocean30.2024.05.20.11.49.05.zarr'}
>>> write_results_base_dir: /g/data/es60/users/thomas_moore/clim_demo_results/daily/bran2020_intermediate_results/new_neutral_files/
>>> n_workers: 48
>>> threads_per_worker: 1
>>> memory_limit: 60GB
>>> run_base_stats: True
>>> run_quant: True
>>> run_all_time: False
>>> run_neutral: True
>>> run_la_nina: False
>>> run_el_nino: False
>>> lat_name_dict: {'temp': 'yt_ocean', 'salt': 'yt_ocean', 'u': 'yu_ocean', 'v': 'yu_ocean', 'mld': 'yt_ocean', 'eta_t': 'yt_ocean'}
>>> lon_name_dict: {'temp': 'xt_ocean', 'salt': 'xt_ocean', 'u': 'xu_ocean', 'v': 'xu_ocean', 'mld': 'xt_ocean', 'eta_t': 'xt_ocean'}
>>> time_name: Time
>>> Spinning up a dask cluster...
<Client: 'tcp://127.0.0.1:34995' processes=48 threads=48, memory=2.62 TiB>
timestamp: 2024.05.21.22.18.39
>>> building ENSO dataframe ...
/g/data/es60/users/thomas_moore/code/Climatology-generator-demo/src/scripts/../run_bran_stats.py:190: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ONI_DF_BRANtime['El Nino LOGICAL'] = ONI_DF_BRANtime['El Nino'].notnull()
/g/data/es60/users/thomas_moore/code/Climatology-generator-demo/src/scripts/../run_bran_stats.py:191: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ONI_DF_BRANtime['La Nina LOGICAL'] = ONI_DF_BRANtime['La Nina'].notnull()
/g/data/es60/users/thomas_moore/code/Climatology-generator-demo/src/scripts/../run_bran_stats.py:195: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ONI_DF_BRANtime.loc[pd.to_datetime('2024-01-01 00:00:00')] = 'NaN'
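The SettingWithCopyWarning messages above come from assigning new columns to what pandas considers a slice of another DataFrame. A minimal sketch of the fix the warning suggests; the frame and column names mirror the log, but the data here is invented:

```python
import pandas as pd

# Invented stand-in for the ENSO dataframe built in run_bran_stats.py.
oni = pd.DataFrame({'El Nino': [0.6, None, 1.1], 'La Nina': [None, -0.7, None]})

# A boolean slice like this may be a view in pandas' eyes; assigning new
# columns to it triggers SettingWithCopyWarning.
subset = oni[oni['El Nino'].notna() | oni['La Nina'].notna()]

# Taking an explicit copy makes the assignment unambiguous and silences
# the warning (using .loc on the original frame also works).
ONI_DF_BRANtime = subset.copy()
ONI_DF_BRANtime['El Nino LOGICAL'] = ONI_DF_BRANtime['El Nino'].notnull()
ONI_DF_BRANtime['La Nina LOGICAL'] = ONI_DF_BRANtime['La Nina'].notnull()

print(ONI_DF_BRANtime['El Nino LOGICAL'].tolist())  # [True, False, True]
```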
>>> loading zarr collection for ...v
>>> sorted depths
[2.50000000e+00 7.50000000e+00 1.25000000e+01 1.75153904e+01
 2.26670208e+01 2.81693802e+01 3.42180061e+01 4.09549751e+01
 4.84549751e+01 5.67180061e+01 6.56693802e+01 7.51670227e+01
 8.50153885e+01 9.50000000e+01 1.05000000e+02 1.15000000e+02
 1.25000000e+02 1.35000000e+02 1.45000000e+02 1.55000000e+02
 1.65000000e+02 1.75000000e+02 1.85000000e+02 1.95000000e+02
 2.05189896e+02 2.17054489e+02 2.33194321e+02 2.55884232e+02
 2.86608978e+02 3.25884216e+02 3.73194336e+02 4.27054474e+02
 4.85189911e+02 5.45511108e+02 6.10415649e+02 6.85926758e+02
 7.75926758e+02 8.80415649e+02 9.95511108e+02 1.11531335e+03
 1.23835388e+03 1.36815747e+03 1.50773389e+03 1.65815747e+03
 1.81835388e+03 1.98531335e+03 2.16518018e+03 2.43110107e+03
 2.89484180e+03 3.60310107e+03 4.50918018e+03]
>>>> chunks going into quant calculation
st_ocean chunks: (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
Time chunks: (11322,)
yu_ocean chunks: (120, 120, 120, 120, 120, 120, 120, 120, 120, 120, 120, 120, 60)
xu_ocean chunks: (120, 120, 120, 120, 120, 120, 120, 120, 120, 120, 120, 120, 120, 120, 120, 120, 120, 120, 120, 120, 120, 120, 120, 120, 120, 120, 120, 120, 120, 120)
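Those chunk shapes match what a dask-backed quantile needs: the reduction dimension (Time) must sit in a single chunk, while the depth and spatial dimensions stay split. A minimal illustration with array sizes taken from the log (this is a sketch, not the repo's code):

```python
import dask.array as da

# 11322 daily time steps, one 120x120 spatial tile per chunk
# (depth axis omitted for brevity).
x = da.ones((11322, 120, 120), chunks=(1000, 120, 120))

# Rechunk so the whole Time axis lives in one chunk, as required before
# reducing along it with a quantile.
x = x.rechunk({0: -1})
print(x.chunks[0])  # (11322,)
```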
>>> running neutral phase...
>>> running base stats ...
writing to the base stats netcdf file for neutral phase: v ....
/g/data/es60/users/thomas_moore/miniconda3/envs/pangeo_bran2020_demo/lib/python3.10/site-packages/distributed/client.py:3157: UserWarning: Sending large graph of size 12.64 MiB.
This may cause some slowdown.
Consider scattering data ahead of time and using futures.
  warnings.warn(
/g/data/es60/users/thomas_moore/miniconda3/envs/pangeo_bran2020_demo/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 288 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
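The "Sending large graph" warning above suggests scattering large constants to the workers instead of embedding them in every task graph. A hedged sketch of that pattern; the cluster settings and array are illustrative, not the repo's configuration:

```python
import numpy as np
from dask.distributed import Client

# Tiny in-process cluster, purely for illustration.
client = Client(processes=False, n_workers=1, threads_per_worker=1)

big = np.ones((1000, 1000))  # stand-in for a large constant array

# scatter() ships the data to the workers once and returns a future;
# tasks then reference the future instead of re-serialising the array
# into each task graph.
[big_future] = client.scatter([big])

def scaled_sum(factor, arr):
    return factor * arr.sum()

result = client.submit(scaled_sum, 2.0, big_future).result()
print(result)  # 2000000.0
client.close()
```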

v just hung for some unknown reason? No communication errors were reported?

Thomas-Moore-Creative commented 3 months ago

116229300.gadi-pbs (run_bran_stats.*, tm4888, 04:44:41 R megamem-exec) = eta_t

```
Job id                 Name             User              Time Use S Queue
---------------------  ---------------- ----------------  -------- - -----
116218306.gadi-pbs     run_bran_stats.* tm4888            211:32:* R megamem-exec
116229300.gadi-pbs     run_bran_stats.* tm4888            05:50:06 R megamem-exec
116230340.gadi-pbs     run_bran_stats.* tm4888                   0 Q megamem-exec
```

116230340.gadi-pbs = v

Thomas-Moore-Creative commented 3 months ago

done