[BUG] null values at certain index points in target, and pred for `global_forecast` evaluation mode

sbAsma commented 3 weeks ago

global_forecast evaluation mode. Description of the bug There happens to be 0.0 values for the written data arrays pred and target at the index [:, 702:721, 1040:1440]

To Reproduce Steps to reproduce the behavior:

In evaluate.py, choose global_forecast mode, and set the date on 'dates' : [[2021, 2, 10, 12]]
In evaluate.py, choose model_id = '3qou60es' # temperature
Load modules: source ../env_setup/modules_jsc.sh
Activate virtual env: source ../virtual_envs/venv_jwb/bin/activate
Run : srun --label --cpu-bind=v ../virtual_envs/venv_jwb/bin/python -u atmorep/core/evaluate.py
Read and print results/{model_id}/results_{model_id}_epoch00000_pred.zarr and results/{model_id}/results_{model_id}_epoch00000_target.zarr

Expected behavior An unexpectedly high rmse value will rise

Screenshots I read and I printed ERA5, target, and pred (at the end of the evaluation). Here is a sample of the outputs :

====================================== vlevel = 105 =================================================
ERA5[(0, 1439)] = 255.4578094482422
target[(0, 1439)] = 255.4578094482422
pred[(0, 1439)] = 254.625
-----------------------------------------------------------
...
-----------------------------------------------------------
ERA5[(700, 1439)] = 239.8992156982422
target[(700, 1439)] = 239.8992156982422
pred[(700, 1439)] = 238.0
-----------------------------------------------------------
ERA5[(701, 1439)] = 240.0486297607422
target[(701, 1439)] = 240.0486297607422
pred[(701, 1439)] = 238.125
-----------------------------------------------------------
ERA5[(702, 1439)] = 240.2146453857422
target[(702, 1439)] = 0.0
pred[(702, 1439)] = 0.0
-----------------------------------------------------------
ERA5[(703, 1439)] = 240.3953094482422
target[(703, 1439)] = 0.0
pred[(703, 1439)] = 0.0
-----------------------------------------------------------
...
-----------------------------------------------------------
ERA5[(720, 1439)] = 242.1277313232422
target[(720, 1439)] = 0.0
pred[(720, 1439)] = 0.0
-----------------------------------------------------------

Hardware and environment:

System: JSC Juwels-Booster
Runtime: interactive
Special branch: develop
Python Version: 3.11

Additional context ///

iluise commented 3 weeks ago

Hi Asma, can you please follow the template when opening the issues? Which branch are you working on? Do the indices stand for latitude and longitude? and what about time?

sbAsma commented 3 weeks ago

Hi Asma, can you please follow the template when opening the issues? Which branch are you working on? Do the indices stand for latitude and longitude? and what about time?

Apologies, I didn't pay attention. Let me re-write it.

iluise commented 3 weeks ago

Thanks, could you please post here the code you used to get the numbers you showed above?

sbAsma commented 3 weeks ago

Thanks, could you please post here the code you used to get the numbers you showed above?

Here it is:

model_id = 'liol3jdy'
field = 'temperature'
results_path = '/home/belkis/Documents/AtmoRep/AtmoRep_results_visualization/results_gf/'
store_target = zarr.ZipStore( results_path + f'id{model_id}/results_id{model_id}_epoch00000_target.zarr')
ds_target = zarr.group( store=store_target)

store_pred = zarr.ZipStore( results_path + f'id{model_id}/results_id{model_id}_epoch00000_pred.zarr')
ds_pred = zarr.group( store=store_pred)

i = 0

ds_o = xr.Dataset( coords={ 'ml' : ds_target[ f'{field}/sample={i:05d}/ml' ],
                            'datetime': ds_target[ f'{field}/sample={i:05d}/datetime' ][:], 
                            'lat' : np.linspace( -90., 90., num=180*4+1, endpoint=True), # -90., 90.
                            'lon' : np.linspace( 0., 360., num=360*4, endpoint=False) }
                             )
nlevels =  len(ds_target[ f'{field}/sample={i:05d}/ml' ])

ds_o['vo_target'] = (['ml', 'datetime', 'lat', 'lon'], np.zeros( (nlevels, 6, 721, 1440)))
ds_o['vo_pred'] = (['ml', 'datetime', 'lat', 'lon'], np.zeros( (nlevels, 6, 721, 1440)))

for i_str in ds_target[ f'{field}']:
      ds_o['vo_target'].loc[ dict( datetime=ds_target[ f'{field}/{i_str}/datetime' ][:], 
            lat=ds_target[ f'{field}/{i_str}/lat' ],
            lon=ds_target[ f'{field}/{i_str}/lon' ][:]) ] = ds_target[ f'{field}/{i_str}/data' ] 

      ds_o['vo_pred'].loc[ dict( datetime=ds_pred[ f'{field}/{i_str}/datetime' ][:], 
            lat=ds_pred[ f'{field}/{i_str}/lat' ],
            lon=ds_pred[ f'{field}/{i_str}/lon' ][:]) ] = ds_pred[ f'{field}/{i_str}/data' ] 

data_path = '/home/belkis/Documents/AtmoRep/AtmoRep_results_visualization/ERA5/'
source = xr.open_dataset(data_path + "era5_temperature_y2021_m02_ml105.grib", engine='cfgrib',
                              backend_kwargs={'time_dims':('valid_time','indexing_time')})['t']

source_2021_2_9_10 = source[223:229, :, :] # 6 hours 
targets_ml105 = ds_o['vo_target'][1] # ml105
preds_ml105 = ds_o['vo_pred'][1] # ml105

targets_ml105_data_val = np.flip(targets_ml105.values, 1) # ml105 flipped
preds_ml105_data_val = np.flip(preds_ml105.values, 1) # ml105 flipped
era5_data_ml105 = source_2021_2_9_10

print("====================================== vlevel = 105 =================================================")
ii = 5
era5_data_ml105_f32 = np.array(era5_data_ml105[ii] , dtype=np.float32)
targets_ml105_data_val_f32 = np.array(targets_ml105_data_val[ii] , dtype=np.float32)
preds_ml105_data_val_f32 = np.array(preds_ml105_data_val[ii] , dtype=np.float32)
lon_idx = 1439
for lat_idx in range(0, 721): # (702, 721):
      print(f"ERA5[{lat_idx,lon_idx}] = {era5_data_ml105_f32[lat_idx,lon_idx]}")
      print(f"target[{lat_idx,lon_idx}] = {targets_ml105_data_val_f32[lat_idx,lon_idx]}")
      print(f"pred[{lat_idx,lon_idx}] = {preds_ml105_data_val_f32[lat_idx,lon_idx]}")
      print("-----------------------------------------------------------")

sbAsma commented 3 weeks ago

I think it's nice to have these plots as well (same datetime as in the values above) ERA5 target pred

iluise commented 3 weeks ago

Hi asma, I tried running your code and producing your error but everything seems normal to me:

...
-----------------------------------------------------------
ERA5[(712, 1439)] = 242.0623016357422
target[(712, 1439)] = 242.0623016357422
pred[(712, 1439)] = 241.99569702148438
-----------------------------------------------------------
ERA5[(713, 1439)] = 242.1150360107422
target[(713, 1439)] = 242.1150360107422
pred[(713, 1439)] = 242.0474853515625
-----------------------------------------------------------
ERA5[(714, 1439)] = 242.1462860107422
target[(714, 1439)] = 242.1462860107422
pred[(714, 1439)] = 242.10092163085938
-----------------------------------------------------------
ERA5[(715, 1439)] = 242.1658172607422
target[(715, 1439)] = 242.1658172607422
pred[(715, 1439)] = 242.13526916503906
-----------------------------------------------------------
ERA5[(716, 1439)] = 242.1960906982422
target[(716, 1439)] = 242.1960906982422
pred[(716, 1439)] = 242.1529998779297
-----------------------------------------------------------
ERA5[(717, 1439)] = 242.2283172607422
target[(717, 1439)] = 242.2283172607422
pred[(717, 1439)] = 242.14443969726562
-----------------------------------------------------------
ERA5[(718, 1439)] = 242.1931610107422
target[(718, 1439)] = 242.1931610107422
pred[(718, 1439)] = 242.07003784179688
-----------------------------------------------------------
ERA5[(719, 1439)] = 242.1423797607422
target[(719, 1439)] = 242.1423797607422
pred[(719, 1439)] = 242.07705688476562
-----------------------------------------------------------
ERA5[(720, 1439)] = 242.1277313232422
target[(720, 1439)] = 242.1277313232422
pred[(720, 1439)] = 242.1305694580078
-----------------------------------------------------------

you sure it's not a problem with your code when rebasing to the develop branch?

sbAsma commented 3 weeks ago

Update: the problem persists

I did the following:

created a new directory
cloned the branch develop

changed the following:

In evaluate.py:

model_id = '3qou60es'     # temperature (also 2147fkco)
mode, options = 'global_forecast', { 
                                  'dates' : [[2021, 2, 10, 12]] , 
                                  'token_overlap' : [0, 0],
                                  'forecast_num_tokens' : 2,
                                  'with_pytest' : False }

# file_path = '/gpfs/scratch/ehpc03/era5_y2010_2021_res025_chunk8.zarr'
file_path = '/p/scratch/atmo-rep/data/era5_1deg/months/era5_y2021_res025_chunk8.zarr' # Asma

In trainer,py:

cf.file_path = '/p/scratch/atmo-rep/data/era5_1deg/months/era5_y2010_2021_res025_chunk8.zarr' # Asma: added cuz couldn't find it in the list

I still have 0.0 values at index [:, 702:721, 1040:1440]

Any idea on what to explore next?

clessig commented 3 weeks ago

Can you please create a branch for this so that we can properly compare your changes to develop, and also more easily reproduce.

sbAsma commented 3 weeks ago

Ilaria suggested that I try again with iluise/head for a final check on whether the error is on my side or not.

If the error is only happening with develop and not iluise/head, I will create a branch for it.

sbAsma commented 3 weeks ago

global_forecast evaluation works perfectly in the code cloned from iluise/head. Will create a branch and push the code where it doesn't work and push it.

iluise commented 3 weeks ago

Hi asma, thanks for looking into this. No need to create anew branch and push the code where it does not work. we already have a PR open for iluise/head which will be soon merged into develop. https://github.com/clessig/atmorep/pull/37

I guess we can close this issue

clessig / atmorep

[BUG] null values at certain index points in target, and pred for `global_forecast` evaluation mode #47