google-research / arco-era5

Recipes for reproducing Analysis-Ready & Cloud Optimized (ARCO) ERA5 datasets.
https://cloud.google.com/storage/docs/public-datasets/era5
Apache License 2.0

Data is flipped over latitude axis for select dates #70

Closed Arcomano1234 closed 3 months ago

Arcomano1234 commented 5 months ago

It seems that for a few dates the snow_depth variable in gs://gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3/ is flipped (i.e., the latitudes are reversed). I only checked values at 6-hour intervals from 1979 to 2021, but I found that the fields are flipped for these dates:

Flipped data at 1981-03-16T00:00:00.000000000
Flipped data at 1981-03-16T06:00:00.000000000
Flipped data at 1981-03-16T12:00:00.000000000
Flipped data at 1981-03-16T18:00:00.000000000
Flipped data at 1982-04-06T00:00:00.000000000
Flipped data at 1982-04-06T06:00:00.000000000
Flipped data at 1982-04-06T12:00:00.000000000
Flipped data at 1982-04-06T18:00:00.000000000
Flipped data at 1985-12-11T00:00:00.000000000
Flipped data at 1985-12-11T06:00:00.000000000
Flipped data at 1985-12-11T12:00:00.000000000
Flipped data at 1985-12-11T18:00:00.000000000
Flipped data at 1987-11-30T00:00:00.000000000
Flipped data at 1987-11-30T06:00:00.000000000
Flipped data at 1987-11-30T12:00:00.000000000
Flipped data at 1987-11-30T18:00:00.000000000
Flipped data at 1990-03-05T00:00:00.000000000
Flipped data at 1990-03-05T06:00:00.000000000
Flipped data at 1990-03-05T12:00:00.000000000
Flipped data at 1990-03-05T18:00:00.000000000
Flipped data at 1990-04-02T00:00:00.000000000
Flipped data at 1990-04-02T06:00:00.000000000
Flipped data at 1990-04-02T12:00:00.000000000
Flipped data at 1990-04-02T18:00:00.000000000
Flipped data at 1990-08-12T00:00:00.000000000
Flipped data at 1990-08-12T06:00:00.000000000
Flipped data at 1990-08-12T12:00:00.000000000
Flipped data at 1990-08-12T18:00:00.000000000
Flipped data at 1997-05-15T00:00:00.000000000
Flipped data at 1997-05-15T06:00:00.000000000
Flipped data at 1997-05-15T12:00:00.000000000
Flipped data at 1997-05-15T18:00:00.000000000
Flipped data at 2002-03-17T00:00:00.000000000
Flipped data at 2002-03-17T06:00:00.000000000
Flipped data at 2002-03-17T12:00:00.000000000
Flipped data at 2002-03-17T18:00:00.000000000
Flipped data at 2003-11-26T00:00:00.000000000
Flipped data at 2003-11-26T06:00:00.000000000
Flipped data at 2003-11-26T12:00:00.000000000
Flipped data at 2003-11-26T18:00:00.000000000
Flipped data at 2004-02-10T00:00:00.000000000
Flipped data at 2004-02-10T06:00:00.000000000
Flipped data at 2004-02-10T12:00:00.000000000
Flipped data at 2004-02-10T18:00:00.000000000
Flipped data at 2006-04-12T00:00:00.000000000
Flipped data at 2006-04-12T06:00:00.000000000
Flipped data at 2006-04-12T12:00:00.000000000
Flipped data at 2006-04-12T18:00:00.000000000
Flipped data at 2007-06-19T00:00:00.000000000
Flipped data at 2007-06-19T06:00:00.000000000
Flipped data at 2007-06-19T12:00:00.000000000
Flipped data at 2007-06-19T18:00:00.000000000
Flipped data at 2009-03-05T00:00:00.000000000
Flipped data at 2009-03-05T06:00:00.000000000
Flipped data at 2009-03-05T12:00:00.000000000
Flipped data at 2009-03-05T18:00:00.000000000
Flipped data at 2013-11-11T00:00:00.000000000
Flipped data at 2013-11-11T06:00:00.000000000
Flipped data at 2013-11-11T12:00:00.000000000
Flipped data at 2013-11-11T18:00:00.000000000
Flipped data at 2014-05-11T00:00:00.000000000
Flipped data at 2014-05-11T06:00:00.000000000
Flipped data at 2014-05-11T12:00:00.000000000
Flipped data at 2014-05-11T18:00:00.000000000
Flipped data at 2017-03-17T00:00:00.000000000
Flipped data at 2017-03-17T06:00:00.000000000
Flipped data at 2017-03-17T12:00:00.000000000
Flipped data at 2017-03-17T18:00:00.000000000
Flipped data at 2020-05-19T00:00:00.000000000
Flipped data at 2020-05-19T06:00:00.000000000
Flipped data at 2020-05-19T12:00:00.000000000
Flipped data at 2020-05-19T18:00:00.000000000
dabhicusp commented 5 months ago

Hi @Arcomano1234, it would be helpful if you could share example code here. When I checked, I couldn't find any flipped latitude values for the snow_depth variable on the given dates. In the example below, the latitude values for 1980 and 1981 are the same.

import xarray as xr
ar_full_37_1h = xr.open_zarr('gs://gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3/')[['snow_depth']]
print(ar_full_37_1h.sel(time='1981-03-16T00:00:00.000000000').latitude.values)
print(ar_full_37_1h.sel(time='1980-03-16T00:00:00.000000000').latitude.values)
Arcomano1234 commented 5 months ago

Hi, thank you for your quick response! After investigating further, I realized the problem is more subtle than I first thought. The latitude coordinates themselves are not flipped, but the data is. Here is a minimal reproducible example of the bug. The code below produces a matplotlib imshow image for each of the two dates in your example code (I also attached those images below). On the "flipped" dates I mentioned in the previous comment, the outlines of Antarctica and Greenland are in the wrong locations (i.e., flipped over the equator).

import xarray as xr
import matplotlib.pyplot as plt

ar_full_37_1h = xr.open_zarr('gs://gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3/')[['snow_depth']]

correct_orientation = ar_full_37_1h.sel(time='1980-03-16T00:00:00.000000000')['snow_depth'].values
incorrect_orientation = ar_full_37_1h.sel(time='1981-03-16T00:00:00.000000000')['snow_depth'].values

plt.imshow(correct_orientation)
plt.title('Correctly Orientated Data Example')
plt.show()

plt.imshow(incorrect_orientation)
plt.title('Flipped Data Example')
plt.show()
[Two screenshots attached: the imshow plots produced by the code above]
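
The distinction between flipped coordinates and flipped data can be illustrated on a small synthetic field. This is only a sketch: the grid, the made-up "snow" values, and the dictionary layout are assumptions for illustration, not the actual dataset, and the first row is assumed to be the northernmost one.

```python
import numpy as np

# Synthetic stand-in for one time step on a 0.25-degree grid (721 x 1440).
# Assumption: latitude runs north to south, so row 0 is the North Pole.
latitude = np.linspace(90.0, -90.0, 721)
longitude = np.arange(0.0, 360.0, 0.25)

# Made-up field: "snow" only south of 60S (an Antarctica-like band).
field = np.zeros((latitude.size, longitude.size))
field[latitude < -60.0, :] = 1.0

correct = {"latitude": latitude, "snow_depth": field}
# Simulate the bug: the data is reversed along the latitude axis,
# while the latitude coordinate itself is left untouched.
buggy = {"latitude": latitude.copy(), "snow_depth": field[::-1, :].copy()}

# Comparing the coordinates shows no difference...
same_coords = np.array_equal(correct["latitude"], buggy["latitude"])
# ...but the data differs, and the "snow" now sits in the northernmost row.
same_data = np.array_equal(correct["snow_depth"], buggy["snow_depth"])
north_row_mean_correct = correct["snow_depth"][0].mean()
north_row_mean_buggy = buggy["snow_depth"][0].mean()

print(same_coords, same_data)
print(north_row_mean_correct, north_row_mean_buggy)
```

Printing `latitude.values` therefore cannot reveal this kind of problem; only the data values themselves can.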
dabhicusp commented 5 months ago

Hello @Arcomano1234, if possible, please share the script you used to detect the flipped data, or run it over all of the years (1940-2023), so we can fix all the flipped data in the dataset.

Arcomano1234 commented 5 months ago

I admit this is not the best method for detecting the flipped data, but it is the quickest and simplest. After inspecting the data, I assumed that the top latitude band (i.e., the North Pole) has no snow depth, so the script flags any time step that has snow depth where there shouldn't be any. This script was able to detect all of the problem data from 1979 - 2023; however, I cannot verify whether it works for older data.

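import numpy as np
import xarray as xr

# One local NetCDF file per year, named '<year>.nc'
years = np.arange(1979,2022)
for year in years:
    ds = xr.open_dataset(f'{year}.nc')
    times = ds.time
    for i in range(len(times)):
        # First latitude row (the North Pole band) at time step i
        var = ds['snow_depth'][i,0,:].values

        # Snow depth should be ~0 in this row; a large mean suggests flipped data
        if np.mean(var) > 0.1:
            print('Flipped data at',times[i].values)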
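
The same heuristic can be sketched as a vectorized function and tried on synthetic data. The function name `find_flipped_steps`, the array shape, and the toy values are assumptions for illustration; the 0.1 threshold and the "no snow in the northernmost row" assumption come from the script above.

```python
import numpy as np

def find_flipped_steps(data, threshold=0.1):
    """Return time indices whose first latitude row has a mean above threshold.

    data: array of shape (time, latitude, longitude), with row 0 assumed
    to be the northernmost latitude band.
    """
    north_row_mean = data[:, 0, :].mean(axis=1)
    return np.flatnonzero(north_row_mean > threshold)

# Toy data: 8 time steps on a coarse grid, "snow" only in the three
# southernmost rows.
n_time, n_lat, n_lon = 8, 19, 36
data = np.zeros((n_time, n_lat, n_lon))
data[:, -3:, :] = 1.0

# Flip time steps 2 and 5 along the latitude axis to mimic the bug.
data[[2, 5]] = data[[2, 5], ::-1, :]

flagged = find_flipped_steps(data)
print(flagged)  # [2 5]
```

The check only works for variables that are reliably near zero in the first latitude row, so it would need a different criterion for fields without that property.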
shoyer commented 4 months ago

https://github.com/google-research/arco-era5/issues/71 is a duplicate of this issue. See here for my code to reproduce: https://github.com/google-research/arco-era5/issues/71#issuecomment-2106227810

It appears that every variable has this issue on these dates.

Arcomano1234 commented 4 months ago

Thank you for the update. I just checked and yes, all of the variables I have used from the ARCO-ERA5 dataset are flipped on the dates mentioned in my original post.

dabhicusp commented 3 months ago

@Arcomano1234 @shoyer we were able to identify the root cause of the issue. The discrepancy arises because, on certain dates, a single variable has a resolution of 0.5 * 0.5 degrees, whereas all the other variables have a resolution of 0.25 * 0.25 degrees. Consequently, during the creation of the dataset, the latitude values were reversed, resulting in reversed data for that particular date.

The following variables have a spatial resolution of 0.5 * 0.5 degrees on the specified dates.

'1965-11-22' -> wave_spectral_kurtosis
'1981-03-16' -> mean_wave_period_based_on_first_moment_for_swell
'1982-04-06' -> benjamin_feir_index
'1985-12-11' -> benjamin_feir_index
'1987-11-30' -> mean_wave_period_of_second_swell_partition
'1990-03-05' -> mean_direction_of_wind_waves
'1990-04-02' -> period_corresponding_to_maximum_individual_wave_height
'1990-08-12' -> mean_period_of_total_swell
'1997-05-15' -> significant_wave_height_of_third_swell_partition
'2002-03-17' -> peak_wave_period
'2003-11-26' -> mean_direction_of_wind_waves
'2004-02-10' -> mean_wave_direction_of_first_swell_partition
'2006-04-12' -> mean_wave_period_based_on_first_moment_for_swell
'2007-06-19' -> v_component_stokes_drift
'2009-03-05' -> wave_spectral_directional_width_for_wind_waves
'2013-11-11' -> mean_wave_period_of_third_swell_partition
'2014-05-11' -> significant_wave_height_of_third_swell_partition
'2017-03-17' -> mean_wave_period
'2020-05-19' -> benjamin_feir_index
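
A guard against this kind of mismatch could look like the sketch below, which flags variables whose latitude axis is not on the expected 0.25-degree, north-to-south grid before they are combined. This is not the maintainers' fix: the function `find_grid_mismatches` and the synthetic latitude arrays are hypothetical, and only the variable names are taken from this thread.

```python
import numpy as np

def find_grid_mismatches(latitudes_by_variable, reference_step=0.25):
    """Return names of variables whose latitude axis does not descend
    in uniform steps of reference_step degrees."""
    mismatched = []
    for name, lat in latitudes_by_variable.items():
        steps = np.diff(np.asarray(lat, dtype=float))
        # Expect a north-to-south axis, i.e. every step equals -reference_step.
        if steps.size == 0 or not np.allclose(steps, -reference_step):
            mismatched.append(name)
    return mismatched

# Synthetic latitude axes standing in for one day's input files.
lats = {
    "snow_depth": np.linspace(90.0, -90.0, 721),           # 0.25-degree grid
    "benjamin_feir_index": np.linspace(90.0, -90.0, 361),  # 0.5-degree grid
}

mismatched = find_grid_mismatches(lats)
print(mismatched)  # ['benjamin_feir_index']
```

Running a check like this per date would list the odd-resolution inputs before they are merged with the rest of the variables.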

P.S.: I have already updated the .nc files for the above dates and variables, so you can no longer see the old files. 😄

Here is an example code snippet for creating the datasets:

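import pathlib
import xarray as xr

# SINGLE_LEVEL_SUBDIR_TEMPLATE and _read_nc_dataset are not defined in this snippet;
# they are presumably helpers from the arco-era5 source code.
def fun():
    year, month, day = 1981, 3, 16
    root_path = pathlib.Path("gs://gcp-public-data-arco-era5/raw") # GCP path
    output = {}
    # One variable from the 0.5-degree list above and one other variable
    for variable in ["mean_wave_period_based_on_first_moment_for_swell", "mean_wave_period_based_on_first_moment"]:
        relative_path = SINGLE_LEVEL_SUBDIR_TEMPLATE.format(year=year, month=month, day=day, variable=variable)
        output[variable] = _read_nc_dataset(root_path / relative_path)
        print(output[variable])
    print("-------------------------")
    # Combine the per-variable data into a single dataset
    final_answer = xr.Dataset(output)
    print("final answer : ", final_answer)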

P.P.S.: It should be noted that the .nc files have already been updated, so using the above code will yield accurate results (data is not flipped).

Furthermore, we will update the data of the zarr file (gs://gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3/) within the next 2-3 days.

dabhicusp commented 3 months ago

@Arcomano1234 @shoyer I have updated the data in the zarr file (gs://gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3/). After a thorough examination, I found no remaining flipped data, so I am closing this issue.

Please feel free to reopen the issue if you find any flipped data in the future.

shoyer commented 3 months ago

Thanks @dabhicusp, really appreciated!