NCAR / ldcpy

Statistical and visual tools for gathering metrics and comparing Earth System Model data files. A common use case is comparing data that has been lossily compressed with the original data.
https://ldcpy.readthedocs.io/
Apache License 2.0
19 stars 2 forks source link

More support for pop data #240

Closed allibco closed 3 years ago

allibco commented 3 years ago

This fixes a number of issues for POP data. Now we are very close to having everything working!

@andersy005 - could you help me with one bit that is broken? It is demonstrated in the last cell in the POPData notebook. Basically the issue is when I grab POP data with a fixed lat and lon, then the TLAT and TLON coords no longer have dimensions (nlat, nlon). This happens in the subset_data() in util.py - I added a comment at the end of that function. I spent several hours on Friday trying to figure out how to add the dimensions back to these coordinates, but to no avail :). Any help is appreciated. (Note that the CAM data is unaffected as the lat and lon are one dimensional, so I can grab them independently with isel(). But for POP I had to figure out which grid point was closet from the 2D lat and lon data)

andersy005 commented 3 years ago

is happens in the subset_data() in util.py - I added a comment at the end of that function. I spent several hours on Friday trying to figure out how to add the dimensions back to these coordinates,

I don't know to address this issue for (2D) dimension coordinates unfortunately. Ccing @dcherian who may have ideas/recommendations

dcherian commented 3 years ago

I can't tell; can you add a simple example please?

allibco commented 3 years ago

I can't tell; can you add a simple example please?

Yes - I will make simple example tomorrow - thanks for your help.

allibco commented 3 years ago

@dcherian Here is my simple example:

import numpy as np
import pandas as pd
import xarray as xr
mylat = np.array([[1., 2., 3., 4.], [1.1, 2.1, 3.1, 3.9], [1, 1.9, 2.9, 3.9]])
mylon = np.array([[3., 4., 5., 6.], [3.1, 4.1, 5.1, 5.9], [3, 3.9, 4.9, 5.9]])
mytimes = pd.date_range('2000-01-01', periods=5)
pop_data = np.random.uniform(low=0.0, high=100.0, size=(3,4,5))
ds_subset = xr.DataArray(
    pop_data, 
     coords={
        "time": mytimes,
        "TLAT": (("nlat", "nlon"), mylat, {'standard_name': 'latitude', 'units': 'degrees_north'}),
        "TLON": (("nlat", "nlon"), mylon, {'standard_name': 'longitude', 'units': 'degrees_east'})
    },
    dims=['nlat', 'nlon', 'time'] , attrs={'long_name': 'Surface Potential'}
)
ds_subset

pop0

So in the above, the coords TLAT and TLON have dims (nlat, nlon). Now I do some calculations and want a the data just a a single grid point:

dd = ds_subset.cf['latitude'].dims
lat_dim_name = dd[0]
lon_dim_name = dd[1]
xmin = 1
ymin = 0
ds_singlept = ds_subset.isel({lat_dim_name: xmin, lon_dim_name: ymin})
ds_singlept

pop

Now the dims (nlat, nlon) are missing from the TLAT and TLON coordinates. My question is how can I put them back (I need them for a further calculation).

dcherian commented 3 years ago

Use ds_singlept = ds_subset.isel({lat_dim_name: [xmin], lon_dim_name:[ymin]})

The list preserves the dimension.

allibco commented 3 years ago

Thanks so much. I can't tell you how long I spent trying to figure that out....

dcherian commented 3 years ago

Thanks so much. I can't tell you how long I spent trying to figure that out....

:((((( it isn't even documented, we need some kind of indexing cheatsheet.

allibco commented 3 years ago

I think I have all the POP modifications accounted for now except for adding testing specific for pop data and fixing the contrast variance to work with pop (issues #230 and #231). - I'll tackle these soon :)