fsspec / kerchunk

Cloud-friendly access to archival data
https://fsspec.github.io/kerchunk/
MIT License

Create coordinate from dimension? #346

Open kthyng opened 1 year ago

kthyng commented 1 year ago

Hi! I've been trying to convert an existing dimension in my ROMS model output file (e.g. eta_rho) into a coordinate when generating the MultiZarr output. This would let me add an attribute to the dimension/coordinate so that cf_xarray identifies it properly. I've been experimenting with preprocessing and postprocessing to do this and haven't figured it out (yet), but I keep feeling that it's probably possible using coo_map. I've tried the coo_map approach a few ways and haven't had any luck with that either. It looks something like:

import numpy as np  # only needed for the commented-out np.arange attempt
from kerchunk.combine import MultiZarrToZarr

# json_list is defined elsewhere (not shown)
mzz = MultiZarrToZarr(
    json_list,
    concat_dims=["ocean_time"],
    # variables that are the same in every input file
    identical_dims=[
        "lat_rho", "lon_rho", "lon_psi", "lat_psi",
        "lat_u", "lon_u", "lat_v", "lon_v",
        "Akk_bak", "Akp_bak", "Akt_bak", "Akv_bak", "Cs_r", "Cs_w",
        "FSobc_in", "FSobc_out", "Falpha", "Fbeta", "Fgamma", "Lm2CLM",
        "Lm3CLM", "LnudgeM2CLM", "LnudgeM3CLM", "LnudgeTCLM",
        "LsshCLM", "LtracerCLM", "LtracerSrc", "LuvSrc",
        "LwSrc", "M2nudg", "M2obc_in", "M2obc_out", "M3nudg",
        "M3obc_in", "M3obc_out", "Tcline", "Tnudg", "Tobc_in", "Tobc_out",
        "Vstretching", "Vtransform", "Znudg", "Zob", "Zos", "angle",
        "dstart", "dt", "dtfast", "el", "f", "gamma2", "grid", "h",
        "hc", "mask_psi", "mask_rho", "mask_u", "mask_v", "nHIS", "nRST",
        "nSTA", "ndefHIS", "ndtfast", "ntimes", "pm", "pn", "rdrg",
        "rdrg2", "rho0", "spherical", "theta_b", "theta_s", "xl",
    ],
    coo_map={
        "ocean_time": "cf:ocean_time",
        # "eta_rho": list(np.arange(1044)),  # tried this
        # "eta_rho": "data:eta_rho",  # tried this too
    },
    # preprocess=preprocess,
    # postprocess=postprocess,
)

Is there an easy way to do this, or should I pursue my pre/post-processing approach? Thanks.

martindurant commented 1 year ago

Sorry, this one passed me by. The "data:" variant expects that each input already has the correct value(s) of the coordinate, and implies you are concatenating along that dimension. The list variant assigns the listed values to the inputs (one per input) for concatenation, which would be odd when you are also concatenating along another dimension.

If you post more details about the input datasets and what you would like the output to look like, I may be able to help. To just "promote" a regular variable that already concatenates correctly to a coordinate, it's fine to add processing that sets the attributes as needed, but it might be worthwhile investigating why that didn't already happen in the first place.
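
For the case in the question, where eta_rho is a bare dimension with no array behind it, there is nothing to attach attributes to, so a processing step would have to add a small index array itself. The following is a minimal sketch of that idea, not a method confirmed in this thread. It assumes the postprocess callable receives the flat reference dict (keys such as "var/.zattrs") and returns it, that the store is Zarr v2, and that inline "base64:" values are acceptable in the references. The function name add_index_coordinate and the {"axis": "Y"} attribute are hypothetical choices for illustration; the size 1044 is taken from the np.arange(1044) attempt above.

```python
import base64
import json
import struct


def add_index_coordinate(refs, name="eta_rho", size=1044, attrs=None):
    """Add a 1-D integer index array called `name` to a flat reference dict.

    Assumption: `refs` maps Zarr v2 keys ("var/.zarray", "var/0", ...) to
    JSON strings or inline data, as in a kerchunk reference set.
    """
    # Raw little-endian int64 values 0..size-1, stored as a single chunk.
    values = struct.pack(f"<{size}q", *range(size))

    zarray = {
        "chunks": [size],
        "compressor": None,
        "dtype": "<i8",
        "fill_value": None,
        "filters": None,
        "order": "C",
        "shape": [size],
        "zarr_format": 2,
    }
    # xarray reads dimension names from _ARRAY_DIMENSIONS; an array named
    # after its own single dimension is treated as a coordinate.
    zattrs = {"_ARRAY_DIMENSIONS": [name]}
    zattrs.update(attrs or {})

    refs[f"{name}/.zarray"] = json.dumps(zarray)
    refs[f"{name}/.zattrs"] = json.dumps(zattrs)
    # Inline the chunk as base64 so no extra file is needed.
    refs[f"{name}/0"] = "base64:" + base64.b64encode(values).decode("ascii")
    return refs


# Hypothetical attribute; which attribute cf_xarray needs is an assumption.
refs = add_index_coordinate({}, attrs={"axis": "Y"})
```

If the assumptions hold, the function could be passed as postprocess=add_index_coordinate (or wrapped with functools.partial to change the name, size, or attributes). Whether this is preferable to fixing the attributes at the source is a separate question, as noted above.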