NCAR / intake-esm-datastore

Intake-esm Datastore
Apache License 2.0

SubX catalog #46

Closed aaronspring closed 5 years ago

aaronspring commented 5 years ago

Would it be useful and feasible to build a catalog for SubX?

Or would it be more useful, or simply easier, to build a catalog based on intake-xarray? With SubX the data will be remote.

https://github.com/kpegion/SubX/blob/master/Python/download_data/generate_ts_py_ens_files.ksh

andersy005 commented 5 years ago

With SubX the data will be remote.

Is the data hosted on an OPeNDAP server?

andersy005 commented 5 years ago

Is the data hosted on an OPeNDAP server?

I just took a look at the script above, and it appears to read the data remotely via xarray + OPeNDAP.

Would it be useful and feasible to build a catalog for SubX?

Is it feasible? Yes, this is doable today. There was an issue about this on the intake-esm issue tracker: https://github.com/NCAR/intake-esm/issues/175

andersy005 commented 5 years ago

Here's an example of a catalog pointing to an OPeNDAP server: http://haden.ldeo.columbia.edu/catalogs/hyrax_cmip6.json

aaronspring commented 5 years ago

Great, I will give this a try tomorrow. The JSON file appears to describe CMIP6 data; hopefully much of its structure can be reused.

Any ideas on how to run a builder against that server?
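
A rough sketch of what a builder could look like here, assuming the catalog only needs one row per OPeNDAP URL. The IRIDL URLs in this thread each point at one already-concatenated dataset rather than at individual .nc files, so the builder can enumerate URLs instead of crawling files. The model-to-subdataset mapping is copied from the names used in this thread (CFSv2 for NCEP comes from the NCEP URL below). The column names and CSV layout are assumptions, and not every combination necessarily exists on the server, since nothing here checks it. An intake-esm collection would additionally need a JSON descriptor pointing at this CSV, which is not shown.

```python
import csv
import io
import itertools

# Hypothetical model -> subdataset mapping, taken from names used in this thread.
# Not verified against the IRIDL server.
SUBX_MODELS = {
    "CESM": ["30LCESM1", "46LCESM1"],
    "ECCC": ["GEM", "GEPS5", "GEPS6"],
    "EMC": ["GEFS"],
    "ESRL": ["FIMr1p1"],
    "GMAO": ["GEOS_V2p1"],
    "NCEP": ["CFSv2"],
    "NRL": ["NESM"],
    "RSMAS": ["CCSM4"],
}
CASTS = ["hindcast", "forecast"]

# URL layout copied from the IRIDL OPeNDAP URLs in this thread.
URL_TEMPLATE = ("http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/"
                ".{model}/.{subdataset}/.{cast}/.{variable}/dods")

# Assumed column names for the catalog CSV ("path" holds the OPeNDAP URL).
FIELDNAMES = ["model", "subdataset", "cast", "variable", "path"]


def build_rows(variables):
    """Yield one catalog row per (model, subdataset, cast, variable) combination."""
    for model, subdatasets in SUBX_MODELS.items():
        for subdataset, cast, variable in itertools.product(subdatasets, CASTS, variables):
            yield {
                "model": model,
                "subdataset": subdataset,
                "cast": cast,
                "variable": variable,
                "path": URL_TEMPLATE.format(model=model, subdataset=subdataset,
                                            cast=cast, variable=variable),
            }


def write_catalog_csv(rows, fh):
    """Write the rows as CSV with a header line to an open text file handle."""
    writer = csv.DictWriter(fh, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# Usage: build a small catalog for two variables in memory.
buf = io.StringIO()
write_catalog_csv(build_rows(["zg", "pr"]), buf)
```

Enumerating URLs keeps the builder trivial and offline; the cost is that rows for model/variable combinations that do not exist on the server would only fail when opened, so a real builder might want to probe each URL first.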

aaronspring commented 5 years ago

I can't get to the individual .nc files: at http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.NCEP/.CFSv2/.forecast/.pr/dods I cannot browse any further. Does anyone have an idea?

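import xarray as xr
# Open one SubX variable lazily over OPeNDAP (RSMAS CCSM4 hindcast, geopotential height zg).
# chunks= requires dask; each chunk holds one start date (S) and one lead time (L).
url = 'http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.RSMAS/.CCSM4/.hindcast/.zg/dods'
remote_data = xr.open_dataarray(url, chunks={'S': 1, 'L': 1})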

source: https://stackoverflow.com/questions/50240123/xarray-mean-of-data-stored-via-opendap

The SubX output is already concatenated into a useful form with dims (S, L, M, X, Y). Having the model included as a dimension would be nice, but the datasets are very heavy. I guess a simpler intake-xarray YAML file will also do fine to start with.
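
To make the (S, L, M, X, Y) layout concrete, here is a toy example with random data in place of a real SubX field. S, L and M are read as start date, lead time and ensemble member (an assumption based on the usual SubX naming), and all sizes are arbitrary. It shows an ensemble mean, a single start/lead slice, and what adding a model dimension with xr.concat would look like; the combined object's memory grows with the number of models, which is the "very heavy" concern above. The model labels are placeholders. With dask installed, .chunk({'S': 1, 'L': 1}) would mirror the chunking in the open_dataarray call; it is left out so the example runs without dask.

```python
import numpy as np
import xarray as xr


def toy_subx(seed, n_s=3, n_l=4, n_m=2, n_x=6, n_y=5):
    """Random stand-in for one SubX variable with dims (S, L, M, X, Y)."""
    rng = np.random.default_rng(seed)
    data = rng.standard_normal((n_s, n_l, n_m, n_x, n_y))
    return xr.DataArray(data, dims=("S", "L", "M", "X", "Y"), name="zg")


model_a = toy_subx(0)
model_b = toy_subx(1)

# Ensemble mean over members, then one start date and one lead time.
ens_mean = model_a.mean("M")
single = ens_mean.isel(S=0, L=0)

# Add "model" as a new leading dimension; the labels are placeholders.
combined = xr.concat([model_a, model_b], dim="model")
combined = combined.assign_coords(model=["RSMAS.CCSM4", "NCEP.CFSv2"])
```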

aaronspring commented 5 years ago

plugins:
  source:
    - module: intake_xarray
sources:
  subX:
    description: SubX
    driver: opendap
    metadata:
      url_origin: http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/
    #cache:
    #  - argkey: urlpath
    #    regex: ''
    #    type: file
    parameters:
      model:
        description: model
        type: str
        default: NCEP
        allowed: [CESM, ECCC, EMC, ESRL, GMAO, NCEP, NRL, RSMAS]
      subdataset:
        description: subdataset
        type: str
        default: CFSv2
        allowed: [
          30LCESM1, 46LCESM1, # CESM
          GEM, GEPS5, GEPS6, # ECCC
          GEFS, # EMC
          FIMr1p1, # ESRL
          GEOS_V2p1, # GMAO
          CFSv2, # NCEP
          NESM, # NRL
          CCSM4, # RSMAS
        ]
      cast:
        description: hindcast or forecast
        type: str
        default: hindcast
        allowed: [hindcast, forecast]
      variable:
        description: variable name
        type: str
        default: ts
        allowed: [ts, zg, va, ua, tas, rlut, pr, hfls, hfss, huss, mrso, psl, rad, ROMI, snc, stx, sty, swe, tasmax, tasmin, uas, vas, wap]
    args:
      urlpath: http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.{{model}}/.{{subdataset}}/.{{cast}}/.{{variable}}/dods
      chunks: {'S': 1, 'L': 1}
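
Each intake user parameter is validated on its own, so the catalog above will accept combinations that do not exist, e.g. model NCEP with subdataset 30LCESM1. The sketch below mimics filling the {{...}} placeholders (intake itself renders them with Jinja2; the regex here is a simplified stand-in) and adds a hypothetical model/subdataset consistency check. The mapping comes from the comments in the allowed list, plus CFSv2 for NCEP from the NCEP URL earlier in the thread; the function names are illustrative only.

```python
import re

# Model -> subdatasets, from the catalog's allowed-list comments
# (CFSv2 for NCEP comes from the NCEP URL earlier in this thread).
SUBDATASETS = {
    "CESM": ["30LCESM1", "46LCESM1"],
    "ECCC": ["GEM", "GEPS5", "GEPS6"],
    "EMC": ["GEFS"],
    "ESRL": ["FIMr1p1"],
    "GMAO": ["GEOS_V2p1"],
    "NCEP": ["CFSv2"],
    "NRL": ["NESM"],
    "RSMAS": ["CCSM4"],
}
CASTS = ["hindcast", "forecast"]
VARIABLES = ["ts", "zg", "va", "ua", "tas", "rlut", "pr", "hfls", "hfss", "huss",
             "mrso", "psl", "rad", "ROMI", "snc", "stx", "sty", "swe", "tasmax",
             "tasmin", "uas", "vas", "wap"]

# Same layout as the catalog's urlpath, with every placeholder in {{...}} form.
URL_TEMPLATE = ("http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/"
                ".{{model}}/.{{subdataset}}/.{{cast}}/.{{variable}}/dods")


def render_urlpath(template, **params):
    """Replace {{name}} placeholders; a simplified stand-in for intake's Jinja2 step."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: params[m.group(1)], template)


def is_valid_combo(model, subdataset):
    """Hypothetical cross-parameter check that the YAML parameters cannot express."""
    return subdataset in SUBDATASETS.get(model, [])


def subx_urlpath(model, subdataset, cast="hindcast", variable="ts"):
    """Validate the parameters, then fill in the URL template."""
    if not is_valid_combo(model, subdataset):
        raise ValueError(f"{subdataset!r} is not a subdataset of {model!r}")
    if cast not in CASTS:
        raise ValueError(f"cast must be one of {CASTS}, got {cast!r}")
    if variable not in VARIABLES:
        raise ValueError(f"unknown variable {variable!r}")
    return render_urlpath(URL_TEMPLATE, model=model, subdataset=subdataset,
                          cast=cast, variable=variable)
```

For example, subx_urlpath("RSMAS", "CCSM4", "hindcast", "zg") yields the same URL as in the open_dataarray example earlier, while subx_urlpath("NCEP", "30LCESM1") raises a ValueError instead of producing a URL that does not exist.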