NCAR / intake-esm-datastore

Intake-esm Datastore
Apache License 2.0

Initial commit of catalog (complete) and json (incomplete) for NA-CORDEX #61

Closed sethmcg closed 3 years ago

bonnland commented 4 years ago

@andersy005 @sethmcg Just starting a conversation. Seth and I were unable to fill in some of the JSON aggregation details, and we need a reminder of what the different criteria mean. The specification document (https://github.com/NCAR/esm-collection-spec/blob/master/collection-spec/collection-spec.md) is still a bit difficult to understand.

andersy005 commented 4 years ago

> The specification document (https://github.com/NCAR/esm-collection-spec/blob/master/collection-spec/collection-spec.md) is still a bit difficult to understand.

I concur. The information on what should be in the aggregation_control is actually missing from the specification. I didn't notice this issue until a few weeks ago. I will update it in the coming days. I am around next week if people want to meet for a few minutes to iron out those details. Otherwise, I can answer any questions during the next hack session in two weeks.

bonnland commented 4 years ago

Hi @sethmcg, I don't know if you're extra busy with other things or slightly blocked on the next steps; either way, I'm happy to help. The first step is probably to commit the small syntax changes Anderson suggested in your cloned repository, on the branch associated with the pull request, and push them. The pull request will update automatically after you push. We should eventually put together a small Python test script that opens your catalog and prints out a summary. That would be a fine test for now.

In [1]: import intake

In [5]: path = "intake-esm-datastore/catalogs/glade-na-cordex.json"

In [6]: col = intake.open_esm_datastore(path)

The next test would be to search the catalog for specific things. For example, this LENS catalog search could be adapted to your catalog:

variables = ["TEMP", "UVEL", "VVEL", "WVEL", "VNS", "VNT"]
col_subset = col.search(variable=variables, experiment='CTRL')
col_subset
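
A smoke test along these lines can also be sketched without intake-esm installed, which may help when checking the CSV on its own. The snippet below is a minimal sketch using only the standard library: it parses a few catalog rows, prints a per-column summary, and applies a filter that mimics what `col.search` is expected to do. The sample rows, the placeholder paths, and the helper names `load_rows`, `summarize`, and `search` are assumptions for illustration, not part of intake-esm.

```python
import csv
import io

# Hypothetical sample of the catalog CSV; paths are placeholders, not real files.
SAMPLE_CSV = """path,variable,scenario,driver,rcm,frequency,grid,biascorrection
/placeholder/a.nc,prec,hist,MPI-ESM-MR,CRCM5-UQAM,ann,NAM-22i,raw
/placeholder/b.nc,tas,hist,MPI-ESM-MR,CRCM5-UQAM,ann,NAM-22i,raw
/placeholder/c.nc,tas,rcp85,CanESM2,CanRCM4,day,NAM-44,raw
"""


def load_rows(text):
    """Parse catalog CSV text into a list of dicts, one per asset."""
    return list(csv.DictReader(io.StringIO(text)))


def summarize(rows, skip=("path",)):
    """Return the sorted unique values of every column except those in `skip`."""
    columns = [c for c in rows[0] if c not in skip]
    return {c: sorted({r[c] for r in rows}) for c in columns}


def search(rows, **criteria):
    """Keep rows whose column value equals the given value (or any of a list)."""
    def matches(row):
        for column, wanted in criteria.items():
            allowed = [wanted] if isinstance(wanted, str) else list(wanted)
            if row[column] not in allowed:
                return False
        return True
    return [r for r in rows if matches(r)]


rows = load_rows(SAMPLE_CSV)
summary = summarize(rows)
for column, values in summary.items():
    print(f"{column}: {len(values)} unique -> {values}")

subset = search(rows, variable=["tas", "prec"], scenario="hist")
print(f"{len(subset)} of {len(rows)} assets match")
```

Pointing `load_rows` at the text of the real CSV would give a quick check that the column names and values look as intended before the JSON is involved at all.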

andersy005 commented 4 years ago

@sethmcg,

When you get a chance, can you add the script you are using to build the CSV file?

sethmcg commented 4 years ago

Both, at the moment. I'm hoping to get back to this later this week. Are you all set up for video conferencing? I may send you a calendar invite if I can't get some stuff figured out on my own...

Thanks,

--Seth

On 3/23/20 3:43 PM, bonnland wrote:

> Hi @sethmcg I don't know if you're extra busy with other things or slightly blocked on the next steps. I'm happy to help. Probably the first immediate step is to commit and push the small syntax changes Anderson suggested using your cloned repository, on the branch associated with the pull request. The pull request itself will automatically update after your commit and push. We should eventually put together a small python test script that opens your catalog and prints out a summary. That would be a fine test for now.
>
> In [1]: import intake
>
> In [5]: path = "intake-esm-datastore/catalogs/glade-na-cordex.json"
>
> In [6]: col = intake.open_esm_datastore(path)


sethmcg commented 4 years ago

Will do.

On 3/23/20 4:06 PM, Anderson Banihirwe wrote:

> @sethmcg,
>
> When you get a chance, can you add the script you are using to build the CSV file?


bonnland commented 4 years ago

> Are you all set up for video conferencing? I may send you a calendar invite if I can't get some stuff figured out on my own...

Google Hangouts have been working well for both Anderson and me. Feel free to set up a time.

bonnland commented 4 years ago

@sethmcg I've removed the "join_new" around the "common" attribute based on Anderson's advice. He says that the current intake spec does not support conditional joins on certain attributes, such as "common". An expert user of the na-cordex catalog, who knows that certain grids coincide and can be merged into a single dataset, can still do so on their own.
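
For reference, the aggregation settings after this change might look roughly like the following. This is a sketch under assumptions, written as a Python dict that mirrors the JSON: the key names (`variable_column_name`, `groupby_attrs`, `aggregations`) and the aggregation types (`union`, `join_new`, `join_existing`) are taken from the ESM collection spec, while the specific attribute choices are guesses and not necessarily what the NA-CORDEX catalog contains. The `check_aggregation_control` helper is hypothetical.

```python
import json

# Aggregation types named in the ESM collection spec.
ALLOWED_TYPES = {"union", "join_new", "join_existing"}

# Columns of the catalog CSV, as printed elsewhere in this thread.
CATALOG_COLUMNS = ["path", "variable", "scenario", "driver", "rcm",
                   "frequency", "grid", "biascorrection", "common"]

# Assumed layout; the attribute choices are illustrative only.
aggregation_control = {
    "variable_column_name": "variable",
    "groupby_attrs": ["scenario", "driver", "rcm",
                      "frequency", "grid", "biascorrection"],
    "aggregations": [
        # Merge the variables of one group into a single dataset.
        {"type": "union", "attribute_name": "variable"},
        # No entry for "common", matching the change described above.
    ],
}


def check_aggregation_control(control, columns):
    """Return a list of problems; an empty list means the block looks sane."""
    problems = []
    if control.get("variable_column_name") not in columns:
        problems.append("variable_column_name is not a catalog column")
    for attr in control.get("groupby_attrs", []):
        if attr not in columns:
            problems.append(f"groupby attribute {attr!r} is not a catalog column")
    for agg in control.get("aggregations", []):
        if agg.get("type") not in ALLOWED_TYPES:
            problems.append(f"unknown aggregation type {agg.get('type')!r}")
        if agg.get("attribute_name") not in columns:
            problems.append(
                f"aggregation attribute {agg.get('attribute_name')!r} is not a catalog column"
            )
    return problems


print(json.dumps(aggregation_control, indent=2))
print(check_aggregation_control(aggregation_control, CATALOG_COLUMNS) or "no problems found")
```

A check like this only catches typos and unsupported types; it says nothing about whether a given aggregation makes scientific sense for the data.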

andersy005 commented 4 years ago

Thank you, @bonnland & @sethmcg. The PR is coming along nicely :). I tested it using https://github.com/NCAR/intake-esm/pull/194

In [1]: import intake

In [2]: url = "https://raw.githubusercontent.com/sethmcg/intake-esm-datastore/cordex/catalogs/glade-na-cordex.json"

In [3]: col = intake.open_esm_datastore(url)

In [4]: col
Out[4]: <Intake-esm catalog with 1291 dataset(s) from 14674 asset(s)>

In [5]: col.df.head()
Out[5]:
                                                path variable scenario      driver         rcm frequency     grid biascorrection  common
0  /glade/collections/cdg/data/cordex/data/raw/NA...     prec     hist  MPI-ESM-MR  CRCM5-UQAM       ann  NAM-22i            raw  common
1  /glade/collections/cdg/data/cordex/data/raw/NA...      vas     hist  MPI-ESM-MR  CRCM5-UQAM       ann  NAM-22i            raw  common
2  /glade/collections/cdg/data/cordex/data/raw/NA...  sfcWind     hist  MPI-ESM-MR  CRCM5-UQAM       ann  NAM-22i            raw  common
3  /glade/collections/cdg/data/cordex/data/raw/NA...      uas     hist  MPI-ESM-MR  CRCM5-UQAM       ann  NAM-22i            raw  common
4  /glade/collections/cdg/data/cordex/data/raw/NA...      tas     hist  MPI-ESM-MR  CRCM5-UQAM       ann  NAM-22i            raw  common

In [6]: col.keys()[:10]
Out[6]:
['eval.ERA-Int.CRCM5-OUR.ann.NAM-22.raw',
 'eval.ERA-Int.CRCM5-OUR.ann.NAM-22i.raw',
 'eval.ERA-Int.CRCM5-OUR.day.NAM-22.raw',
 'eval.ERA-Int.CRCM5-OUR.day.NAM-22i.raw',
 'eval.ERA-Int.CRCM5-OUR.mon.NAM-22.raw',
 'eval.ERA-Int.CRCM5-OUR.mon.NAM-22i.raw',
 'eval.ERA-Int.CRCM5-OUR.seas.NAM-22.raw',
 'eval.ERA-Int.CRCM5-OUR.seas.NAM-22i.raw',
 'eval.ERA-Int.CRCM5-OUR.ymon.NAM-22.raw',
 'eval.ERA-Int.CRCM5-OUR.ymon.NAM-22i.raw']

In [7]: col.unique(columns=['variable', 'scenario', 'driver', 'rcm', 'frequency', 'grid', 'biascorrection'])
Out[7]:
{'variable': {'count': 18,
  'values': ['hurs',
   'huss',
   'orog',
   'pr',
   'prec',
   'prhmax',
   'ps',
   'rsds',
   'sfcWind',
   'sftlf',
   'tas',
   'tasmax',
   'tasmin',
   'temp',
   'tmax',
   'tmin',
   'uas',
   'vas']},
 'scenario': {'count': 5,
  'values': ['eval', 'hist', 'rcp26', 'rcp45', 'rcp85']},
 'driver': {'count': 10,
  'values': ['CNRM-CM5',
   'CanESM2',
   'EC-EARTH',
   'ERA-Int',
   'GEMatm-Can',
   'GEMatm-MPI',
   'GFDL-ESM2M',
   'HadGEM2-ES',
   'MPI-ESM-LR',
   'MPI-ESM-MR']},
 'rcm': {'count': 7,
  'values': ['CRCM5-OUR',
   'CRCM5-UQAM',
   'CanRCM4',
   'HIRHAM5',
   'RCA4',
   'RegCM4',
   'WRF']},
 'frequency': {'count': 10,
  'values': ['1hr',
   '3hr',
   '6hr',
   'ann',
   'day',
   'fixed',
   'mon',
   'seas',
   'ymon',
   'yseas']},
 'grid': {'count': 5,
  'values': ['NAM-11', 'NAM-22', 'NAM-22i', 'NAM-44', 'NAM-44i']},
 'biascorrection': {'count': 3,
  'values': ['mbcn-Daymet-ns', 'mbcn-gridMET', 'raw']}}
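
The keys in `Out[6]` appear to be the values of the grouping columns joined with dots (scenario, driver, rcm, frequency, grid, biascorrection), so each key names one dataset that collects several assets. That would explain why the catalog reports far fewer datasets than assets. A minimal sketch of that grouping, using only the standard library and a few made-up rows with placeholder paths; `GROUPBY` and `dataset_key` are hypothetical names, not intake-esm API:

```python
from collections import defaultdict

# Assumed grouping columns, inferred from the shape of the keys above.
GROUPBY = ["scenario", "driver", "rcm", "frequency", "grid", "biascorrection"]

# Made-up assets; the paths are placeholders, not real files.
assets = [
    {"path": "/placeholder/1.nc", "variable": "prec", "scenario": "hist",
     "driver": "MPI-ESM-MR", "rcm": "CRCM5-UQAM", "frequency": "ann",
     "grid": "NAM-22i", "biascorrection": "raw"},
    {"path": "/placeholder/2.nc", "variable": "tas", "scenario": "hist",
     "driver": "MPI-ESM-MR", "rcm": "CRCM5-UQAM", "frequency": "ann",
     "grid": "NAM-22i", "biascorrection": "raw"},
    {"path": "/placeholder/3.nc", "variable": "tas", "scenario": "eval",
     "driver": "ERA-Int", "rcm": "CRCM5-OUR", "frequency": "day",
     "grid": "NAM-22", "biascorrection": "raw"},
]


def dataset_key(asset, attrs=GROUPBY, sep="."):
    """Build a dataset key by joining the grouping column values."""
    return sep.join(asset[a] for a in attrs)


# Group asset paths under their dataset key.
datasets = defaultdict(list)
for asset in assets:
    datasets[dataset_key(asset)].append(asset["path"])

print(f"{len(datasets)} dataset(s) from {len(assets)} asset(s)")
for key in sorted(datasets):
    print(key, len(datasets[key]))
```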

bonnland commented 4 years ago

Nice, Seth! Gotta love those sed commands.

sethmcg commented 4 years ago

And tr! When I'm having trouble doing something concisely in sed, it's often because I should be using tr instead.