Closed sethmcg closed 3 years ago
The specification document (NCAR/esm-collection-spec:collection-spec/collection-spec.md@master) is still a bit difficult to understand.
I concur. The information on what should be in the aggregation_control is actually missing from the specification. I didn't notice this issue until a few weeks ago. I will update it in the coming days. I am around next week if people want to meet for a few minutes to iron out those details. Otherwise, I can answer any questions during the next hack session in two weeks.
Hi @sethmcg I don't know if you're extra busy with other things or slightly blocked on the next steps. I'm happy to help. Probably the first immediate step is to commit and push the small syntax changes Anderson suggested, using your cloned repository, on the branch associated with the pull request. The pull request itself will automatically update after your commit and push. We should eventually put together a small Python test script that opens your catalog and prints out a summary. That would be a fine test for now.
In [1]: import intake
In [5]: path = "intake-esm-datastore/catalogs/glade-na-cordex.json"
In [6]: col = intake.open_esm_datastore(path)
The next test would be to search the catalog for specific things. For example, this LENS catalog search could be adapted to your catalog:
variables = ["TEMP", "UVEL", "VVEL", "WVEL", "VNS", "VNT"]
col_subset = col.search(variable=variables, experiment='CTRL')
col_subset
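As a rough illustration of how that search would carry over: the NA-CORDEX catalog has a `scenario` column rather than `experiment`, so the adapted call would presumably look like `col.search(variable=["tas", "pr"], scenario="hist")`. The plain-Python sketch below mimics the filtering behavior as I understand it (a list matches any of its values, and separate keyword arguments must all match). It is not intake-esm's implementation; the `search` helper and the sample rows are made up for illustration, using values that appear in this catalog.

```python
# Hypothetical sample rows; the real catalog has many more columns and rows.
rows = [
    {"variable": "tas", "scenario": "hist", "grid": "NAM-22i"},
    {"variable": "pr", "scenario": "hist", "grid": "NAM-22i"},
    {"variable": "tas", "scenario": "rcp85", "grid": "NAM-22i"},
    {"variable": "uas", "scenario": "hist", "grid": "NAM-44i"},
]


def search(rows, **query):
    """Keep rows where every queried column holds one of the wanted values."""

    def matches(row):
        for column, wanted in query.items():
            # A bare value is treated like a one-element list.
            allowed = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
            if row.get(column) not in allowed:
                return False
        return True

    return [row for row in rows if matches(row)]


subset = search(rows, variable=["tas", "pr"], scenario="hist")
for row in subset:
    print(row)
```

Only the two `hist` rows for `tas` and `pr` survive; the `rcp85` row is dropped by the `scenario` condition and the `uas` row by the `variable` condition.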
@sethmcg,
When you get a chance, can you add the script you are using to build the CSV file?
Both, at the moment. I'm hoping to get back to this later this week. Are you all set up for video conferencing? I may send you a calendar invite if I can't get some stuff figured out on my own...
Thanks,
--Seth
On 3/23/20 3:43 PM, bonnland wrote:
> Hi @sethmcg I don't know if you're extra busy with other things or slightly blocked on the next steps. I'm happy to help. [...]
Will do.
On 3/23/20 4:06 PM, Anderson Banihirwe wrote:
> @sethmcg, when you get a chance, can you add the script you are using to build the CSV file?
Are you all set up for video conferencing? I may send you a calendar invite if I can't get some stuff figured out on my own...
Google Hangouts have been working well for both Anderson and me. Feel free to set up a time.
@sethmcg I've removed the "join_new" around the "common" attribute based on Anderson's advice. He says that the current intake spec does not support conditional joins on certain attributes, such as "common". An expert user of the na-cordex catalog, who knows that certain grids coincide and can be merged into a single dataset, can still do so on their own.
Thank you, @bonnland & @sethmcg. The PR is coming along nicely :). I tested it using https://github.com/NCAR/intake-esm/pull/194
In [1]: import intake
In [2]: url = "https://raw.githubusercontent.com/sethmcg/intake-esm-datastore/cordex/catalogs/glade-na-cordex.json"
In [3]: col = intake.open_esm_datastore(url)
In [4]: col
Out[4]: <Intake-esm catalog with 1291 dataset(s) from 14674 asset(s)>
In [5]: col.df.head()
Out[5]:
                                                path  variable  scenario      driver         rcm  frequency     grid  biascorrection  common
0  /glade/collections/cdg/data/cordex/data/raw/NA...      prec      hist  MPI-ESM-MR  CRCM5-UQAM        ann  NAM-22i             raw  common
1  /glade/collections/cdg/data/cordex/data/raw/NA...       vas      hist  MPI-ESM-MR  CRCM5-UQAM        ann  NAM-22i             raw  common
2  /glade/collections/cdg/data/cordex/data/raw/NA...   sfcWind      hist  MPI-ESM-MR  CRCM5-UQAM        ann  NAM-22i             raw  common
3  /glade/collections/cdg/data/cordex/data/raw/NA...       uas      hist  MPI-ESM-MR  CRCM5-UQAM        ann  NAM-22i             raw  common
4  /glade/collections/cdg/data/cordex/data/raw/NA...       tas      hist  MPI-ESM-MR  CRCM5-UQAM        ann  NAM-22i             raw  common
In [6]: col.keys()[:10]
Out[6]:
['eval.ERA-Int.CRCM5-OUR.ann.NAM-22.raw',
'eval.ERA-Int.CRCM5-OUR.ann.NAM-22i.raw',
'eval.ERA-Int.CRCM5-OUR.day.NAM-22.raw',
'eval.ERA-Int.CRCM5-OUR.day.NAM-22i.raw',
'eval.ERA-Int.CRCM5-OUR.mon.NAM-22.raw',
'eval.ERA-Int.CRCM5-OUR.mon.NAM-22i.raw',
'eval.ERA-Int.CRCM5-OUR.seas.NAM-22.raw',
'eval.ERA-Int.CRCM5-OUR.seas.NAM-22i.raw',
'eval.ERA-Int.CRCM5-OUR.ymon.NAM-22.raw',
'eval.ERA-Int.CRCM5-OUR.ymon.NAM-22i.raw']
In [7]: col.unique(columns=['variable', 'scenario', 'driver', 'rcm', 'frequency', 'grid', 'biascorrection'])
Out[7]:
{'variable': {'count': 18,
'values': ['hurs',
'huss',
'orog',
'pr',
'prec',
'prhmax',
'ps',
'rsds',
'sfcWind',
'sftlf',
'tas',
'tasmax',
'tasmin',
'temp',
'tmax',
'tmin',
'uas',
'vas']},
'scenario': {'count': 5,
'values': ['eval', 'hist', 'rcp26', 'rcp45', 'rcp85']},
'driver': {'count': 10,
'values': ['CNRM-CM5',
'CanESM2',
'EC-EARTH',
'ERA-Int',
'GEMatm-Can',
'GEMatm-MPI',
'GFDL-ESM2M',
'HadGEM2-ES',
'MPI-ESM-LR',
'MPI-ESM-MR']},
'rcm': {'count': 7,
'values': ['CRCM5-OUR',
'CRCM5-UQAM',
'CanRCM4',
'HIRHAM5',
'RCA4',
'RegCM4',
'WRF']},
'frequency': {'count': 10,
'values': ['1hr',
'3hr',
'6hr',
'ann',
'day',
'fixed',
'mon',
'seas',
'ymon',
'yseas']},
'grid': {'count': 5,
'values': ['NAM-11', 'NAM-22', 'NAM-22i', 'NAM-44', 'NAM-44i']},
'biascorrection': {'count': 3,
'values': ['mbcn-Daymet-ns', 'mbcn-gridMET', 'raw']}}
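The dataset keys in Out[6] look like the catalog's grouping columns joined with dots, in the order scenario, driver, rcm, frequency, grid, biascorrection. That order is inferred from the keys and the column names shown above, not read from the catalog JSON, so treat it as an assumption. A minimal sketch of the key construction, with a hypothetical `dataset_key` helper and the truncated `path` column left out:

```python
# Assumed grouping columns, inferred from the keys printed by col.keys().
GROUPBY_ATTRS = ["scenario", "driver", "rcm", "frequency", "grid", "biascorrection"]

# Row 0 of col.df.head() above, minus the truncated path.
row = {
    "variable": "prec",
    "scenario": "hist",
    "driver": "MPI-ESM-MR",
    "rcm": "CRCM5-UQAM",
    "frequency": "ann",
    "grid": "NAM-22i",
    "biascorrection": "raw",
    "common": "common",
}


def dataset_key(row, groupby_attrs=GROUPBY_ATTRS, sep="."):
    """Join the grouping column values of one catalog row into a dataset key."""
    return sep.join(row[attr] for attr in groupby_attrs)


key = dataset_key(row)
print(key)
```

Because `variable` is not part of the key, all variables that share the same six grouping values end up in the same dataset, which would explain 14674 assets collapsing into 1291 datasets.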
Nice, Seth! Gotta love those sed commands.
And tr! When I'm having trouble doing something concisely in sed, it's often because I should be using tr instead.
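For anyone who would rather see the same idea in Python, here is a rough sketch of building the CSV from file names. It is not Seth's script. It assumes the file names are dot-separated in the same order as the catalog columns and end in `.nc`; the paths below are hypothetical, and the `common` column is left out because the thread does not say how it is derived.

```python
import csv
import io
from pathlib import PurePosixPath

COLUMNS = ["path", "variable", "scenario", "driver", "rcm",
           "frequency", "grid", "biascorrection"]


def parse_path(path):
    """Split a file name into catalog columns (assumed naming scheme)."""
    name = PurePosixPath(path).name
    if not name.endswith(".nc"):
        raise ValueError(f"not a NetCDF file name: {name}")
    fields = name[: -len(".nc")].split(".")
    if len(fields) != len(COLUMNS) - 1:
        raise ValueError(f"unexpected number of fields in {name}")
    return dict(zip(COLUMNS, [path] + fields))


def build_csv(paths):
    """Return CSV text with a header row and one row per file path."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for path in paths:
        writer.writerow(parse_path(path))
    return buffer.getvalue()


# Hypothetical paths, for illustration only.
paths = [
    "/hypothetical/na-cordex/tas.hist.MPI-ESM-MR.CRCM5-UQAM.ann.NAM-22i.raw.nc",
    "/hypothetical/na-cordex/prec.hist.MPI-ESM-MR.CRCM5-UQAM.ann.NAM-22i.raw.nc",
]
csv_text = build_csv(paths)
print(csv_text)
```

Raising on an unexpected field count is deliberate: a file that does not fit the naming scheme is better caught while building the catalog than discovered later as a misaligned row.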
@andersy005 @sethmcg Just starting a conversation... Seth and I were unable to create some of the JSON aggregation details. We need a reminder of what the different criteria mean. The specification document (https://github.com/NCAR/esm-collection-spec/blob/master/collection-spec/collection-spec.md) is still a bit difficult to understand.
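As a starting point for that conversation, here is a sketch of what the `aggregation_control` block might contain, written as a Python dict that serializes to JSON. The field names and the three aggregation types reflect my reading of the collection spec and of how intake-esm uses it, so please correct anything that is off. The `groupby_attrs` list is an assumption for the NA-CORDEX columns, the `unknown_aggregation_types` helper is hypothetical, and the actual glade-na-cordex.json may differ.

```python
import json

# Aggregation types as I understand them; descriptions are paraphrased.
AGGREGATION_TYPES = {
    "union": "merge assets that hold different variables into one dataset",
    "join_existing": "concatenate assets along a dimension that already exists, such as time",
    "join_new": "concatenate assets along a new dimension built from the attribute's values",
}

# Assumed layout for the NA-CORDEX catalog (not copied from the real JSON).
aggregation_control = {
    # Column that names the variable stored in each asset.
    "variable_column_name": "variable",
    # Rows sharing these column values are grouped into one dataset.
    "groupby_attrs": ["scenario", "driver", "rcm", "frequency", "grid", "biascorrection"],
    # How assets within one group are combined.
    "aggregations": [
        {"type": "union", "attribute_name": "variable"},
    ],
}


def unknown_aggregation_types(control):
    """Return any aggregation types that are not in AGGREGATION_TYPES."""
    return [
        agg["type"]
        for agg in control["aggregations"]
        if agg["type"] not in AGGREGATION_TYPES
    ]


print(json.dumps(aggregation_control, indent=2))
print(unknown_aggregation_types(aggregation_control))
```

With only a `union` on `variable`, each group of files that differ only by variable becomes one multi-variable dataset; no `join_new` or `join_existing` entries are included here.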