NOAA-GFDL / CatalogBuilder

Toolset to build, validate, modify, and use intake-esm based data catalogs

add standard_name to catalog #19

Closed by aradhakrishnanGFDL 1 month ago

aradhakrishnanGFDL commented 1 month ago

Associated with issue https://github.com/NOAA-GFDL/CatalogBuilder/issues/2

This addresses the fast option. The slow option is also coded in, but it still needs verification and testing.

How to test the fast option so it can be incorporated:


From a GFDL workstation:

conda activate catalogbuilder 
(or conda activate /nbhome/Aparna.Radhakrishnan/conda/envs/catalogbuilder)

Go to the scripts directory of your cloned repo, e.g.

cd /home/a1r/github/noaa-gfdl/CatalogBuilder/catalogbuilder/scripts

A test config is in tests/config-cfname.yaml.
Adjust output_path in it before running.

Run 
gen_intake_gfdl.py --config ../tests/config-cfname.yaml

(use from fre-cli to test as needed) 
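Once the run finishes, one way to confirm that the fast option actually populated the new column is to read the generated CSV and count the rows with a non-empty standard_name. This is only a minimal sketch: the helper name `summarize_standard_names` is hypothetical, and it assumes that a missing value is written as an empty cell. The demo uses an in-memory CSV; pass the path of the generated catalog CSV instead.

```python
import io

import pandas as pd


def summarize_standard_names(csv_source):
    """Return (rows with standard_name, total rows, variable_ids lacking one).

    csv_source may be a file path or a file-like object.
    Assumption: a missing standard_name is stored as an empty cell.
    """
    # Read everything as strings and keep empty cells as "" rather than NaN.
    catalog = pd.read_csv(csv_source, dtype=str, keep_default_na=False)
    if "standard_name" not in catalog.columns:
        raise KeyError("standard_name column missing from catalog")
    filled = catalog["standard_name"].str.strip() != ""
    missing_vars = sorted(catalog.loc[~filled, "variable_id"].unique())
    return int(filled.sum()), len(catalog), missing_vars


# Demo on a tiny made-up catalog; replace `sample` with the real CSV path.
sample = io.StringIO(
    "variable_id,standard_name,path\n"
    "tas,air_temperature,/a.nc\n"
    "foo,,/b.nc\n"
)
filled_count, total_count, missing = summarize_standard_names(sample)
print(f"{filled_count}/{total_count} rows have a standard_name; missing for: {missing}")
```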

Expected output:

The module intakebuilder is not installed. Do you have intakebuilder in your sys.path or have you activated the conda environment with the intakebuilder package in it? 
Attempting again with adjusted sys.path 
/home/a1r/github/noaa-gfdl/CatalogBuilder/catalogbuilder/intakebuilder/gfdlcrawler.py
No paths given, using yaml configuration
input_path : /archive/am5/am5/am5f7b10r0/c96L65_am5f7b10r0_amip/gfdl.ncrc5-deploy-prod-openmp/pp/
output_path : /home/a1r/github/noaa-gfdl/catalogs/c96L65_am5f7b10r0_amip
headerlist : ['activity_id', 'institution_id', 'source_id', 'experiment_id', 'frequency', 'realm', 'table_id', 'member_id', 'grid_label', 'variable_id', 'time_range', 'chunk_freq', 'platform', 'dimensions', 'cell_methods', 'standard_name', 'path']
output_path_template : ['NA', 'NA', 'source_id', 'NA', 'experiment_id', 'platform', 'custom_pp', 'realm', 'cell_methods', 'frequency', 'chunk_freq']
output_file_template : ['realm', 'time_range', 'variable_id']
Missing cols from metadata sources: ['activity_id', 'institution_id', 'table_id', 'member_id', 'grid_label', 'dimensions', 'standard_name']
Found existing file! Overwrite? (y/n)y
writing..
/home/a1r/github/noaa-gfdl/CatalogBuilder/catalogbuilder/scripts/gen_intake_gfdl.py:117: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['standard_name'].loc[(df['variable_id'] == k)] = v
JSON generated at: /home/a1r/github/noaa-gfdl/catalogs/c96L65_am5f7b10r0_amip.json
CSV generated at: /home/a1r/github/noaa-gfdl/catalogs/c96L65_am5f7b10r0_amip.csv
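The SettingWithCopyWarning in the log comes from the chained assignment `df['standard_name'].loc[...] = v`, which first selects a column and then assigns into the result. A possible follow-up, sketched here under assumptions, is to do the row and column selection in a single `.loc` call. The function name `fill_standard_names` and the sample mapping are hypothetical and are not taken from gen_intake_gfdl.py.

```python
import pandas as pd


def fill_standard_names(df, mapping):
    """Set standard_name for every row whose variable_id is a key in mapping."""
    df = df.copy()  # work on an explicit copy so the caller's frame is untouched
    for var, std_name in mapping.items():
        # A single .loc call with (row mask, column label) avoids chained indexing.
        df.loc[df["variable_id"] == var, "standard_name"] = std_name
    return df


# Made-up example data; the real mapping would come from the CF lookup.
df = pd.DataFrame(
    {"variable_id": ["tas", "pr", "tas"], "standard_name": ["", "", ""]}
)
result = fill_standard_names(
    df, {"tas": "air_temperature", "pr": "precipitation_flux"}
)
print(result)
```

A vectorized alternative would be `df['variable_id'].map(mapping)`, which may be faster on large catalogs; the loop is kept here because it mirrors the line shown in the warning.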