NOAA-GFDL / MDTF-diagnostics

Analysis framework and collection of process-oriented diagnostics for weather and climate simulations
https://mdtf-diagnostics.readthedocs.io/en/main/

Forcing feedback POD issue #662

Open aradhakrishnanGFDL opened 2 months ago

aradhakrishnanGFDL commented 2 months ago

New issues running the forcing feedback POD. I changed the enddate to 198401 to get rid of another error with 198012.

The new catalog with standard_name fixed is here at GFDL: /home/a1r/github/noaa-gfdl/catalogs/c96L65_am5f7b10r0_amip30_0814.json

"case_list": { "atmos_cmip": { "model": "am5", "convention": "CMIP", "startdate": "198001", "enddate": "198412" } },

Querying /proj/wkdir/c96L65_am5f7b10r0_amip30_0814.json for variable ts for case atmos_cmip.
WARNING: /opt/conda/envs/_MDTF_base/lib/python3.12/site-packages/intake_esm/_search.py:50: UserWarning: This pattern is interpreted as a regular expression, and has match groups. To actually get the groups, use str.extract.
  mask = df[column].str.contains(value, regex=True, case=True, flags=0)

CRITICAL: **********************************************************************
Uncaught exception:
Traceback (most recent call last):
  File "/proj/MDTF-diagnostics/mdtf_framework.py", line 243, in <module>
    exit_code = main(prog_name='MDTF-diagnostics')
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/_MDTF_base/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/_MDTF_base/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/_MDTF_base/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/_MDTF_base/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/_MDTF_base/lib/python3.12/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/proj/MDTF-diagnostics/mdtf_framework.py", line 199, in main
    cat_subset = data_pp.process(cases, ctx.config, model_paths.MODEL_WORK_DIR)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/proj/MDTF-diagnostics/src/preprocessor.py", line 1314, in process
    cat_subset = self.query_catalog(case_list, config.DATA_CATALOG)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/proj/MDTF-diagnostics/src/preprocessor.py", line 993, in query_catalog
    date_range_dict = {f: cat_subset_df[f].attrs[range_attr_string]
                          ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
KeyError: 'intake_esm_attrs:date_range'
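For reference, intake-esm exposes the grouped catalog columns on each returned dataset as `intake_esm_attrs:<column>` attributes, which is what preprocessor.py:993 is reading here. A minimal standalone check (a sketch, not part of the framework; the search/open kwargs may need adjusting for this data) of whether a date_range attribute survives the aggregation:

```python
# Standalone diagnostic sketch: see whether the datasets intake-esm returns
# for this catalog carry the attribute the MDTF preprocessor expects.
import intake

cat = intake.open_esm_datastore(
    "/home/a1r/github/noaa-gfdl/catalogs/c96L65_am5f7b10r0_amip30_0814.json"
)
dsets = cat.search(variable_id="ts").to_dataset_dict()  # may need xarray_open_kwargs
for key, ds in dsets.items():
    # The KeyError above implies this prints False for at least one group
    print(key, "intake_esm_attrs:date_range" in ds.attrs)
```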
wrongkindofdoctor commented 2 months ago

@aradhakrishnanGFDL @jtmims has done some exploring and found that the json file for your catalog still has time_range in the aggregations block, which is likely causing your problem. Try removing it and see if that gets you further.

aradhakrishnanGFDL commented 2 months ago

> @aradhakrishnanGFDL @jtmims has done some exploring and found that the json file for your catalog still has time_range in the aggregations block, which is likely causing your problem. Try removing it and see if that gets you further.

The groupby_attrs do not have time_range in /home/a1r/github/noaa-gfdl/catalogs/c96L65_am5f7b10r0_amip30_0814.json, right? @jtmims, are you referring to groupby_attrs or the joinExisting aggregation? I expect the latter to be present to perform the aggregation across time.

 "aggregation_control": {
    "variable_column_name": "variable_id",
    "groupby_attrs": [
      "source_id",
      "experiment_id",
      "frequency",
      "table_id",
      "grid_label",
      "realm",
      "member_id",
      "chunk_freq"
    ],
jtmims commented 2 months ago

From testing, either change can be the solution. If you keep the join_existing on time_range in the aggregations, you also have to include 'time_range' in 'groupby_attrs' in order for intake_esm to grab that attribute and make it accessible in xarray. In the current version of the framework, I believe @wrongkindofdoctor has removed 'time_range' from the join_existing portion of the aggregations. Both options seem to make intake_esm happy enough to populate these values.
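A quick way to see which of the two configurations a catalog actually has (a rough sketch using only the standard library, assuming the standard esm-collection-spec layout shown earlier in the thread):

```python
# Rough diagnostic sketch: report where time_range appears in the catalog's
# aggregation_control block (groupby_attrs vs. a join_existing aggregation).
import json

catalog_json = "/home/a1r/github/noaa-gfdl/catalogs/c96L65_am5f7b10r0_amip30_0814.json"
with open(catalog_json) as f:
    esmcat = json.load(f)

agg_ctrl = esmcat.get("aggregation_control", {})
print("time_range in groupby_attrs:",
      "time_range" in agg_ctrl.get("groupby_attrs", []))
print("time_range join_existing aggregation:",
      any(a.get("attribute_name") == "time_range" and a.get("type") == "join_existing"
          for a in agg_ctrl.get("aggregations", [])))
```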

aradhakrishnanGFDL commented 2 months ago

> From testing, either change can be the solution. If you keep the join_existing on time_range in the aggregations, you also have to include 'time_range' in 'groupby_attrs' in order for intake_esm to grab that attribute and make it accessible in xarray. In the current version of the framework, I believe @wrongkindofdoctor has removed 'time_range' from the join_existing portion of the aggregations. Both options seem to make intake_esm happy enough to populate these values.

intake-esm and xarray can work with time_range in join_existing without it being in groupby_attrs; the example POD and its corresponding catalogs demonstrate this. Let me know if I missed something. I notice time_range was removed from join_existing and "date" was added.

We can settle for a temporary fix where I edit the json schema on my end, but we want to use these catalogs outside of MDTF in our user analysis scripts, so we can revisit this later and see what a sustainable fix would be.

aradhakrishnanGFDL commented 2 months ago

@jtmims Going for the fix from my end, let me know if this catalog looks fine. I removed the join_existing time_range part, at least I think I did.

/home/a1r/github/noaa-gfdl/catalogs/c96L65_am5f7b10r0_amip30_0814.json

aradhakrishnanGFDL commented 2 months ago

The json with join_existing removed didn't do it. But let me know if it worked for you and I missed something.

I am also curious whether this issue could have something to do with no data preprocessing being done, as the log message says:

> POD convention and data convention are both cmip. No data translation will be performed for case atmos_cmip.

Could it be related to that, with date_range not being set and skipped here or elsewhere in the preprocessor for this edge case, which you didn't come across for the MJO PODs?

The other possibility is that my input json (runtime) does not have the conventions set correctly?

jtmims commented 2 months ago

@aradhakrishnanGFDL the issue occurs during the check that makes sure the files are in the correct order to run xr.concat(). I have just opened a PR (#665) that changes this sort so it does not rely on information supplied by intake-esm. Hopefully, this fixes the issue!
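For context, the kind of ordering the fix relies on can be sketched like this (illustrative only, not the actual #665 diff; the filename pattern is hypothetical):

```python
# Illustrative sketch: sort per-chunk asset paths by the start date embedded in
# the filename instead of relying on intake-esm-supplied attributes, so that a
# subsequent xr.concat along time sees the chunks in chronological order.
import re

def start_date_key(path):
    # e.g. "atmos_cmip.198001-198412.rlutcs.nc" (hypothetical pattern) -> "198001"
    m = re.search(r"(\d{6,8})-\d{6,8}", path)
    return m.group(1) if m else path

asset_paths = [
    "atmos_cmip.198501-198912.rlutcs.nc",  # hypothetical chunk filenames
    "atmos_cmip.198001-198412.rlutcs.nc",
]
print(sorted(asset_paths, key=start_date_key))
```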

aradhakrishnanGFDL commented 2 months ago

That's progress, thanks! It went further along; the logs do not have additional info, but I can send pointers on GFDL's dpdev when you need them. The errors are the same regardless of the join_existing change, so that's ruled out.

'/proj/wkdir/MDTF_output.v17/MDTF_atmos_cmip_198001_198412/mon/atmos_cmip.rlutcs.mon.nc'.
Getting list of assets...

Successfully wrote ESM catalog json file to: file:///proj/wkdir/MDTF_output.v17/MDTF_postprocessed_data.json
SubprocessRuntimeManager: run <#1BUm:forcing_feedback>.
### Starting <#1BUm:forcing_feedback>
<#1BUm:forcing_feedback> will run using 'python' from conda env '_MDTF_python3_base'.
    Running forcing_feedback.py for <#1BUm:forcing_feedback>.
<#1BUm:forcing_feedback> exited without specifying a return code
 (not necessarily a failure; this information just wasn't provided
to the subprocess manager when the POD completed).

Checking linked output files for <#1BUm:forcing_feedback>.
ERROR: Missing '$WORK_DIR/forcing_feedback/model/forcing_feedback_globemean_Rad.png'.
ERROR: Missing '$WORK_DIR/forcing_feedback/model/forcing_feedback_globemean_IRF.png'.
ERROR: Missing '$WORK_DIR/forcing_feedback/model/forcing_feedback_globemean_LWFB.png'.
ERROR: Missing '$WORK_DIR/forcing_feedback/model/forcing_feedback_globemean_SWFB.png'.
ERROR: Missing '$WORK_DIR/forcing_feedback/model/forcing_feedback_CMIP6scatter.png'.
ERROR: Missing '$WORK_DIR/forcing_feedback/model/forcing_feedback_maps_Temperature.png'.
ERROR: Missing '$WORK_DIR/forcing_feedback/model/forcing_feedback_maps_WaterVapor.png'.
ERROR: Missing '$WORK_DIR/forcing_feedback/model/forcing_feedback_maps_SfcAlbedo.png'.
ERROR: Missing '$WORK_DIR/forcing_feedback/model/forcing_feedback_maps_Cloud.png'.
ERROR: Missing '$WORK_DIR/forcing_feedback/model/forcing_feedback_maps_Rad.png'.
ERROR: Missing '$WORK_DIR/forcing_feedback/model/forcing_feedback_maps_IRF.png'.
ERROR: Deactivated <#1BUm:forcing_feedback> due to MDTFFileNotFoundError("[Errno 2] No such file or directory: 'Missing 11 files.'").
wrongkindofdoctor commented 2 months ago

@aradhakrishnanGFDL The POD is not generating figures for some reason. I don't see an issue with the figure directory paths at first glance. There is nothing in the forcing_feedback/forcing_feedback.log file, correct?

aradhakrishnanGFDL commented 2 months ago

> @aradhakrishnanGFDL The POD is not generating figures for some reason. I don't see an issue with the figure directory paths at first glance. There is nothing in the forcing_feedback/forcing_feedback.log file, correct?

Oh, there is a hint there about a kernel file missing. What is that? The rest of the log looks the same as the stdout with the error pasted earlier.

### Start execution of <#5tzt:forcing_feedback>
--------------------------------------------------------------------------------
CONDA_EXE=/bin/micromamba
_CONDA_EXE=/bin/micromamba
_CONDA_ROOT=/opt/conda
Found program python3.
Kernel file is missing. POD will not work!
--------------------------------------------------------------------------------
<#5tzt:forcing_feedback> exited without specifying a return code
 (not necessarily a failure; this information just wasn't provided
to the subprocess manager when the POD completed).

Log for <#5tzt:forcing_feedback>:
    *** caught exception (#1):
    None: None
aradhakrishnanGFDL commented 2 months ago

I think I got it: the OBS data is missing (https://github.com/NOAA-GFDL/MDTF-diagnostics/blob/7cda1dd8eeb79ed1fd0156767fc6f54e627141e2/diagnostics/forcing_feedback/forcing_feedback.py#L55).

@jtmims I am using your obs_data directory. Is the forcing feedback POD's obs data present there? Perhaps I should use the oar.gfdl.mdtf obs data path?

aradhakrishnanGFDL commented 2 months ago

I will substitute with /home/oar.gfdl.mdtf/mdtf/inputdata/obs_data/forcing_feedback/ and report back
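(A quick sanity check on that path before re-running; just a sketch:)

```python
# Sketch: confirm the substituted OBS_DATA directory for the forcing_feedback
# POD exists and is non-empty before re-running the framework.
import os

obs_dir = "/home/oar.gfdl.mdtf/mdtf/inputdata/obs_data/forcing_feedback"
if os.path.isdir(obs_dir):
    print(f"{obs_dir} contains {len(os.listdir(obs_dir))} entries")
else:
    print(f"{obs_dir} not found")
```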

jtmims commented 2 months ago

@aradhakrishnanGFDL I was just about to notify you of this. I've been using the oar.gfdl.mdtf obs data path for now; it saves space in my directory. Some calculations done in the forcing feedback POD require a file found in OBS_DATA.

wrongkindofdoctor commented 2 months ago

@aradhakrishnanGFDL @jtmims I ran the FF POD with @aradhakrishnanGFDL's test catalog and obs data. There are some issues with the plotting routines in the POD related to the data itself and the color bar routines, as shown in the forcing_feedback.log file. You can try re-running with a different date range, but otherwise, this is a POD-specific issue that does not appear to stem from the framework preprocessing.

/diagnostics/forcing_feedback/forcing_feedback_util.py:58: RuntimeWarning: Mean of empty slice
  var_base_tmean = np.repeat(np.squeeze(np.nanmean(var_base_re, axis=0))[np.newaxis, :, :, :], \
/local/home/Jessica.Liptak/mdtf/MDTF-diagnostics/diagnostics/forcing_feedback/forcing_feedback_util.py:202: RuntimeWarning: Mean of empty slice
  var_base_tot_m_tmean = np.squeeze(np.nanmean(var_base_tot_re[:, m, :, :], axis=0))
/local/home/Jessica.Liptak/mdtf/MDTF-diagnostics/diagnostics/forcing_feedback/forcing_feedback_util.py:417: RuntimeWarning: Mean of empty slice
  fluxanom_timemean = np.nanmean(fluxanom, axis=0)
/local/home/Jessica.Liptak/mdtf/MDTF-diagnostics/diagnostics/forcing_feedback/forcing_feedback_plot.py:139: SyntaxWarning: invalid escape sequence '\D'
  xterms = ['$\Delta{R}_{tot}$', '$\lambda_{cloud}$', '$\lambda_{noncloud}$']
/local/home/Jessica.Liptak/mdtf/MDTF-diagnostics/diagnostics/forcing_feedback/forcing_feedback_plot.py:139: SyntaxWarning: invalid escape sequence '\l'
  xterms = ['$\Delta{R}_{tot}$', '$\lambda_{cloud}$', '$\lambda_{noncloud}$']
/local/home/Jessica.Liptak/mdtf/MDTF-diagnostics/diagnostics/forcing_feedback/forcing_feedback_plot.py:139: SyntaxWarning: invalid escape sequence '\l'
  xterms = ['$\Delta{R}_{tot}$', '$\lambda_{cloud}$', '$\lambda_{noncloud}$']
The PostScript backend does not support transparency; partially transparent artists will be rendered opaque.
The PostScript backend does not support transparency; partially transparent artists will be rendered opaque.
The PostScript backend does not support transparency; partially transparent artists will be rendered opaque.
The PostScript backend does not support transparency; partially transparent artists will be rendered opaque.
The PostScript backend does not support transparency; partially transparent artists will be rendered opaque.
/local/home/Jessica.Liptak/miniconda3/envs/_MDTF_python3_base/lib/python3.12/site-packages/cartopy/mpl/gridliner.py:475: UserWarning: The .ylabels_left attribute is deprecated. Please use .left_labels to toggle visibility instead.
  warnings.warn('The .ylabels_left attribute is deprecated. Please '
/local/home/Jessica.Liptak/miniconda3/envs/_MDTF_python3_base/lib/python3.12/site-packages/cartopy/mpl/gridliner.py:475: UserWarning: The .ylabels_left attribute is deprecated. Please use .left_labels to toggle visibility instead.
  warnings.warn('The .ylabels_left attribute is deprecated. Please '
/local/home/Jessica.Liptak/miniconda3/envs/_MDTF_python3_base/lib/python3.12/site-packages/cartopy/mpl/gridliner.py:475: UserWarning: The .ylabels_left attribute is deprecated. Please use .left_labels to toggle visibility instead.
  warnings.warn('The .ylabels_left attribute is deprecated. Please '
Traceback (most recent call last):
  File "/local/home/Jessica.Liptak/mdtf/MDTF-diagnostics/diagnostics/forcing_feedback/forcing_feedback_plot.py", line 192, in <module>
    map_plotting_4subs(levels_1, levels_2, variablename_1, modelvariable_1, lon_originalmodel,
  File "/local/home/Jessica.Liptak/mdtf/MDTF-diagnostics/diagnostics/forcing_feedback/forcing_feedback_util.py", line 493, in map_plotting_4subs
    if not np.all(cbar_levs1 == cbar_levs2):
                  ^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: operands could not be broadcast together with shapes (13,) (11,)
Working Directory is /local/home/Jessica.Liptak/mdtf/wkdir/MDTF_output/forcing_feedback
Forcing Feedback POD is executing
Generating Forcing Feedback POD plots
Last log message by Forcing Feedback POD: finished executing
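Two POD-side tweaks suggested by that log (sketches only, not tested against the POD; the level values below are placeholders): raw strings silence the invalid-escape warnings at forcing_feedback_plot.py:139, and the crash at forcing_feedback_util.py:493 comes from comparing colorbar level arrays of different lengths with ==, which np.array_equal handles without broadcasting.

```python
import numpy as np

# forcing_feedback_plot.py:139 -- raw strings avoid the SyntaxWarnings above
xterms = [r'$\Delta{R}_{tot}$', r'$\lambda_{cloud}$', r'$\lambda_{noncloud}$']

# forcing_feedback_util.py:493 -- np.array_equal returns False for arrays of
# different lengths instead of raising the broadcast ValueError seen above
cbar_levs1 = np.linspace(-1, 1, 13)  # placeholder values; shapes (13,) and (11,)
cbar_levs2 = np.linspace(-1, 1, 11)
if not np.array_equal(cbar_levs1, cbar_levs2):
    print("colorbar levels differ; a common set of levels would be rebuilt here")
```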
jtmims commented 2 months ago

@wrongkindofdoctor What data set was used? It seems to gel well with the AM5 that I had on the workstation. It plotted all of the plots when running with the latest main branch.

aradhakrishnanGFDL commented 2 months ago

> @wrongkindofdoctor What data set was used? It seems to gel well with the AM5 that I had on the workstation. It plotted all of the plots when running with the latest main branch.

@jtmims Glad your run succeeded, yay! Looking at @wrongkindofdoctor's comment on the time period selections and referring to the POD documentation, it appears that 2003-2014 has to be used for this POD, which matches the selections in your config file. If I can get past my disk space issues, that's what I will try next and see if it succeeds.

AM5 dev experiment - /archive/am5/am5/am5f7b10r0/c96L65_am5f7b10r0_amip

wrongkindofdoctor commented 2 months ago

@aradhakrishnanGFDL @jtmims and I have been discussing the possibility that the container may not be mounting /archive correctly, and/or that the way archive files are stored prevents accessing them properly. We advise transferring a subset of the files to your workstation (one of the /local/ or /net directories should have enough space) and building a catalog referencing that data directory to avoid file read issues.

aradhakrishnanGFDL commented 2 months ago

> @aradhakrishnanGFDL @jtmims and I have been discussing the possibility that the container may not be mounting /archive correctly, and/or that the way archive files are stored prevents accessing them properly. We advise transferring a subset of the files to your workstation (one of the /local/ or /net directories should have enough space) and building a catalog referencing that data directory to avoid file read issues.

The storage issue is on a VM that has different file systems. The /archive mount works just like any other file system mount, but it's good to verify this! The suggested approach is quite cumbersome for a user running analysis on catalogs generated from modeling workflows, but I will keep it in mind. I hadn't thought about this; I will hope for some success with the current setup.

jtmims commented 2 months ago

@aradhakrishnanGFDL Okay, so I just ran ffb for the years 2003-2014 using the original catalog you sent in this thread (/home/a1r/github/noaa-gfdl/catalogs/c96L65_am5f7b10r0_amip30_0814.json). It has completed the run and plotted everything it needed to. Please keep me updated if you find anything out about the container volume mounting!

aradhakrishnanGFDL commented 2 months ago

Summarizing an offline thread:

My catalog points to /archive/am5/am5/am5f7b10r0/c96L65_am5f7b10r0_amip/gfdl.ncrc5-deploy-prod-openmp/pp. @jtmims's catalog points to a /local/home subset of the relevant data from /archive/am5/am5/am5f7b10r0/c96L65_am5f7b10r0_amip/gfdl.ncrc5-deploy-prod-openmp/pp.

Either this catalog, /home/a1r/github/noaa-gfdl/catalogs/c96L65_am5f7b10r0_amip30_0814.json, or /home/a1r/testing/runtime_config_am5test_jmims.jsonc (the latter uses the same experiment @jtmims uses, but the data in /local is not exactly what's reflected in /archive) can be used in testing.

I have tested the behavior with and without the container, and the following plots are missing in both. (But there are several other plots, and all netcdf output, generated, so it does not look like a container issue or an /archive mount/access issue.)

ERROR: Missing '$WORK_DIR/forcing_feedback/model/forcing_feedback_maps_WaterVapor.png'.
ERROR: Missing '$WORK_DIR/forcing_feedback/model/forcing_feedback_maps_SfcAlbedo.png'.
ERROR: Missing '$WORK_DIR/forcing_feedback/model/forcing_feedback_maps_Cloud.png'.
ERROR: Missing '$WORK_DIR/forcing_feedback/model/forcing_feedback_maps_Rad.png'.
ERROR: Missing '$WORK_DIR/forcing_feedback/model/forcing_feedback_maps_IRF.png'.

For reference, I used the following for testing without the container: the MDTF-diagnostics repo (/home/Jacob.Mims/mdtf/MDTF-diagnostics) and the central installation for the conda envs (the central installation points to the v4.0 tag and not the latest main, as intended perhaps).

So, what might be different? The catalog, or the state of the source data? The forcing feedback POD logs do not provide additional info from my runs.

It sounds like reaching out to the POD developer makes sense right now, if another person can use my config or catalog and replicate the results.

Let this issue stay open.

aradhakrishnanGFDL commented 2 months ago

@jtmims any update on this?

wrongkindofdoctor commented 2 months ago

@aradhakrishnanGFDL Offline discussion and review suggest that this is an issue with the POD computation not outputting results that it can plot within whatever bounds the POD has set for the plot routine with this specific data subset. I suggest reaching out to the POD developers for more information. The framework team has other development tasks to prioritize this week, and will revisit the issue when time allows. In the meantime, I suggest trying to run the container with the Wheeler-Kiladis or EOF500_hpa PODs, which have been consistently working in our tests, or selecting a different POD to try if all of the required variables are present in the dataset.

aradhakrishnanGFDL commented 2 months ago

> @aradhakrishnanGFDL Offline discussion and review suggest that this is an issue with the POD computation not outputting results that it can plot within whatever bounds the POD has set for the plot routine with this specific data subset. I suggest reaching out to the POD developers for more information. The framework team has other development tasks to prioritize this week, and will revisit the issue when time allows. In the meantime, I suggest trying to run the container with the Wheeler-Kiladis or EOF500_hpa PODs, which have been consistently working in our tests, or selecting a different POD to try if all of the required variables are present in the dataset.

I will look into testing with an alternate POD. Note that these tests are without the container. Let's talk about the project priorities this week to ensure everyone's on the same page.