NOAA-EMC / global-workflow

Global Superstructure/Workflow supporting the Global Forecast System (GFS)
https://global-workflow.readthedocs.io/en/latest
GNU Lesser General Public License v3.0

Fix for ValueError #2653

Closed emilyhcliu closed 3 weeks ago

emilyhcliu commented 4 weeks ago

A ValueError occurs in NCO's parallel processing for TEMP at the 20240531 12Z and 20240602 12Z cycles. The 20240531 12Z cycle failed at station ID 47827, and the 20240602 12Z cycle failed at station ID 48820.

The error occurred in _genqsat because the temperature and moisture data had inconsistent dimensions. The root cause is that the high-resolution data contains duplicates in a variety of forms that are difficult to predict.
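For readers unfamiliar with this failure mode, here is a minimal sketch of how such a length mismatch surfaces in pandas. Only the column name `Obs_Minus_Forecast_adjusted` and the variable names `q_df` and `bg_dep` come from the traceback in this PR; the other columns, the values, and the row counts are made up for illustration and do not reflect the actual contents of wdqms.py.

```python
import numpy as np
import pandas as pd

# Hypothetical moisture frame with three levels for one station.
q_df = pd.DataFrame({
    "Station_ID": ["47827"] * 3,          # assumed column name
    "Pressure": [1000.0, 925.0, 850.0],   # assumed column name
})

# Hypothetical departures derived from a temperature frame that
# carries one extra (duplicate) level, so it has four values.
bg_dep = np.array([0.1, 0.2, 0.2, 0.3])

try:
    # Assigning an array whose length differs from the frame's index
    # raises ValueError, as in the traceback for _genqsat.
    q_df["Obs_Minus_Forecast_adjusted"] = bg_dep
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)
```

In the reported failure the lengths were 91 and 90, which is consistent with a single unmatched duplicate on one side.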

The solution proposed here is to remove duplicates from the temperature and moisture data frames separately before merging them for the saturation specific humidity calculation.
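The approach can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the key columns (`Station_ID`, `Pressure`), the `Observation` column, and the sample values are hypothetical, and the real change may identify duplicates on a different set of columns.

```python
import pandas as pd

# Hypothetical columns that identify a unique observation.
KEYS = ["Station_ID", "Pressure"]

# Hypothetical temperature frame with one duplicated level (925 hPa).
t_df = pd.DataFrame({
    "Station_ID": ["47827", "47827", "47827", "47827"],
    "Pressure": [1000.0, 925.0, 925.0, 850.0],
    "Observation": [300.0, 295.0, 295.0, 290.0],
})

# Hypothetical moisture frame without duplicates.
q_df = pd.DataFrame({
    "Station_ID": ["47827", "47827", "47827"],
    "Pressure": [1000.0, 925.0, 850.0],
    "Observation": [0.018, 0.014, 0.010],
})

# Remove duplicates from each frame separately, keeping the first
# occurrence, before the two are combined.
t_unique = t_df.drop_duplicates(subset=KEYS, keep="first")
q_unique = q_df.drop_duplicates(subset=KEYS, keep="first")

# Merge on the keys; validate="one_to_one" makes pandas raise if any
# key is still duplicated on either side.
tq_df = t_unique.merge(
    q_unique,
    on=KEYS,
    how="inner",
    suffixes=("_t", "_q"),
    validate="one_to_one",
)

print(len(t_unique), len(q_unique), len(tq_df))
```

Deduplicating each frame on its own, rather than after the merge, keeps the two inputs the same length no matter which side the duplicates appear on.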

Here is the error message from 20240531 12Z:

Traceback (most recent call last):
  File "/lfs/h1/ops/para/packages/gfs.v16.3.16/ush/wdqms.py", line 770, in <module>
    WDQMS(args.input_list, args.type, args.outdir, args.loglevel)
  File "/lfs/h1/ops/para/packages/gfs.v16.3.16/ush/wdqms.py", line 80, in __init__
    tq_df = self._genqsat(tq_df)
  File "/lfs/h1/ops/para/packages/gfs.v16.3.16/ush/wdqms.py", line 517, in _genqsat
    q_df['Obs_Minus_Forecast_adjusted'] = bg_dep
  File "/apps/prod/python-modules/3.8.6/intel/19.1.3.304/lib/python3.8/site-packages/pandas/core/frame.py", line 3163, in __setitem__
    self._set_item(key, value)
  File "/apps/prod/python-modules/3.8.6/intel/19.1.3.304/lib/python3.8/site-packages/pandas/core/frame.py", line 3242, in _set_item
    value = self._sanitize_column(key, value)
  File "/apps/prod/python-modules/3.8.6/intel/19.1.3.304/lib/python3.8/site-packages/pandas/core/frame.py", line 3899, in _sanitize_column
    value = sanitize_index(value, self.index)
  File "/apps/prod/python-modules/3.8.6/intel/19.1.3.304/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 751, in sanitize_index
    raise ValueError(
ValueError: Length of values (91) does not match length of index (90)
398 + rc=1
398 + ((  rc != 0  ))

Resolves a Python ValueError caused by duplicate observations in the saturation specific humidity calculation.

How has this been tested?

Stand-alone WDQMS processing was run for cycles 2024050800 through 2024060406 on both HERA and WCOSS-2.

emilyhcliu commented 4 weeks ago

@kevindougherty-noaa is double-checking the fix. I will activate this PR after I receive confirmation from Kevin.

emilyhcliu commented 4 weeks ago

@kevindougherty-noaa and @KateFriedman-NOAA I just ran the updated code for the latest cycles (2024060300 to 2024060406), which we had not tested yet. All cases passed successfully. I have activated the PR; please review the code change. Thanks!

emilyhcliu commented 4 weeks ago

I will continue monitoring the NCO parallel run with WDQMS.

KateFriedman-NOAA commented 3 weeks ago

@emilyhcliu I have just reviewed and approved. Are you ready for me to merge this and cut an updated tag to give to Simon?

emilyhcliu commented 3 weeks ago

@KateFriedman-NOAA Yes, please. Thank you.