Closed stephen-frank closed 1 year ago
Emailed data files to @JanghyunJK and @haneslinger. Here is where the data files are exported in my SkySpark code:
// Temporary: Persist data to files for troubleshooting prior to prep_for_rnn and predict
session
.pyExec("debug_data_csv = pathlib.Path(model_dir) / 'debug_data.csv'")
.pyExec("predictor_data_frame.to_csv(debug_data_csv, encoding='utf-8')")
.pyExec("debug_data_pickle = pathlib.Path(model_dir) / 'debug_data.pickle'")
.pyExec("predictor_data_frame.to_pickle(debug_data_pickle)")
// Prep data and run prediction
session
.pyExec("_, val_df = prep_for_rnn(configs, predictor_data_frame)") // Error occurs here
.pyExec("results = model.predict(val_df)")
@haneslinger and I determined this might stem from Pandas not being able to handle the hxpy encoding of NA, hxpy.haystack.na.NA. This may need to be converted to pandas.NA or some other friendlier "NA" data type. I'm going to take a crack at doing this on the SkySpark side first.
@haneslinger I added the following lines to SkySpark. Per testing in a smaller function, I believe they are correctly converting values of type hxpy.haystack.na.NA to pandas.NA.
// NA type conversion
session
.pyExec("notNA = predictor_data_frame.applymap(lambda v: not isinstance(v, hxpy.haystack.na.NA))")
.pyExec("predictor_data_frame = predictor_data_frame.where(notNA, pandas.NA)")
The syntax of where() is a bit screwy; basically, if the element-wise condition is True it keeps the original value, and if the condition is False it replaces the original value with the alternate value specified.
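For reference, a minimal sketch of the where() semantics described above, using a made-up two-column frame (the column names and values are purely illustrative, not the Wattile data):

```python
import pandas as pd

# Toy frame; values are illustrative only.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})

# Element-wise condition: True means "keep the original value".
keep = df > 1.5

# Where the condition is False, the value is replaced with -1.0.
result = df.where(keep, -1.0)

# mask() is the mirror image: it replaces where the condition is True.
same_result = df.mask(~keep, -1.0)

print(result)
```

Only the first element of column a fails the condition, so it is the only value replaced.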
After this test I still get the error. Running for 2023-05-17 specifically:
axon::EvalErr: Func failed: pyEval(PySession py,Str stmt); args: (PyMgrSession,Str)
sys::IOErr: Python failed: operands could not be broadcast together with shapes (121,78) (121,79)
Traceback (most recent call last):
File "/usr/src/app/hxpy/hxpy.py", line 67, in run
self._exec(instr, local_vars)
File "/usr/src/app/hxpy/hxpy.py", line 95, in _exec
return exec(code, local_vars, local_vars)
File "<string>", line 1, in <module>
File "/wattile/wattile/buildings_processing.py", line 512, in prep_for_rnn
data = _preprocess_data(configs, data)
File "/wattile/wattile/buildings_processing.py", line 486, in _preprocess_data
data = roll_data(data, configs)
File "/wattile/wattile/buildings_processing.py", line 571, in roll_data
means.loc[:, :] = sums.values / counts.values
ValueError: operands could not be broadcast together with shapes (121,78) (121,79)
[proj_wattile::wattilePythonModelPredict:155]
So it still seems the inclusion of the NA value messes up the calculation. Example data and configs sent via email.
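One possible explanation, which is an assumption and has not been checked against the emailed data: a column that ever held a non-numeric NA object is stored with dtype object, and swapping that object for pandas.NA does not change the column's dtype. The sketch below uses a hypothetical FakeNA class as a stand-in for hxpy.haystack.na.NA and made-up column names; it also shows replacing with numpy.nan and casting to float, which does give a numeric column.

```python
import numpy as np
import pandas as pd


class FakeNA:
    """Hypothetical stand-in for hxpy.haystack.na.NA (not the real class)."""


# Toy frame: one clean numeric column, one column containing the NA object.
df = pd.DataFrame({
    "temp": [1.0, 2.0, 3.0],
    "snow_depth": [0.0, FakeNA(), 0.5],
})

# Element-wise "is not NA" mask, built column by column.
not_na = df.apply(lambda col: col.map(lambda v: not isinstance(v, FakeNA)))

# Replacing with pandas.NA leaves the affected column as dtype object.
with_pd_na = df.where(not_na, pd.NA)
dtype_after_pd_na = with_pd_na["snow_depth"].dtype

# Replacing with numpy.nan and casting yields a float64 column.
as_float = df.where(not_na, np.nan).astype(float)
dtype_after_float = as_float["snow_depth"].dtype

print(dtype_after_pd_na, dtype_after_float)
```

If the downstream code treats object columns differently from numeric ones, the pandas.NA conversion alone would not be enough.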
Added NA type conversion to nrelWattileExt in https://github.com/NREL/nrelWattileExt/pull/40. Issue is still present with NA type conversion in place per comment above.
@stephen-frank fixed as of FIX/281. Ready to close?
Yes; thanks.
Context:
FTLB_FTLBCHWMeterCHWEnergyRate_r1, FTLB_FTLBHWMeterHWEnergyRate_r1, and FTLB_FTLBMainRealPowerTotal_r1
Issue:
When I run predict for this date range, all (3) models, I get the following error (basically the same error for all models):
The most useful part of this error message is the last 4 lines, which show an off-by-one error in array dimensions between sums.values and counts.values and also point to these lines as the culprit: https://github.com/NREL/Wattile/blob/hpc_run/wattile/buildings_processing.py#L567-L571
I traced this back to the (2) NA values for snow depth. If I modify my SkySpark function to scrub out NA values prior to passing data to Python, I do not get the error.
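A minimal sketch of how an off-by-one column count like this can arise. It assumes, without having confirmed it in roll_data, that one intermediate result keeps only numeric columns while the other keeps every column. FakeNA is a hypothetical stand-in for the hxpy NA object, and the column names are made up.

```python
import numpy as np
import pandas as pd


class FakeNA:
    """Hypothetical stand-in for hxpy.haystack.na.NA (not the real class)."""


frame = pd.DataFrame({
    "temp": [1.0, 2.0, 3.0],
    "humidity": [40.0, 50.0, 60.0],
    "snow_depth": [0.0, FakeNA(), 0.5],  # object dtype because of FakeNA
})

# A numeric-only aggregation silently loses the object column...
sums = frame.select_dtypes(include="number").to_numpy()

# ...while a count-style operation keeps all three columns.
counts = frame.notna().to_numpy().astype(float)

try:
    means = sums / counts
    message = ""
except ValueError as exc:
    message = str(exc)

print(sums.shape, counts.shape)
print(message)
```

The two arrays differ by exactly one column, so NumPy cannot broadcast them and raises the same kind of ValueError as in the traceback.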
I think this may have something to do with how NA values are encoded in Pandas when passed from SkySpark to Python using hxpy. They may not be getting properly scrubbed as NA or NaN. I have data dumps from Python in both CSV and Pickle formats immediately prior to calling the models. I will send those via email for troubleshooting.
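When inspecting a dump like this, something along the following lines could help locate the offending cells. This is only a sketch: find_non_numeric_cells is a hypothetical helper, FakeNA is a stand-in for whatever object hxpy actually produces, and the sample frame is made up rather than taken from the real dump.

```python
import numbers

import pandas as pd


def find_non_numeric_cells(frame):
    """Return (row label, column, type name) for each non-numeric cell
    found in object-dtype columns of the given DataFrame."""
    hits = []
    for column in frame.columns:
        if not pd.api.types.is_object_dtype(frame[column]):
            continue  # purely numeric columns cannot hide an NA object
        for index, value in frame[column].items():
            if not isinstance(value, numbers.Number):
                hits.append((index, column, type(value).__name__))
    return hits


class FakeNA:
    """Hypothetical stand-in for hxpy.haystack.na.NA (not the real class)."""


sample = pd.DataFrame({
    "temp": [1.0, 2.0, 3.0],
    "snow_depth": [0.0, FakeNA(), FakeNA()],
})

hits = find_non_numeric_cells(sample)
print(hits)
```

On the toy frame this reports the two FakeNA cells in the snow_depth column by row label and type name.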