NREL / buildstockbatch


CI issues with newer version of pandas and existing parquet files in repo #385

Closed nmerket closed 1 year ago

nmerket commented 1 year ago

Describe the bug

CI fails on every run of this test for Python > 3.8. The old parquet files checked into the repo store the column with an object dtype, while newly generated ones use the string[python] dtype.

To Reproduce

Steps to reproduce the behavior:

  1. Happens on any CI run.

Expected behavior

Tests pass

Logs

From the CI logs:

        # results parquet
        test_pq = pd.read_parquet(os.path.join(test_path, 'baseline', 'results_up00.parquet')).sort_values('building_id')\
            .reset_index().drop(columns=['index'])
        reference_pq = pd.read_parquet(os.path.join(reference_path, 'baseline', 'results_up00.parquet'))\
            .sort_values('building_id').reset_index().drop(columns=['index'])
>       pd.testing.assert_frame_equal(test_pq, reference_pq)
E       AssertionError: Attributes of DataFrame.iloc[:, 4] (column name="completed_status") are different
E       
E       Attribute "dtype" are different
E       [left]:  string[python]
E       [right]: object
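
The failure above can be reproduced without any parquet files. The following is a minimal sketch, assuming only that one frame holds `completed_status` as `object` and the other as pandas' `"string"` extension dtype; the two small frames are hypothetical stand-ins for the reference and generated results, not data from the repo.

```python
import pandas as pd

# Hypothetical stand-in for the reference parquet file (object dtype).
reference = pd.DataFrame({
    "building_id": [1, 2],
    "completed_status": pd.Series(["Success", "Fail"], dtype=object),
})

# Hypothetical stand-in for the newly generated results (string dtype).
generated = reference.astype({"completed_status": "string"})

# The strict comparison fails because the dtypes differ.
try:
    pd.testing.assert_frame_equal(generated, reference)
    dtype_mismatch = False
except AssertionError:
    dtype_mismatch = True

# Casting the reference column to the same dtype makes the comparison pass,
# which shows that only the dtype differs, not the values.
pd.testing.assert_frame_equal(
    generated, reference.astype({"completed_status": "string"})
)
```

Here `dtype_mismatch` ends up `True`, while the second comparison raises nothing.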

Additional context

Two ideas for how to address this:

  1. (easy but could break again) Open the testing parquet files in the repo with a newer version of pandas, convert the affected columns to string, and save them back. This should fix the error, but a similar mismatch could come up again.
  2. (harder but more maintainable in the long run) Instead of comparing a generated parquet file against an expected one, move these tests to the newer integration test framework, which actually runs ResStock and generates results. The tests would then check that those results have the expected columns and so on, without comparing two dataframes directly.
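
Both ideas can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names (`normalize_string_columns`, `rewrite_reference_parquet`, `check_results_schema`) are hypothetical and not part of buildstockbatch, the list of columns to convert has to be supplied by the caller, and the dtype that pandas reports after reading the file back may depend on the pandas and parquet engine versions in use.

```python
import pandas as pd


def normalize_string_columns(df, columns):
    """Return a copy of df with the given columns cast to the "string" dtype."""
    return df.astype({col: "string" for col in columns})


def rewrite_reference_parquet(path, columns):
    """Idea 1 (hypothetical helper): re-save a reference file with string columns."""
    df = pd.read_parquet(path)
    normalize_string_columns(df, columns).to_parquet(path)


def check_results_schema(df, expected_columns):
    """Idea 2 (hypothetical helper): check structure instead of exact equality."""
    missing = set(expected_columns) - set(df.columns)
    assert not missing, f"missing columns: {sorted(missing)}"
    # Assumption: each building appears once in a results file.
    assert df["building_id"].is_unique, "duplicate building_id values"
```

A test following the second idea would call something like `check_results_schema(test_pq, ["building_id", "completed_status"])` on the generated results, so a dtype change in a future pandas release would not affect it.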