NREL / buildstockbatch


glue crawler is casting time variables as bigints instead of timestamps #411

Closed mdahlhausen closed 7 months ago

mdahlhausen commented 8 months ago

Describe the bug: When the Glue crawler populates the Athena tables, it casts the timestamp column as a bigint. [screenshot: new schema]

It used to cast it as a 'timestamp', which is more useful since we don't have to convert back. [screenshot: old schema]
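
For context, "convert back" means a step roughly like the one below after reading the table into Pandas; the column name and the nanosecond unit are assumptions based on the discussion further down the thread.

import pandas as pd

# Hypothetical: the 'timestamp' column now comes back from Athena as a bigint
# holding nanosecond Unix time, so it has to be converted to datetimes by hand.
df = pd.DataFrame({"timestamp": [1_672_531_200_000_000_000], "value": [1.5]})
df["timestamp"] = pd.to_datetime(df["timestamp"], unit="ns")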

To Reproduce: Steps to reproduce the behavior:

  1. Pick a recent .yml file.
  2. Run it.

Expected behavior: I'd like the Glue crawler to cast the time variables as timestamps, not bigints.


asparke2 commented 8 months ago

Yeah, this is due to a change in Pandas 2.0. The raw timeseries data is now exported in Unix time, with nanosecond resolution. I use the from_unixtime function in Presto to convert the data from Unix time to a timestamp for SightGlass. https://github.com/NREL/SightGlassDataProcessing/blob/main/telescope/glue.py#L331-L339

# Excerpt from telescope/glue.py: `sa` is the SQLAlchemy module and `cols`
# is the running list of column expressions being assembled for the query.
if col.name == 'timestamp':
    # Convert bigint to timestamp type if necessary
    if str(col.type) == 'BIGINT':
        # Pandas uses nanosecond resolution integer timestamps.
        # Presto expects second resolution values in from_unixtime.
        # Must divide values by 1e9 to go from nanoseconds to seconds.
        cols.append(sa.func.from_unixtime(col / 1e9).label('timestamp'))
    else:
        cols.append(col)
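
As a rough standalone illustration of what that expression compiles to (not the SightGlass code itself, just the same construct built in isolation):

import sqlalchemy as sa

# Build the same kind of expression on its own and print the generated SQL.
ts = sa.column("timestamp")
expr = sa.func.from_unixtime(ts / 1e9).label("timestamp")
print(expr)  # prints something like: from_unixtime(timestamp / :timestamp_1) AS timestamp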

My guess is that to change this back to writing timestamps as strings, you'd have to change the code in this method: https://github.com/NREL/buildstockbatch/blob/develop/buildstockbatch/postprocessing.py#L259-L276
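
A sketch of that kind of change, purely illustrative and not the actual buildstockbatch code (the column name and the nanosecond unit are assumptions):

import pandas as pd

# Turn the nanosecond Unix integers back into string timestamps before the
# parquet files are written, so the Glue crawler no longer sees a bigint column.
def timestamps_to_strings(df, col="timestamp"):
    df[col] = pd.to_datetime(df[col], unit="ns").dt.strftime("%Y-%m-%d %H:%M:%S")
    return df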

nmerket commented 7 months ago

I wonder if it would work correctly if you converted the dtypes to Arrow.
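
If that pans out, a minimal sketch might look like this, assuming pandas >= 2.0 with pyarrow installed and a 'timestamp' column holding nanosecond Unix integers:

import pandas as pd

# Hypothetical tiny frame standing in for the postprocessed timeseries.
df = pd.DataFrame({"timestamp": [1_672_531_200_000_000_000], "value": [1.5]})

# Recover real datetimes, then back the frame with Arrow dtypes so the parquet
# file carries a TIMESTAMP logical type the Glue crawler can pick up.
df["timestamp"] = pd.to_datetime(df["timestamp"], unit="ns")
df = df.convert_dtypes(dtype_backend="pyarrow")

df.to_parquet("timeseries.parquet", engine="pyarrow")
print(df.dtypes)  # e.g. timestamp[ns][pyarrow], double[pyarrow]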