mdahlhausen closed this issue 7 months ago
Yeah, this is due to a change in Pandas 2.0. The raw timeseries data is now exported in Unix time, with nanosecond resolution. I use the from_unixtime method in Presto to convert the data from Unix time to a timestamp for SightGlass: https://github.com/NREL/SightGlassDataProcessing/blob/main/telescope/glue.py#L331-L339
```python
if col.name == 'timestamp':
    # Convert bigint to timestamp type if necessary
    if str(col.type) == 'BIGINT':
        # Pandas uses nanosecond resolution integer timestamps.
        # Presto expects second resolution values in from_unixtime.
        # Must divide values by 1e9 to go from nanoseconds to seconds.
        cols.append(sa.func.from_unixtime(col / 1e9).label('timestamp'))
    else:
        cols.append(col)
```
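To make the unit mismatch concrete, here is a minimal demonstration of why the division by 1e9 is needed: pandas represents datetimes as int64 nanoseconds since the Unix epoch, while from_unixtime expects seconds.

```python
import pandas as pd

# Pandas stores datetimes as int64 nanoseconds since the Unix epoch,
# so the raw integer export is 1e9 times larger than the second-resolution
# value that Presto's from_unixtime expects.
ts = pd.Timestamp("2023-01-01 00:00:00")
nanos = ts.value          # nanosecond Unix time
seconds = nanos / 1e9     # second-resolution Unix time

print(nanos)    # 1672531200000000000
print(seconds)  # 1672531200.0
```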
My guess is that to change this back to writing timestamps as strings, you'd have to change the code in this method: https://github.com/NREL/buildstockbatch/blob/develop/buildstockbatch/postprocessing.py#L259-L276
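For illustration, reverting to string timestamps might look something like the sketch below; the `timestamps_to_strings` helper and column layout are hypothetical, not taken from buildstockbatch's postprocessing code.

```python
import pandas as pd

def timestamps_to_strings(df: pd.DataFrame) -> pd.DataFrame:
    """Render datetime columns as formatted strings so a crawler
    sees string data rather than int64 nanosecond integers.
    (Illustrative helper, not part of buildstockbatch.)"""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_datetime64_any_dtype(out[col]):
            out[col] = out[col].dt.strftime("%Y-%m-%d %H:%M:%S")
    return out

df = pd.DataFrame({"time": pd.to_datetime(["2023-01-01 00:15:00"])})
print(timestamps_to_strings(df)["time"].iloc[0])  # 2023-01-01 00:15:00
```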
I wonder if it would work correctly if you converted the dtypes to Arrow.
Describe the bug
When the glue crawler is populating the Athena tables, it is casting the timestamp as a bigint:
![new](https://github.com/NREL/buildstockbatch/assets/2249271/0b63ca97-c93f-4c81-a197-402584bcdb7e)
It used to cast it as a 'timestamp', which is more useful as we don't have to convert back:
![old](https://github.com/NREL/buildstockbatch/assets/2249271/fe72b3dc-14a6-474a-bc85-77db5a1c4036)
Expected behavior I'd like the glue crawler to cast the time variables as timestamps, not bigints.