OpenGeoscience / nex


Fix timestamp in ETL for GDDP dataset #25

aashish24 closed this issue 8 years ago

kotfic commented 8 years ago

It turns out Spark SQL has easy built-in functions for dealing with Unix timestamps, so this is not really necessary.

aashish24 commented 8 years ago

This is great to know. Can you post an example here?

kotfic commented 8 years ago
sql = """
SELECT lat, lon, time, model, pr, tasmin, tasmax, MONTH(from_unixtime(time)) as month, YEAR(from_unixtime(time)) as year
FROM parquet
WHERE pr < 1.0E20 AND tasmin < 1.0E20 AND tasmax < 1.0E20
"""
df = sqlContext.sql(sql)

Here, time is a column containing Unix timestamps; the built-in from_unixtime() function converts them, and the MONTH() and YEAR() built-ins pull the month and year out of the resulting timestamp.
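
For reference, a minimal, self-contained sketch of the same idea using the PySpark DataFrame API (it uses the modern SparkSession entry point rather than the sqlContext above; the toy rows, app name, and column names are illustrative only and not taken from the actual GDDP ETL code):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_unixtime, month, year

spark = SparkSession.builder.appName("gddp-timestamp-example").getOrCreate()

# Toy rows standing in for GDDP records: (lat, lon, unix timestamp, precipitation)
rows = [(42.0, -71.0, 1426852800, 0.5),   # 2015-03-20 12:00 UTC
        (42.0, -71.0, 1436961600, 0.7)]   # 2015-07-15 12:00 UTC
df = spark.createDataFrame(rows, ["lat", "lon", "time", "pr"])

# from_unixtime() converts the integer seconds into a timestamp,
# and month()/year() extract the calendar fields, mirroring the SQL above.
df = (df.withColumn("month", month(from_unixtime(df["time"])))
        .withColumn("year", year(from_unixtime(df["time"]))))

df.show()
```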