datamade / wopr-data

Deprecated: Scripts for creating and updating datasets for plenario
MIT License
2 stars 2 forks source link

Store timestamp in master table #5

Open evz opened 10 years ago

evz commented 10 years ago

@svetlozarn I'm curious why you're storing a NULL in the obs_ts field for the crime data (here). I realize you're storing that Orig_Date info into the obs_date field but since the schema declares that as a DATE column, it's not saving the time portion of that info. Is it safe to just save the Orig_Date into the obs_ts field as well? Or were you thinking of using that in a different way?

svetlozarn commented 10 years ago

@evz That's a good point. I separated DATE and TIMESTAMP in order to handle diverse datasets but we can revisit the integration scheme.

On Thu, Jan 9, 2014 at 10:36 AM, Eric van Zanten notifications@github.comwrote:

@svetlozarn https://github.com/svetlozarn I'm curious why you're storing a NULL in the obs_ts field for the crime data (herehttps://github.com/datamade/wopr-data/blob/master/init/init_chicago-crimes-all.sql#L245). I realize you're storing that Orig_Date info into the obs_date field but since the schema declares that as a DATE column, it's not saving the time portion of that info. Is it safe to just save the Orig_Date into the obs_ts field as well? Or were you thinking of using that in a different way?

— Reply to this email directly or view it on GitHubhttps://github.com/datamade/wopr-data/issues/5 .

evz commented 10 years ago

I made a new branch here that basically takes the same value that gets inserted for the obs-date field and inserts it into the obs_ts field for the crime data (since that's the only one where we have greater than a day resolution)

evz commented 10 years ago

@svetlozarn I actually figured a way to run a temporal aggregate query in a pretty flexible way using the postgresql date_trunc function (more on that over here). It looks like it should support all of the temporal resolutions that we're after by doing something like this:

SELECT 
    date_trunc('month', obs_date) as time_agg, 
    dataset_name, count(*) 
    FROM dat_master 
    WHERE obs_date > '2013-08-01' 
    GROUP BY time_agg, dataset_name 
    ORDER BY time_agg;

Replace the month part in there with anything that the date_trunc function supports (day, year, minute, second, etc) that we have instant temporal aggregation. Does that sound sensible? One thing that this kind of approach would rely upon is having the date/time info in a consistent field and not in two different fields.