glamod / glamod-ingest

Database preparation and ingestion for GLAMOD
BSD 2-Clause "Simplified" License
2 stars 1 forks source link

cdms-lens: optimise the temporal selection into a single SQL query #23

Closed agstephens closed 4 years ago

agstephens commented 4 years ago

Implement this:

  1. If "period=True":
    • take start and end and use between operation
    • Must also specify the partition - so that it is faster.
  2. Else:
    • For month, day or hour, need iterate to get a long list of values to compare against.
    • Must also specify the partition - so that it is faster.
    if 'period' in query or 'date' in query:
        assert not ('period' in query and 'date' in query)
        k = 'period' if 'period' in query else 'date'
        t0, t1 = query[k].split('/')
        t1 = datetime.datetime.strptime(t1, '%Y-%m-%d').astimezone(UTC)
        t0 = datetime.datetime.strptime(t0, '%Y-%m-%d').astimezone(UTC) + datetime.timedelta(days=1)
        time_condition = "report_timestamp between '{}'::timestamptz and '{}'::timestamptz".format(t0, t1)
    else:
        year = query['year'] if isinstance(query['year'], list) else [query['year']]
        year.sort()
        month = query['month'] if isinstance(query['month'], list) else [query['month']]
        month.sort()
        day = query['day'] if isinstance(query['day'], list) else [query['day']]
        day.sort()
        all_days = []
        for x in itertools.product(year, month, day):
            try:
                all_days.append(datetime.datetime.strptime('{}-{}-{}'.format(*x), '%Y-%m-%d').astimezone(UTC))
            except:
                pass
        time_condition = "date_trunc('day', report_timestamp) in ({})".format(', '.join(["'{}'::timestamptz".format(x) for x in all_days]))
agstephens commented 4 years ago

Alternative approaches:

  1. Add extra fields:
    • datetime truncated to the month
    • datetime truncated to the day
    • datetime truncated to the hour

And index each of those fields.

  1. Could put those extra fields in views rather than in the table.
agstephens commented 4 years ago

Another alternative approach:

  1. Make the "super query".
  2. Sub-select within those results.
agstephens commented 4 years ago

First version now implemented. Testing, testing, testing...

agstephens commented 4 years ago

Seems to be working okay - so closing.