cmu-delphi / delphi-epidata

An open API for epidemiological data.
https://cmu-delphi.github.io/delphi-epidata/
MIT License
100 stars 68 forks source link

Consider partial live computation of metadata #793

Open sgratzl opened 2 years ago

sgratzl commented 2 years ago

see also https://github.com/cmu-delphi/www-covidcast/issues/1026#issuecomment-992668319

we should try out how long

select source, signal, max(time_value) from covidcast group by source, signal

takes.

if we wanna hit other indices we could generate a big union :

select source, signal, max(time_value) from covidcast where source = 'x' and signal = 'x'
UNION ALL
select source, signal, max(time_value) from covidcast where source = 'x' and signal = 'y'
...
krivard commented 2 years ago

TL;DR:

select source, signal, max(time_value) from covidcast group by source, signal

Cancelled when running time exceeded 2 hours.

select source, `signal`, max(time_value) from covidcast where source = 'chng' and `signal` = 'smoothed_adj_outpatient_cli'
UNION ALL
select source, `signal`, max(time_value) from covidcast where source = 'chng' and `signal` = 'smoothed_adj_outpatient_covid'
UNION ALL
...
UNION ALL
select source, `signal`, max(time_value) from covidcast where source = 'usa-facts' and `signal` = 'deaths_incidence_num'
UNION ALL
select source, `signal`, max(time_value) from covidcast where source = 'usa-facts' and `signal` = 'deaths_incidence_prop'

3 hours 36 minutes.

select source, `signal`, max(time_value) from covidcast where source = 'jhu-csse' and `signal` = 'confirmed_cumulative_num'

4 minutes.

sgratzl commented 2 years ago

oh that is unfortunate.

krivard commented 2 years ago

I'm going to push this into v4 -- once we have a more resilient schema for covidcast we may be able to handle queries like these.

melange396 commented 8 months ago

Nearly identical to the above 4 minute query, this one returns instantaneously:

SELECT MAX(time_value) FROM covid.epimetric_full WHERE signal_key_id=(
    SELECT signal_key_id FROM covid.signal_dim WHERE source='jhu-csse' AND `signal`='confirmed_cumulative_num' LIMIT 1
) AND time_type='day';

Note the inclusion of the constraint on time_type (of which there is only one for this signal), which lets the database make use of either of these indexes.