Open claudia-codeco opened 4 months ago
I suggest creating a view on the database like this (untested code):
CREATE VIEW casos as SELECT
count(classi_fin) FILTER(WHERE classi_fin IS NOT NULL) AS casos_suspeitos,
count(classi_fin) FILTER(WHERE classi_fin <> 5 AND classi_fin <> '') AS casos_provaveis,
count(classi_fin) FILTER(WHERE classi_fin <> 5 AND criterio = 1) as casos_lab
FROM "Municipio"."Notificacao"
WHERE
dt_digita <= {lastday} AND dt_digita >={firstday} AND municipio_geocodigo IN ({sqlcity}) AND cid10_codigo IN({sqlcid});
@claudia-codeco, is this the query you need?
almost there. we need to group_by epiweek using dt_sin_pri
It seems that first creating this aggregation by municipio_geocódigo
and epiweek
as a separate view, named say weekly_cases_by_city
will simplify the query above which can then select from this view, instead of "Municipio"."Notificação"
.
@luabida can you finalize this, please?
Yes, the final table should have the following columns
municipio_geocodigo year week (epiweek) (1 to 53) casos casos_provaveis casos_lab
So it seems that the view has to be something like this:
CREATE VIEW casos as SELECT
municipio_geocodigo,
year,
SE,
count(classi_fin) FILTER(WHERE classi_fin IS NOT NULL) AS casos_suspeitos,
count(classi_fin) FILTER(WHERE classi_fin <> 5 AND classi_fin <> '') AS casos_provaveis,
count(classi_fin) FILTER(WHERE classi_fin <> 5 AND criterio = 1) as casos_lab
FROM "Municipio"."Notificacao"
WHERE
dt_digita <= {lastday} AND dt_digita >={firstday} AND municipio_geocodigo IN ({sqlcity}) AND cid10_codigo IN({sqlcid})
GROUP by municipio_geocodigo, year, SE;
@fccoelho @claudia-codeco this is the result from the query with minor adjustments in the column names:
SELECT
municipio_geocodigo,
ano_notif,
se_notif,
count(classi_fin) FILTER(WHERE classi_fin IS NOT NULL) AS casos_suspeitos,
count(classi_fin) FILTER(WHERE classi_fin <> 5) AS casos_provaveis,
count(classi_fin) FILTER(WHERE classi_fin <> 5 AND criterio = 1) as casos_lab
FROM "Municipio"."Notificacao"
WHERE
dt_digita BETWEEN '2023-01-01' AND '2023-01-03'
AND municipio_geocodigo IN (3304557)
AND cid10_codigo IN ('A90')
GROUP by municipio_geocodigo, ano_notif, se_notif;
The only thing I didn't quite understand is the purpose of having the column epiweek
in the Copernicus table, but the table on production has just completed the UPDATE in every row with the epidemiological week:
dengue=# select date, epiweek, geocodigo from weather.copernicus_brasil limit 1;
date | epiweek | geocodigo
------------+---------+-----------
2016-12-13 | 201650 | 2209955
(1 row)
There is only one issue: we cannot use se_notif
for the group by we need to use the epiweek of dt_sinpri
.
So unless we find a way to calculate the epiweek using SQL, we may need to call a pl-python function in this view.
The definition of EpiWeek is this (according to PAHO):
The epidemiological week begins on Sunday and ends on Saturday. The first epidemiological week of the year ends on the first Saturday of January, provided that it falls at least four or more days into the month. Therefore, the first epidemiological week may actually begin in December of the previous year.
Adding the epiweek in copernicus table is important to allow for grouping by epiweek.
I tried to keep the exact logic as the epiweek
does, the system parameter is not quite necessary, but it may be portable to windows if necessary too
SQL Function with plpython3u:
CREATE OR REPLACE FUNCTION extract_SE(date DATE)
RETURNS INT AS $$
from datetime import date as dt
def _system_adjustment(system: str) -> int:
systems = ("iso", "cdc") # Monday, Sunday
return systems.index(system.lower())
def _year_start(year: int, system: str) -> int:
adjustment = _system_adjustment(system)
mid_weekday = 3 - adjustment # Sun is 6 .. Mon is 0
jan1 = dt(year, 1, 1)
jan1_ordinal = jan1.toordinal()
jan1_weekday = jan1.weekday()
week1_start_ordinal = jan1_ordinal - jan1_weekday - adjustment
if jan1_weekday > mid_weekday:
week1_start_ordinal += 7
return week1_start_ordinal
def fromdate(date: dt, system: str = "cdc") -> int:
if isinstance(date, str):
date = dt.fromisoformat(date)
year = date.year
date_ordinal = date.toordinal()
year_start_ordinal = _year_start(year, system)
week = (date_ordinal - year_start_ordinal) // 7
if week < 0:
year -= 1
year_start_ordinal = _year_start(year, system)
week = (date_ordinal - year_start_ordinal) // 7
elif week >= 52:
year_start_ordinal = _year_start(year + 1, system)
if date_ordinal >= year_start_ordinal:
year += 1
week = 0
week += 1
return int(str(year) + f"{week:02d}")
return fromdate(date, "cdc")
$$ LANGUAGE plpython3u;
Updated query:
SELECT
municipio_geocodigo,
ano_notif,
extract_SE(dt_sin_pri) as SE,
count(classi_fin) FILTER(WHERE classi_fin IS NOT NULL) AS casos_suspeitos,
count(classi_fin) FILTER(WHERE classi_fin <> 5) AS casos_provaveis,
count(classi_fin) FILTER(WHERE classi_fin <> 5 AND criterio = 1) as casos_lab
FROM "Municipio"."Notificacao"
WHERE
dt_digita BETWEEN '2023-01-01' AND '2023-01-03'
AND municipio_geocodigo IN (3304557)
AND cid10_codigo IN ('A90')
GROUP by municipio_geocodigo, ano_notif, extract_SE(dt_sin_pri);
Result:
We should add this python function to the setup script for the infodengue database so that we always have access to it. in case of a redeploy somewhere.
Problem
Currently, line list of cases are queried and then aggregated. In order to include group_by in the sql query, we need to compute the epiweek.
Why do that
To optimize consultation, this is also useful for the webpage
current code (AlertTools::getCases.R)