EL-BID / UrbanTrips

An open-source library to process smart card payment data, infer destination and get meaningful KPI
https://el-bid.github.io/UrbanTrips/
Other
24 stars 2 forks source link

Improve geocoding performance #123

Closed alephcero closed 6 months ago

alephcero commented 8 months ago

When goecoding, add dia as a join key and to indexes

    if configs['lineas_contienen_ramales']:
        query = """
            WITH trx AS (
            select t.id,t.id_original, t.id_tarjeta,
                    datetime(t.fecha, 'unixepoch') as fecha,
                    t.dia,t.tiempo,t.hora, t.modo, t.id_linea,
                    t.id_ramal, t.interno, t.orden as orden,
                    g.latitud, g.longitud,
                    (t.fecha - g.fecha) / 60 as delta_trx_gps_min,
                    t.factor_expansion,
                ROW_NUMBER() OVER(
                    PARTITION BY t."id"
                    ORDER BY g.fecha DESC) AS n_row
            from trx_eco t, gps g
            where t."dia" = g."dia"   
            and t."id_linea" = g."id_linea"
            and  t."id_ramal" = g."id_ramal"
            and  t."interno" = g."interno"
            and t.fecha > g.fecha
            )
            SELECT *
            FROM trx
            WHERE n_row = 1;
        """
    else:
        query = """
            WITH trx AS (
            select t.id,t.id_original, t.id_tarjeta,
                    datetime(t.fecha, 'unixepoch') as fecha,
                    t.dia,t.tiempo,t.hora, t.modo, t.id_linea,
                    t.interno, t.orden as orden, g.latitud, g.longitud,
                    (t.fecha - g.fecha) / 60 as delta_trx_gps_min,
                    t.factor_expansion,
                ROW_NUMBER() OVER(
                    PARTITION BY t."id"
                    ORDER BY g.fecha DESC) AS n_row
            from trx_eco t, gps g
            where t."dia" = g."dia"   
            and t."id_linea" = g."id_linea"
            and  t."interno" = g."interno"
            and t.fecha > g.fecha
            )
            SELECT *
            FROM trx
            WHERE n_row = 1;
        """

    conn_data.execute(
        """
            CREATE INDEX IF NOT EXISTS trx_idx_r ON trx_eco (
                "dia","id_linea","id_ramal","interno","fecha"
                );
            """
    )

    conn_data.execute(
        """
            CREATE INDEX  IF NOT EXISTS gps_idx_r ON gps (
                "dia","id_linea","id_ramal","interno","fecha"
                );
        """
    )

    conn_data.execute(
        """
            CREATE INDEX IF NOT EXISTS trx_idx_l ON trx_eco (
                "dia","id_linea","interno","fecha"
                );
            """
    )

    conn_data.execute(
        """
            CREATE INDEX  IF NOT EXISTS gps_idx_l ON gps (
                "dia","id_linea","interno","fecha"
                );
        """
    )
alephcero commented 8 months ago

Delete records on trx_eco instead of droping table

alephcero commented 8 months ago

Check geocoding with gps bigger than threshold in configs

alephcero commented 6 months ago

Closed by https://github.com/EL-BID/UrbanTrips/commit/9f71889db9ef6565fb0adf633869273a7e7985f8