billyc / imove-australia-simwrapper

SimWrapper typescript code for iMOVE Australia
1 stars 1 forks source link

PostgreSQL API #7

Closed resuly closed 4 months ago

resuly commented 4 months ago

Todo list:

billyc commented 4 months ago

@resuly - some progress! but also some problems, we are not quite finished yet:

I created the function for fetching trips that cross through a point, thank you for sending the SQL query that you used. I modified it to make it work with PostgREST as follows:

CREATE OR REPLACE FUNCTION public.trips_at_location(lon FLOAT, lat FLOAT, radius FLOAT)

RETURNS TABLE (
    trip_id TEXT
) AS $$

SELECT
    DISTINCT trip_id
FROM
    public.points
WHERE
    timestamp BETWEEN '2022-04-01 06:00:00' AND '2022-04-02 12:00:00'
    AND ST_DWithin(
            point_snapped,
            ST_SetSRID(ST_MakePoint(lon, lat), 4326),
            radius / (111320 * cos(radians(lat)))
        )
$$ LANGUAGE SQL IMMUTABLE;

So the two API endpoints are now fulling functioning! They are:

I modified the code to try to fetch data from these endpoints, and it almost works except that the code expects the trips table to have some columns which are missing:

I need both of those (in one form or other!) to be able to plot the traces on the map and to calculate the speed statistics for the selected subpath.

I pushed the latest code to the repo but it will not function yet until this is resolved.

Also note, the API caps at 1000 returned rows. I still need to update code to cycle through row ranges if count > 1000

resuly commented 4 months ago

In the original dataset, each row represents a trip that contains sequential points. Path1, Speed_path are attributes (raw location, and speed values) of all points within the trip. For lat and lon used for visualization, we should use point_snapped instead of Path1 (renamed to point_raw in the points table) because the former one is the processed data that remove the GPS noise and shifting issues.

In the database, we use points table to store all points from all trips and make trip_id as a foreign key to link the trips table. After getting the trip_id, we can query points table again using trip_id to get the full points information.

resuly commented 4 months ago

I've made several improvements to address your concerns and enhance the functionality of our API. Here's a summary of the changes:

  1. Increased Timeout Duration

    • Added db-pool-acquisition-timeout = 300 to imove.conf
    • This extends the wait time to 300 seconds, allowing for longer-running queries
  2. Revised trips_at_location function in database

    create function trips_at_location(lon double precision, lat double precision, radius double precision, start_time timestamp without time zone DEFAULT '2022-04-01 06:00:00'::timestamp without time zone, end_time timestamp without time zone DEFAULT '2022-04-02 12:00:00'::timestamp without time zone)
    returns TABLE(trip_id bigint, points jsonb)
    stable
    language sql
    as
    $$
    WITH matching_trips AS (
        SELECT DISTINCT pts.trip_id
        FROM public.points pts
        WHERE pts."timestamp" BETWEEN start_time AND end_time
            AND ST_DWithin(
                pts.point_snapped,
                ST_SetSRID(ST_MakePoint(lon, lat), 4326),
                radius / (111320 * cos(radians(lat)))
            )
    )
    SELECT
        pts.trip_id,
        jsonb_agg(
            jsonb_build_object(
                'point_id', pts.id,
                'point_snapped', ST_AsGeoJSON(pts.point_snapped)::jsonb,
                'point_timestamp', pts."timestamp",
                'speed', pts.speed,
                'x_acc', pts.x_acc,
                'y_acc', pts.y_acc,
                'gyro_roll', pts.gyro_roll,
                'gyro_pitch', pts.gyro_pitch,
                'gyro_yaw', pts.gyro_yaw
            )
            ORDER BY pts."timestamp"
        ) AS points
    FROM public.points pts
    JOIN matching_trips mt ON pts.trip_id = mt.trip_id
    GROUP BY pts.trip_id;
    $$;

    Example API call:

    • /imove-api/rpc/trips_at_location?lon=144.96261&lat=-37.81211&radius=10
  3. New trips_at_ab function in database. It returns the trips passing through A to B.

    create function trips_at_ab(lon_a double precision, lat_a double precision, lon_b double precision, lat_b double precision, radius double precision, start_time timestamp without time zone DEFAULT '2022-04-01 06:00:00'::timestamp without time zone, end_time timestamp without time zone DEFAULT '2022-04-02 12:00:00'::timestamp without time zone)
    returns TABLE(trip_id bigint, points jsonb)
    stable
    language sql
    as
    $$
    WITH points_at_a AS (
        SELECT DISTINCT trip_id
        FROM public.points
        WHERE "timestamp" BETWEEN start_time AND end_time
            AND ST_DWithin(
                point_snapped,
                ST_SetSRID(ST_MakePoint(lon_a, lat_a), 4326),
                radius / (111320 * cos(radians(lat_a)))
            )
    ),
    points_at_b AS (
        SELECT DISTINCT trip_id
        FROM public.points
        WHERE "timestamp" BETWEEN start_time AND end_time
            AND ST_DWithin(
                point_snapped,
                ST_SetSRID(ST_MakePoint(lon_b, lat_b), 4326),
                radius / (111320 * cos(radians(lat_b)))
            )
    ),
    matching_trips AS (
        SELECT trip_id
        FROM points_at_a
        INTERSECT
        SELECT trip_id
        FROM points_at_b
    )
    SELECT
        pts.trip_id,
        jsonb_agg(
            jsonb_build_object(
                'point_id', pts.id,
                'point_snapped', ST_AsGeoJSON(pts.point_snapped)::jsonb,
                'point_timestamp', pts."timestamp",
                'speed', pts.speed,
                'x_acc', pts.x_acc,
                'y_acc', pts.y_acc,
                'gyro_roll', pts.gyro_roll,
                'gyro_pitch', pts.gyro_pitch,
                'gyro_yaw', pts.gyro_yaw
            )
            ORDER BY pts."timestamp"
        ) AS points
    FROM public.points pts
    JOIN matching_trips mt ON pts.trip_id = mt.trip_id
    GROUP BY pts.trip_id;
    $$;

    Example API call:

    • /imove-api/rpc/trips_at_ab?lon_a=144.984347&lat_a=-37.798946&lon_b=144.987254&lat_b=-37.799248&radius=5&start_time=2022-04-01T00:00:00&end_time=2022-04-01T23:59:59
billyc commented 4 months ago

Great, thanks for making those changes!

I've updated the code to use the new /trips_at_location and /trips_at_ab endpoints. This is working pretty well now.

The point-to-point is fairly fast but the single-point speed viz is retrieving data much more slowly. I'm not sure if there's a way to get the query running faster. My original code did a simple numerical lookup on the latitude and longitude, which was extremely fast. I think the PostGIS call is just taking a bit longer. It's not terrible, just a bit slower.

Some further notes:

1) The filters are not yet enabled because the Spark trip dataset had some additional post-processed columns which are not present in the JSON that I'm now receiving:

If those can be returned in the new API endpoings as additional columns, then I can reconnect the filter buttons. Could you do that @resuly ?

2) Do we want to pick some dates or date ranges instead of using the defaults?

3) When should I push this to the Unimelb web server? Do we now have an SSL certificate and should I set up the same API key access?

I will be traveling again tomorrow and next week I have some time off -- the sooner we can wrap this up, the better!

resuly commented 4 months ago

Thanks for the updates. Yes, we do have some performance issues with the database server. It could be due to hardware limitations or other reasons. I will try to optimise it by changing the settings such as cache size, index, etc. Please let me know if you have any ideas about optimising a large PostgreSQL dataset.

For your listed items:

  1. I have revised the functions, and they are returning is_weekday and veh_type now. However, I didn't get the meaning of start_time. It should be already known by sending the request, because we have start_time in URL(for example, start_time=2022-04-01T00:00:00)

  2. Yes, please put some default date range in the front end. It would be great if we can change the default values by yaml file. However, the wider range of the date may make the query slow and bring too much data to the front end. We may need to test it a little bit.

  3. I think we can push the codes as soon as you have any updates. As far as I know, SSL has not yet been deployed on this server. If API works well with HTTP, then we may not need to do that.

  4. In addition, can we change the iMove logo on each page to the university logo?

  5. When evaluating speed reliability for a single point, can we add an icon to show where the user clicked?

Thank you and good luck with your trips!

Updated functions (some unused columns are commented out to save space):

create function trips_at_location(lon double precision, lat double precision, radius double precision, start_time timestamp without time zone DEFAULT '2022-04-01 06:00:00'::timestamp without time zone, end_time timestamp without time zone DEFAULT '2022-04-01 12:00:00'::timestamp without time zone)
    returns TABLE(trip_id bigint, points jsonb)
    stable
    language sql
as
$$
    WITH matching_trips AS (
        SELECT DISTINCT pts.trip_id
        FROM public.points pts
        WHERE pts."timestamp" BETWEEN start_time AND end_time
            AND ST_DWithin(
                pts.point_snapped,
                ST_SetSRID(ST_MakePoint(lon, lat), 4326),
                radius / (111320 * cos(radians(lat)))
            )
    )
    SELECT
        pts.trip_id,
        jsonb_agg(
            jsonb_build_object(
                'point_id', pts.id,
                'point_snapped', ST_AsGeoJSON(pts.point_snapped)::jsonb,
                'point_timestamp', pts."timestamp",
                'speed', pts.speed,
                'is_weekday', EXTRACT(ISODOW FROM pts."timestamp") < 6,
                'veh_type', CASE WHEN t.veh_types = 'HCV' THEN 'HCV' ELSE 'car' END
                -- 'x_acc', pts.x_acc,
                -- 'y_acc', pts.y_acc,
                -- 'gyro_roll', pts.gyro_roll,
                -- 'gyro_pitch', pts.gyro_pitch,
                -- 'gyro_yaw', pts.gyro_yaw
            )
            ORDER BY pts."timestamp"
        ) AS points
    FROM public.points pts
    JOIN matching_trips mt ON pts.trip_id = mt.trip_id
    JOIN public.trips t ON pts.trip_id = t.id
    GROUP BY pts.trip_id;
$$;
create function trips_at_ab(lon_a double precision, lat_a double precision, lon_b double precision, lat_b double precision, radius double precision, start_time timestamp without time zone DEFAULT '2022-04-01 06:00:00'::timestamp without time zone, end_time timestamp without time zone DEFAULT '2022-04-02 12:00:00'::timestamp without time zone)
    returns TABLE(trip_id bigint, points jsonb)
    stable
    language sql
as
$$
    WITH points_at_a AS (
        SELECT DISTINCT trip_id
        FROM public.points
        WHERE "timestamp" BETWEEN start_time AND end_time
            AND ST_DWithin(
                point_snapped,
                ST_SetSRID(ST_MakePoint(lon_a, lat_a), 4326),
                radius / (111320 * cos(radians(lat_a)))
            )
    ),
    points_at_b AS (
        SELECT DISTINCT trip_id
        FROM public.points
        WHERE "timestamp" BETWEEN start_time AND end_time
            AND ST_DWithin(
                point_snapped,
                ST_SetSRID(ST_MakePoint(lon_b, lat_b), 4326),
                radius / (111320 * cos(radians(lat_b)))
            )
    ),
    matching_trips AS (
        SELECT trip_id
        FROM points_at_a
        INTERSECT
        SELECT trip_id
        FROM points_at_b
    )
    SELECT
        pts.trip_id,
        jsonb_agg(
            jsonb_build_object(
                'point_id', pts.id,
                'point_snapped', ST_AsGeoJSON(pts.point_snapped)::jsonb,
                'point_timestamp', pts."timestamp",
                'speed', pts.speed,
                'is_weekday', EXTRACT(ISODOW FROM pts."timestamp") < 6,
                'veh_type', CASE WHEN t.veh_types = 'HCV' THEN 'HCV' ELSE 'car' END
                -- 'x_acc', pts.x_acc,
                -- 'y_acc', pts.y_acc,
                -- 'gyro_roll', pts.gyro_roll,
                -- 'gyro_pitch', pts.gyro_pitch,
                -- 'gyro_yaw', pts.gyro_yaw
            )
            ORDER BY pts."timestamp"
        ) AS points
    FROM public.points pts
    JOIN matching_trips mt ON pts.trip_id = mt.trip_id
    JOIN public.trips t ON pts.trip_id = t.id
    GROUP BY pts.trip_id;
$$;
billyc commented 4 months ago

I'm running out of time to assist on this - but I think things are in pretty good shape.

billyc commented 4 months ago

For now I hard-coded the date range to one week April 1-7th. The filters don't really make any sense without a multi-day query. We can add a date picker next.

resuly commented 4 months ago

@billyc I agree. We can add more filter functions later. For now, we need a working version on the university server and close down the AWS services. Can you please make sure the other functions are working on the new server? I changed the config file (/data/simwrapper/simwrapper-config.yml) to match the server IP address, but the API still needs authorization. It would be great if we could configure the access token in this file (empty for no access token required). Thanks.

billyc commented 4 months ago

@resuly

I think this means the old AWS server can be shut down. We could continue forever on adding more filters etc, but at this point I do think it is at least as functional (if not more so) as before.

resuly commented 4 months ago

@billyc That's awesome! I just tested everything, and it works smoothly. On my end, the API seems to be working directly without any authorization. Can you please confirm this new feature? Everything else looks good to me. I think we can close down both AWS and fly.io services from now. Thank you so much for your help.

billyc commented 4 months ago

@resuly - Try using a private window or a different browser first. The site stores the API key, and since we didn't change the API key, it might still be using the one you previously entered...

If it still lets you access the data, let me know and I'll see if I can reproduce here.

resuly commented 4 months ago

@billyc Yes, you are right. It appears when I use incognito mode. Now everything looks great to me. Thanks!