cityofaustin / atd-data-tech

Austin Transportation Data & Technology Services
17 stars 2 forks source link

Update Moped AGOL ETL to explode out Multipoint component geometries into Points in new layer #17999

Open mddilley opened 2 months ago

mddilley commented 2 months ago

When capturing project component features that are points in the Map tab, we convert Point geometries into Multipoint before storing them in the database from within the front end code. Each of these Multipoint features contains a single point, but then we combine all Multipoint features associated with each project component into a single Multipoint geometry in the component_arcgis_online_view. This view is then used to fetch component data that is pushed to the Moped Feature service (https://services.arcgis.com/0L95CJ0VTaxqcmED/ArcGIS/rest/services/Moped_Project_Components/FeatureServer/) that powers the Moped Database Viewer app. Note that this feature service does require auth - I can help with the endpoint to get a token and then fetching data from this endpoint.

ATSD is running into issues with implementing labeling since it isn't supported for Multipoint type geometries in ArcGIS Online. Here is video that explains the feature that our data format is blocking.

To do this, we need to:

Examples of components with Multipoint geometry with more than one point:

SELECT
    *
FROM
    component_arcgis_online_view
WHERE
    project_component_id IN(469)
mddilley commented 2 months ago

@frankhereford @johnclary I caught the rest of the dev group up on this one last Wednesday at refinement. Whenever y'all have a chance, could you take a look at this one? I remember some context about why we went with Multipoint over Point, but I know y'all have a lot more knowledge about this than me.

Thanks!

johnclary commented 2 months ago

yes, had to refresh my memory on this one. to recap the basic structure of a component,moped_project_components can have many features. each feature has one geography column. every geography is stored as a geometry type (point, multipoint, line, etc).

the actual geometry type we store in the feature geography column is controlled by javascript. a postgis geography column can handle whatever type you provide it. ie you can insert a polygon into the feature_drawn_points.geograpy column.

somewhere along the way we decided to store all point features as multipoint instead of point. as far as i can best recall, the only reason we made this choice was to have future flexibility with respect to the Moped UI. (it's also possible that the mapbox draw plugin made this decision for us). although it's theoretically possible that these geographies contain multiple points, i'm 99% sure none of them do.

however, this is a separate concern from the geographies at we build up in the agol component view. we are forced to use multipoint geographies there, because each component may comprise multiple points, for example this project component which has a bunch of intersection features.

if we wanted to eliminate the presence multipoint component geographies, we'd need to prevent users from, .e.g, picking multiple points when creating an Intersection Improvement component. this is an interesting idea, because i think this may be an improvement from a data quality perspective.

mddilley commented 2 months ago

thanks for all this info @johnclary 🙏 That helped me out a lot, and I also had the same recollection about capturing Multi type features in the editor for future proofing. But I'm now seeing the difference between what we capture in the app and the true Multipoint geometries in component_arcgis_online_view like this one:

{"type":"MultiPoint","coordinates":[[-97.734674,30.266826],[-97.734306,30.267731],[-97.733866,30.266577],[-97.73351,30.267508]]}

I was focusing on the React code like you mentioned where we capture multiple features per component, and I missed that we combine those component features in the DB view before pushing to the AGOL dataset.

mddilley commented 1 month ago

Dropping in the queries that I've been using to see how many components have more than one point:

-- intersections with more than one selected point and more than one drawn point
WITH intersections AS (
    SELECT
        mpc.project_id,
        mpc.project_component_id,
        COUNT(fdl.component_id) AS count_drawn_points,
        COUNT(fi.component_id) AS count_intersection_points
    FROM
        moped_proj_components mpc
    LEFT JOIN feature_drawn_points fdl ON fdl.component_id = mpc.project_component_id
    LEFT JOIN feature_intersections fi ON fi.component_id = mpc.project_component_id
    WHERE fdl.is_deleted = FALSE AND fi.is_deleted = FALSE
GROUP BY mpc.project_component_id
)
SELECT
    project_id,
    project_component_id,
    count_drawn_points,
    count_intersection_points
FROM intersections
WHERE
    count_intersection_points > 1
    AND count_drawn_points > 1
-- signals with more than one selected point and more than one drawn point
WITH signals AS (
    SELECT
        mpc.project_id,
        mpc.project_component_id,
        COUNT(fdl.component_id) AS count_drawn_points,
        COUNT(fs.component_id) AS count_signal_points
    FROM
        moped_proj_components mpc
    LEFT JOIN feature_drawn_points fdl ON fdl.component_id = mpc.project_component_id
    LEFT JOIN feature_signals fs ON fs.component_id = mpc.project_component_id
    WHERE fdl.is_deleted = FALSE AND fs.is_deleted = FALSE
GROUP BY
    mpc.project_component_id
)
SELECT
    project_id,
    project_component_id,
    count_drawn_points,
    count_signal_points
FROM signals
WHERE
    count_drawn_points > 1
    AND count_signal_points > 1
mddilley commented 1 month ago

Updates from refinement on 7/17:

Edit 7/24/24 Notes from previous discovery

Options:

  1. Add another layer to the AGOL dataset that explodes out components with more than one point feature into multiple components per feature
    • Right now there are 3 layers:
      • Points (the ETL merges all points features associated with one component into one Multipoint)
      • Lines
      • Combined points and lines
    • Seems like lower lift but would require downstream updates by users
    • Requires work by Geo team and updates by users of the layer
  2. Change app to only allow one point per component
    • We already have a lower limit (must have at least one feature per component) so adding an upper limit based on geometry type seems trivial
    • Nathan W mentioned that having more than one point per component is not wanted by ATSD
    • Nathan also offered to help clean up project components if it is a small amount
    • Would require us to split out components with multiple points into separate components
    • Better data quality
    • Work to enforce one point per component first
    • Then, do data cleanup tasks
mddilley commented 1 month ago

@Charlie-Henry shared this code https://github.com/cityofaustin/dts-work-zone-data-feed/blob/0a73357a6ef69c0047aa879b21ec426f38df6599/data%20sources/amanda_turp.py#L40-L45

mddilley commented 1 month ago

@frankhereford The new points layer has been added to this feature service by Andrew. You can see the details here, and the new layer ID is 3 which is what the ETL code uses to point to specific layers.

Let me know when you pick back up on this one, and we can sync up. I know we talked about maybe talking through the ETL, etc. last week. Thanks!

frankhereford commented 1 month ago

Questions for Mike -- this list is bound to change, so please don't sweat trying to answer them until I formalize it and present it to you. I might even work out the answers in advance, but I didn't want to let the ideas slip. Thanks!

1) When the records which have a MultiPoint geometry that are going to explode into multiple records because they store > 1 points, we are going to loose having a set of non-geometry columns that can be used as an ID for that record. Would you be interested in having another column available that represents the cardinality of the point out of the multipoint?

frankhereford commented 1 month ago
SELECT 
    ST_GeometryType(dump.geom) AS geometry_type,
    dump.geom AS mixed_geometry_native_geometry,
    dump.path[1] AS point_index,
    component_arcgis_online_view.*
FROM 
    component_arcgis_online_view,
    LATERAL ST_Dump(ST_GeomFromGeoJSON(component_arcgis_online_view.geometry)) AS dump
WHERE 
    ST_GeometryType(ST_GeomFromGeoJSON(component_arcgis_online_view.geometry)) = 'ST_MultiPoint'
UNION ALL
SELECT 
    ST_GeometryType(ST_GeomFromGeoJSON(geometry)) AS geometry_type,
    ST_GeomFromGeoJSON(geometry) AS mixed_geometry_native_geometry,
    NULL AS point_index,
    *
FROM 
    component_arcgis_online_view
WHERE 
    ST_GeometryType(ST_GeomFromGeoJSON(geometry)) = 'ST_MultiLineString';

☝️ A query which explodes all the records from the view that bear MultiPoint geometries into Points, but leaves the MultiLineStrings alone as Multis. Not technically needed as part of this work, but a potentially helpful artifact.