bcgov / nr-fom

Forestry Operations Map
Apache License 2.0

Apply simplify algorithm to existing geometries and future submissions #632

Closed basilv closed 1 day ago

basilv commented 1 month ago

To address performance/capacity limitations for the BCGW Extract, we want to apply a simplify algorithm to FOM spatial objects in order to drastically reduce the volume of data (# of vertices). This algorithm will be applied as a data fix to the existing data and also whenever new spatial data is submitted.

MCatherine1994 commented 2 weeks ago

Tested the geometry simplification data migration locally with PROD data:

The runtime for the data migration is around 2 mins.

Image

Randomly picked one FOM and compared the local simplified version against the prod version; I can't see any big difference:

Image Image

Not sure why the database size gets bigger after applying the simplification algorithm. Before:

Image

After:

Image

basilv commented 2 weeks ago

@MCatherine1994 please remember to delete the prod data from your laptop as it contains private personal information (encrypted). For checking size, check the # of vertices used by each spatial object (the query for this is in the spike ticket) to confirm the migration actually ran as expected.

MCatherine1994 commented 2 weeks ago

Thanks Basil!! I will, it's just for temporary testing.

For the vertex count, before: Image

After: Image

basilv commented 2 weeks ago

The first number looks correct: I ran the query against current prod and got 5,899,266. The second number (post-simplification) looks wrong, as in my spike test I got ~100,000 vertices, not 1.7 million. Rerunning the following query in prod:

select sum(ST_NPoints(ST_SimplifyPreserveTopology(geojson,2.5))) from app_fom.spatial_feature;

I got the following result: 175,508
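For anyone who wants to sanity-check vertex counts outside the database, here is a minimal Python sketch of what a vertex count over GeoJSON-style geometries amounts to. It is only an analogue of ST_NPoints, not the PostGIS implementation; the function name `count_points` and the sample shapes are made up for illustration.

```python
def count_points(geometry):
    """Count every coordinate pair in a GeoJSON-style geometry dict."""
    def count(coords):
        # A position is a list of numbers; anything else is a nested list of positions.
        if coords and isinstance(coords[0], (int, float)):
            return 1
        return sum(count(c) for c in coords)

    if geometry["type"] == "GeometryCollection":
        return sum(count_points(g) for g in geometry["geometries"])
    return count(geometry["coordinates"])


# Hypothetical sample shapes (not FOM data).
polygon = {"type": "Polygon",
           "coordinates": [[[0, 0], [4, 0], [4, 3], [0, 3], [0, 0]]]}
line = {"type": "LineString", "coordinates": [[0, 0], [1, 1], [2, 0]]}

total = count_points(polygon) + count_points(line)
print(total)  # 8: the closing point of the polygon ring is counted too
```

Summing this over every feature gives the same kind of total as the sum(ST_NPoints(...)) query.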

MCatherine1994 commented 2 weeks ago

Hi Basil @basilv, I tried again. Before running the simplification migration, I got a similar number to yours, 177,060, by running select sum(ST_NPoints(ST_SimplifyPreserveTopology(geojson,2.5))) from app_fom.spatial_feature;.

After running the migration, the number becomes 1,739,483. I think that's because our real simplification runs on the geometry column, not geojson:

UPDATE app_fom.cut_block SET geometry=ST_SimplifyPreserveTopology(geometry, 2.5) where geometry is not null;

whereas the test query above, ST_NPoints(ST_SimplifyPreserveTopology(geojson,2.5)), runs on geojson. So the numbers are different?

Image

basilv commented 2 weeks ago

Hmm, let me look into this a bit more.

basilv commented 2 weeks ago

@MCatherine1994 Ah, applying simplify directly to the geojson field is likely wrong. The geometry and geojson fields use different coordinate systems, and the 2.5 argument is in meters, which only makes sense in the BC Albers coordinate system, not lat/long.

basilv commented 2 weeks ago

I'm double checking though.

basilv commented 2 weeks ago

Yes, so the metrics in the spike ticket are wrong because they were run against the geojson field, and your metrics are likely correct although I'd have to review the data migration SQL to be 100% sure.
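To put the numbers quoted in this thread side by side, here is a small arithmetic sketch in Python. It assumes "1739,483" is 1,739,483 (consistent with the "1.7 million" mentioned earlier) and uses the prod total of 5,899,266 as the baseline; the variable names are made up.

```python
# Vertex totals quoted in this thread.
before = 5_899_266          # current prod, before simplification
after_geometry = 1_739_483  # after the migration, run on the geometry column
after_geojson = 175_508     # spike-style query, run on the geojson column

def reduction_pct(original, simplified):
    """Percentage of vertices removed."""
    return 100.0 * (original - simplified) / original

geometry_pct = reduction_pct(before, after_geometry)
geojson_pct = reduction_pct(before, after_geojson)

print(f"geometry column: {geometry_pct:.1f}% fewer vertices")
print(f"geojson column:  {geojson_pct:.1f}% fewer vertices")
```

The first figure comes out at roughly 70% and the second at roughly 97%, which is the gap between the migration result and the spike metrics.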

basilv commented 2 weeks ago

So to summarize:

geometry field in the spatial tables is stored in the BC_Albers coordinate system
geojson field in the view is stored in lat/long (WGS84)
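A rough Python sketch of why the same 2.5 tolerance behaves so differently on the two fields: on lat/long data the value is read as degrees rather than meters. The conversion below uses a spherical-Earth approximation, and 54°N is only an assumed representative latitude for BC; the function name is made up.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres (spherical approximation)

def degrees_to_metres(delta_deg, latitude_deg):
    """Approximate ground distance covered by delta_deg of latitude and of longitude."""
    per_degree = math.pi * EARTH_RADIUS_M / 180.0  # metres per degree of latitude
    north_south = delta_deg * per_degree
    # Longitude degrees shrink with the cosine of the latitude.
    east_west = delta_deg * per_degree * math.cos(math.radians(latitude_deg))
    return north_south, east_west

# Assumption: 54 degrees north as a mid-BC latitude.
ns, ew = degrees_to_metres(2.5, 54.0)
print(f"2.5 degrees is about {ns / 1000:.0f} km north-south "
      f"and {ew / 1000:.0f} km east-west")
```

So a tolerance of 2.5 on the geojson field allows deviations on the order of hundreds of kilometres instead of 2.5 m, which would explain why that query collapses the data to far fewer vertices.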

MCatherine1994 commented 2 weeks ago

Ran the migration again:

-- update geometry column to apply the simplification algorithm in cut_block table
UPDATE app_fom.cut_block SET geometry=ST_SimplifyPreserveTopology(geometry, 2.5);

-- update geometry column to apply the simplification algorithm in retention_area table
UPDATE app_fom.retention_area SET geometry=ST_SimplifyPreserveTopology(geometry, 2.5);

-- update geometry column to apply the simplification algorithm in road_section table
UPDATE app_fom.road_section SET geometry=ST_SimplifyPreserveTopology(geometry, 2.5);

Image

ianliuwk1019 commented 2 weeks ago

I don't know the details of database dump files and restore, but this might be useful to try: PostgreSQL's pg_restore has an option to restore only specific tables. Not sure if it's worth experimenting with locally: Image

OlgaLiber2 commented 2 weeks ago

@MCatherine1994 use the following wording for the disclaimer: The Forest Operations Map application simplifies detailed maps that users submit. To save space and speed up processing, the application reduces the number of points in these maps. It keeps the map's original shape accurate within about 2.5 meters. This process uses the Douglas-Peucker algorithm.
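For readers curious what "reduces the number of points within about 2.5 meters" means in practice, here is a minimal Python sketch of the plain Douglas-Peucker algorithm. It is not the PostGIS code: ST_SimplifyPreserveTopology is a topology-preserving variant, which this sketch does not attempt. The function names and the sample line are made up, and coordinates are assumed to be in meters.

```python
import math

def distance_to_segment(p, a, b):
    """Distance from point p to the segment a-b (all points are (x, y) tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, tolerance):
    """Drop vertices that lie within `tolerance` of the simplified line."""
    if len(points) < 3:
        return list(points)
    # Find the vertex farthest from the chord joining the two endpoints.
    index, farthest = 0, 0.0
    for i in range(1, len(points) - 1):
        d = distance_to_segment(points[i], points[0], points[-1])
        if d > farthest:
            index, farthest = i, d
    if farthest <= tolerance:
        return [points[0], points[-1]]
    # Keep that vertex and simplify each side recursively.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right

# Hypothetical line in metres: two nearly straight legs joined by a sharp corner.
road = [(0, 0), (10, 1), (20, -1), (30, 0), (30, 10), (31, 20), (30, 30)]
simplified = douglas_peucker(road, 2.5)
print(simplified)  # [(0, 0), (30, 0), (30, 30)]
```

The small wiggles (at most 1 m off the straight legs) disappear, while the corner, which is far outside the 2.5 m tolerance, is kept.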

MCatherine1994 commented 1 week ago

Tried to find the FOM with the biggest geospatial size:

Select * from app_fom.spatial_feature where ST_NPoints(geojson) = (select max(ST_NPoints(geojson)) from app_fom.spatial_feature);

Image

Tested in FOM TEST, created two new FOMs:

with data of PROD FOM 1404: https://fom-test.nrs.gov.bc.ca/admin/a/100365 (5 proposed cut blocks, because the data is so big I couldn't include all of them; all proposed and final road sections; all proposed and final retention areas)
with data of PROD FOM 1547: https://fom-test.nrs.gov.bc.ca/admin/a/100366 (all proposed; this FOM only has a proposed submission in production)

After migration: Image

basilv commented 1 week ago

Reviewed Catherine's testing, confirmed the two new FOMs were created via uploaded files. Comparison between FOM 1404 in prod and FOM 100365 in TEST showed the slight simplifications in the TEST environment.

MCatherine1994 commented 1 week ago

Hi Basil, just want to clarify: for the cut blocks in our FOM TEST https://fom-test.nrs.gov.bc.ca/admin/a/100365, I used the FOM 1404 proposed cut block submission. It should have 6 cut blocks in total, but because it's so big, I only submitted 5.

MCatherine1994 commented 2 days ago

Checked the geojson points in PROD after deployment:

Image

MCatherine1994 commented 1 day ago

BCGW extract takes less than 1 min now.

Image

OlgaLiber2 commented 1 day ago

@MCatherine1994 amazing! How long would it have taken before?

MCatherine1994 commented 1 day ago

around 72s? between 70-80s I think