Closed: basilv closed this issue 1 day ago
Test the data migration for geometry simplification locally with PROD data:

```shell
# Dump PROD data (port 4200, presumably an oc port-forward), then restore into the local instance on 5432
pg_dump -U postgres -h localhost -p 4200 -d fom -f prod-backup-file.sql
oc logout
psql -U postgres
DROP DATABASE fom;
CREATE DATABASE fom;
psql -U postgres -h localhost -p 5432 -d fom < prod-backup-file.sql
```
The runtime for the data migration is around 2 mins.
Randomly picked one FOM and compared the local simplified version vs. the prod version; can't tell much difference:
Not sure why the database size gets bigger after applying the simplification algorithm? Before:
After:
@MCatherine1994 please remember to delete the prod data from your laptop as it contains private personal information (encrypted). For checking size, check the # of vertices used by each spatial object (the query for this is in the spike ticket) to confirm the migration actually ran as expected.
Thanks Basil!! I will, it's just for temporary testing.
For vertices count, before:
After:
The first number looks correct: I ran the query against current prod and got 5,899,266. The second number (post-simplification) looks wrong, as in my spike test I got ~100,000 vertices, not 1.7 million. Rerunning the following query in prod: `select sum(ST_NPoints(ST_SimplifyPreserveTopology(geojson,2.5))) from app_fom.spatial_feature;` I got the following result: 175,508
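For context, the two figures quoted in this thread (5,899,266 vertices before, 175,508 after) imply roughly a 97% reduction. A quick sketch of the arithmetic:

```python
# Vertex counts quoted earlier in this thread.
before = 5_899_266  # sum of ST_NPoints over app_fom.spatial_feature, pre-simplification
after = 175_508     # same sum after ST_SimplifyPreserveTopology(..., 2.5)

reduction = 1 - after / before
print(f"{reduction:.1%} of vertices removed")  # -> 97.0% of vertices removed
```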
Hi Basil @basilv, I tried again. Before running the simplification migration, I got a number similar to yours, 177,060, by running `select sum(ST_NPoints(ST_SimplifyPreserveTopology(geojson,2.5))) from app_fom.spatial_feature;`.
After running the migration, the number becomes 1,739,483, which I think is because our real simplification migration runs on the geometry column, not geojson: `UPDATE app_fom.cut_block SET geometry=ST_SimplifyPreserveTopology(geometry, 2.5) where geometry is not null;`, while our test query `ST_NPoints(ST_SimplifyPreserveTopology(geojson,2.5))` runs on geojson. So the numbers are different?
Hmm, let me look into this a bit more.
@MCatherine1994 Ah, applying simplify directly to the geojson field is likely wrong. The geometry and geojson fields use different coordinate systems, and the 2.5 argument is in meters, which only applies to the BC Albers coordinate system, not lat/long.
I'm double checking though.
Yes, so the metrics in the spike ticket are wrong because they were run against the geojson field, and your metrics are likely correct although I'd have to review the data migration SQL to be 100% sure.
So to summarize:
- the geometry field in the spatial tables is stored in the BC Albers coordinate system
- the geojson field in the view is stored in lat/long (WGS84)
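The unit mismatch above can be illustrated numerically. A minimal sketch, assuming the rough conversion of one degree ≈ 111,320 meters (the exact figure varies with latitude; `ST_SimplifyPreserveTopology`'s tolerance is interpreted in the units of the geometry's coordinate system):

```python
# BC Albers (meters) vs. WGS84 lat/long (degrees): the same numeric
# tolerance means wildly different things in the two coordinate systems.
METERS_PER_DEGREE = 111_320  # approximate; varies with latitude

tolerance_m = 2.5  # intended tolerance: 2.5 meters in BC Albers

# Passing 2.5 against lat/long data means 2.5 *degrees*, i.e. ~278 km:
degrees_as_meters = tolerance_m * METERS_PER_DEGREE
print(f"2.5 deg ~= {degrees_as_meters / 1000:.0f} km")  # far too coarse

# The roughly equivalent tolerance for lat/long data would be tiny:
equivalent_deg = tolerance_m / METERS_PER_DEGREE
print(f"2.5 m ~= {equivalent_deg:.7f} deg")
```

This is why simplifying the geojson field with a tolerance of 2.5 over-simplifies so aggressively, and why the spike metrics computed that way understated the post-migration vertex count.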
Run the migration again:

```sql
-- update geometry column to apply the simplification algorithm in cut_block table
UPDATE app_fom.cut_block SET geometry=ST_SimplifyPreserveTopology(geometry, 2.5);
-- update geometry column to apply the simplification algorithm in retention_area table
UPDATE app_fom.retention_area SET geometry=ST_SimplifyPreserveTopology(geometry, 2.5);
-- update geometry column to apply the simplification algorithm in road_section table
UPDATE app_fom.road_section SET geometry=ST_SimplifyPreserveTopology(geometry, 2.5);
```
I don't know the details of database dump files and restore, but I saw this might be useful to try: `pg_restore` has a `--table` option to restore only specific tables; not sure if it's worth experimenting with locally:
@MCatherine1994 use the following wording for the disclaimer: The Forest Operations Map application simplifies detailed maps that users submit. To save space and speed up processing, the application reduces the number of points in these maps. It keeps the map's original shape accurate within about 2.5 meters. This process uses the Douglas-Peucker algorithm.
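For reference, a minimal pure-Python sketch of the Douglas-Peucker idea the disclaimer describes (illustrative only; the application itself relies on PostGIS's `ST_SimplifyPreserveTopology`, which additionally guarantees the simplified geometry stays topologically valid):

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * px - dx * py + bx * ay - by * ax) / math.hypot(dx, dy)

def douglas_peucker(points, tolerance):
    """Drop points that deviate less than `tolerance` from the simplified line."""
    if len(points) < 3:
        return list(points)
    # Find the point farthest from the chord between the two endpoints.
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    # Everything within tolerance of the chord collapses to the endpoints.
    if dmax <= tolerance:
        return [points[0], points[-1]]
    # Otherwise keep the farthest point and recurse on the two halves.
    left = douglas_peucker(points[:index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right

line = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
print(douglas_peucker(line, 1.0))  # keeps 4 of the 8 points
```

With a tolerance of 2.5 meters in BC Albers, the same idea removes the vast majority of vertices while keeping each retained segment within 2.5 m of the original line.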
Try to find the FOM that has the biggest geospatial size: `Select * from app_fom.spatial_feature where ST_NPoints(geojson) = (select max(ST_NPoints(geojson)) from app_fom.spatial_feature);`
Test in FAM TEST, create two new FOMs:
- with data of PROD FOM 1404: https://fom-test.nrs.gov.bc.ca/admin/a/100365 (which has 5 proposed cut blocks, because the data is so big I couldn't include them all; all proposed and final road sections; all proposed and final retention areas)
- with data of PROD FOM 1547: https://fom-test.nrs.gov.bc.ca/admin/a/100366 (all proposed; this FOM only has a proposed submission in production)
After migration:
Reviewed Catherine's testing and confirmed the two new FOMs were created via uploaded files. A comparison between FOM 1404 in prod and FOM 100365 in TEST showed the slight simplifications in the TEST environment.
Hi Basil, just want to clarify: for the cut blocks in our FOM TEST https://fom-test.nrs.gov.bc.ca/admin/a/100365, I used the FOM 1404 proposed cut block submission, and because it's so big, it should have 6 cut blocks in total but I only submitted 5 cut blocks.
Checked the geojson points in PROD after deployment:
BCGW extract takes less than 1 minute now.
@MCatherine1994 amazing! How long would it have taken before?
Around 72s? Between 70-80s, I think.
To address performance/capacity limitations in the BCGW Extract, we want to apply a simplification algorithm to FOM spatial objects in order to drastically reduce the volume of data (# of vertices). This algorithm will be applied as a data fix to the existing data, as well as when new spatial data is submitted.
Additional Context
Acceptance Criteria
Definition of Done