cat-cfs / twobilliontoolkit

This repo stands as a singular place for all the tools that will be developed for the processing of 2 Billion Trees data and information
MIT License
1 stars 0 forks source link

Processing 2BT reported points #30

Open nesdolya opened 4 weeks ago

nesdolya commented 4 weeks ago

One of the first processing steps required for the 2BT spatial data processing workflow is 'spatializing' the latitude and longitude reported in the aspatial data. I have a hardcoded script that takes in an excel/csv and generates three spatial layers in a GDB: (1) lat/lon points in WGS84 (4326), (2) lat/lon points reprojected to Canada Albers Equal Area (102001), & (3) these points buffered to a circle that represents the reported planting area. Output 2 and 3 need to be added to the PostgreSQL DB.

Please utilize the existing script/tool and modify it (or recreate it) to ensure that these output are correctly stored in the PostgreSQL DB. And, modify the PostgreSQL schema to include mechanisms for version control of the site_points and site_buffered_points tables, specified below.

The most recent version of the existing tool is located here: \vic-fas1\projects_a\2BT\02_Tools\spatial_data_processing_workflow\create_buffered_points_2023DRAFT_v2.py Additional ArcPy class scripts used in the above tool are located: \vic-fas1\projects_a\2BT\02_Tools\spatialArcpyUtils

Checking for updates/changes to the existing data can either be done through Python or PostgreSQL trigger functions - whichever you are more familiar with.

Processing Steps:

  1. User provides aspatial 2BT data via Excel through command-line argument
  2. User provides .ini file for DB connection as command-line argument
  3. Tool reads required fields from Excel and ensures they are in the correct format/clean
  4. Any null geometry is dropped (empty lat/lon fields)
  5. Either use ArcPy (see scripts) or GeoPandas to generate the site_points and site_buffered_points and store in PostgreSQL DB (see below)

GeoPandas (GPD) Implementation (will likely be simpler b/c no intermediate datasets need to be stored locally):

  1. Create GPD dataframe using the lat/lon fields as the geometry in WGS84 projection (EPSG: 4326)
  2. [DEBUGGING ONLY] Store as shapefile locally
  3. Reproject GPD to Canada Albers Equal Area projection (ESRI: 102001)
  4. Store reprojected points to the site_points table in the PostgreSQL database, consider: 4.a. If a point already exists in site_points then compare the geometry. 4.a. If for a given SiteID the geometry is identical then do not commit 4.b. If for a given SiteID the geometry is not-identical then set dropped to True for that SiteID and commit new point to site_points table
  5. Using the reported site size derive the radius of a circle
  6. Buffer the GPD reprojected points by their given radius
  7. [DEBUGGING ONLY] Store as shapefile locally
  8. Store buffered points to the site_buffered_points table in the PostgreSQL DB, consider: 8.a. If a point already exists in the site_buffered_points table, then do not commit 8.b. If for a given SiteID the geometry is identical and the site size/buffer radius is identical = do not commit 8.c. If for a given SiteID the geometry is identical and the site size/buffer radius is not identical = set old record to dropped and commit new record 8.d. If for a given SiteID the geometry is not identical = set old record to dropped and commit new record

Requirements:

PostgreSQL Schema modifications required:

AnthonyRodway commented 1 week ago

Changes have been made to the branch: https://github.com/cat-cfs/twobilliontoolkit/tree/BufferBuilder-Processing-Reported-Points

All requirements have been completed and I have tried to catch some edge cases while updating/inserting into the database so hopefully didn't miss any!