USFS-PNW / Fia-Biosum-Manager

User interface and main code repository for Biosum
http://biosum.info/
Other
3 stars 3 forks source link

GIS: Provide mechanism to update plot.gis_yarding_distance from plot_gis.MoveDist_ft_REPLACEMENT #305

Closed lbross closed 3 weeks ago

lbross commented 11 months ago

plot.gis_yarding_distance is set to fiadb_fvs_variant.move_distance_ft linking on the 6-digit plot and fvs_variant fields. The fiadb_fvs_variant table lives in the biosum_ref.accdb and is shared by all projects. We have done our best to calculate this field correctly, but it's possible that the values may change for projects for a variety of reasons. Or an analyst may wish to use their own custom road network.

One possible approach: Implement an interface to update the values in the fiadb_fvs_variant.move_distance_ft from an analyst-supplied database and table. This table would have to have the plot, fvs_variant, and move_distance_ft values. All plots added after this change would get the new gis_yarding_distances. We would also need a function to update the gis_yarding_distance for existing plots after the new data is loaded.

A potential downside is that this table would be overwritten if we release subsequent updates to biosum_ref.accdb.

Another possibility is to change the location of the FIADB FVS Variant in the project data sources to point to a project-specific version of this table. This would have to be tested, but I think it would work with no coding changes required. It wouldn't update plots that have already been loaded.

lbross commented 11 months ago

Notes from today's DEV call

  1. @sebbusby successfully updated the plot.gis_yarding_distance from the plot_gis.NEARDIST_FT in the gis_travel_times_master.accdb that is generated using Demetrios R script. He joined on the STATECD and PLOT. This is a good method to use for updating plots that are currently already in BioSum.
  2. @sebbusby suggested that we add a step to update plot.gis_yarding_distance from plot_gis.NEARDIST_FT when an analyst uses the Optimizer 'Load GIS Data' button to update the processing_site and travel_time tables in the \gis\db\gis_travel_times.accdb from the AppData\gis_travel_times_master.accdb.
  3. We attempted to change the location of the 'FIADB FVS Variant' data source in the project data sources to point to a project-specific version of this table but this is currently prohibited by the UI due to the special syntax required to locate the table in the AppData directory. If we want to move forward with this part of the solution, this prohibition needs to be removed.
lbross commented 11 months ago

We need to convert the gis_travel_times_master.accdb to an SQLite .db. The preferred course of action is develop a workflow to load the values from a .csv file.

lbross commented 10 months ago

We will add a step to update plot.gis_yarding_distance from plot_gis.NEARDIST_FT when an analyst uses the Optimizer 'Load GIS Data' button to update the processing_site and travel_time tables in the \gis\db\gis_travel_times.accdb from the AppData\gis_travel_times_master.db. BioSum will ask the user if they want to update the plot table before the process starts.

When the process completes, we will run an audit to see if any plots were not updated from the gis_travel_times_master.db. If any were missed, we'll print the plot numbers to a text file and pop an error message.

We don't plan to modify the project data sources screen at this time.

lbross commented 10 months ago

@sebbusby will work with Demetrios to generate a gis_travel_times_master.db in SQLite format. Lesley will provide the SQLite create table scripts to @sebbusby. The changes to update the 'Load GIS' algorithm are dependent on having an SQLite gis_travel_times_master.db.

sebbusby commented 10 months ago

@sebbusby will work with Demetrios to generate a gis_travel_times_master.db in SQLite format. Lesley will provide the SQLite create table scripts to @sebbusby. The changes to update the 'Load GIS' algorithm are dependent on having an SQLite gis_travel_times_master.db.

@lbross I used some simple R code to load the gis_travel_times_master.accdb tables into R and write the same tables to an empty gis_travel_times_master.db. The results are on Box/BioSumBox/Temp30Days/GIS_Travel_Times_Migrate_Sqlite.zip. You should be able to change the field datatypes to whatever you need them to be (or are most efficient). The current field datatypes are just what R formats them to by default when writing the tables in SQLite.

lbross commented 9 months ago

Thanks @sebbusby. The sample database you provided uses 'pure' SQLite data types: TEXT, INTEGER, and REAL. This is fine for BioSum to consume. We write BioSum tables using MS Access-friendly data types and BioSum can move the data into them with no issues. However ... if an analyst needs to work with the gis_travel_times_master.db using MS Access with the ODBC driver, then we could have a problem.

If I provide a create table script for each table (see example below) could this be used by an R script to create/load the tables with Access data types ?

> CREATE TABLE plot_gis (
>     STATECD     INTEGER,
>     PLOT        INTEGER,
>     COUNTY      INTEGER,
>     INVYR_LATE  INTEGER,
>     ORIG_LAT   DOUBLE,
>     ORIG_LON    DOUBLE,
>     NEARDIST_M  DOUBLE,
>     NEARDIST_FT DOUBLE
> );
jsfried commented 9 months ago

@lbross, @sebbusby - Thanks, and good catch, we do want to be careful about the data types we use in SQite to maintain Access compatibility via the SQLite driver. floating point and text variables seem to be the most problematic. e.g., decimal and real can cause problems-- double is always safest; I don't know which SQLite text type is a problem but there is some kind of text type that turns into a MEMO (also toxic to Access).

lbross commented 9 months ago

@sebbusby: I'm working on migrating the load from MS Access to SQLite. I just noticed that city is null for all of the records in the processing_site table in the gis_travel_times_master.db. This is fine with BioSum but I wanted to let you know about it in case it shouldn't be null.

lbross commented 9 months ago

Sample 'CREATE TABLE' statement for @sebbusby. Can you use this in R to set the data types before populating the table?

CREATE TABLE plot_gis ( STATECD INTEGER, PLOT INTEGER, COUNTY INTEGER, INVYR_LATE INTEGER, ORIG_LAT DOUBLE, ORIG_LON DOUBLE, NEARDIST_M DOUBLE, NEARDIST_FT DOUBLE);

sebbusby commented 9 months ago

@lbross I realized that when writing the empty GIS_Travel_Times SQLite db, it is easy to explicitly specify the data type for each column. e.g.,

dbWriteTable(con, "plot_gis", plot_gis, overwrite = TRUE, field.types = c(STATECD = "integer", PLOT = "integer", COUNTY = "integer", INVYR_LATE = "integer", ORIG_LAT = "double", ORIG_LON = "double", NEARDIST_M = "double", NEARDIST_FT = "double"))

So, if you can list what the desired data types are for the other tables (processing_site, travel_time), I can just directly program them into the R script that creates the empty GIS_Travel_Times.db

lbross commented 9 months ago

Great! Please see desired data types below for those two tables. I used the same data types that we have in BioSum for fields that are in both sets of tables and suggested data types for the others based on the data in the sample tables.

CREATE TABLE processing_site ( PSITE_ID INTEGER, NAME CHAR(100), TRANCD INTEGER, TRANCD_DEF CHAR(20), BIOCD INTEGER, BIOCD_DEF CHAR(15), EXISTS_YN CHAR(1), LAT DOUBLE, LON DOUBLE, STATE CHAR(2), STATECD INTEGER, CITY CHAR(40), COUNTY CHAR(40), COUNTYCD INTEGER, STATUS CHAR(40), MILL_TYPE CHAR(40), VALID_PSITE_FCOM CHAR(15), REQUIREMENTS_OR_CAPACITY CHAR(100), PRODUCTION_UNITS CHAR(100), DDLat DOUBLE, DDLon DOUBLE, ORIG_OID INTEGER );

CREATE TABLE travel_time ( TRAVELTIME_ID INTEGER, STATECD INTEGER, PLOT INTEGER, PSITE_ID INTEGER, TRAVEL_MODE CHAR(1), ONE_WAY_HOURS DOUBLE, RAILHEAD_ID INTEGER, COLLECTOR_ID INTEGER );

sebbusby commented 9 months ago

@lbross Thank you! I have coded the listed data types into the R script and updated a gis_travel_times_master.db located at Box/BioSumBox/Temp30Days/GIS_Travel_Times_Migrate_Sqlite.zip

lbross commented 8 months ago

Thanks @sebbusby. I have downloaded the updated .db and will continue my work with it. I wonder if we want to add primary keys or indexes to these tables for performance/data integrity? The old .accdb version did not have primary keys. On the BioSum tables travel_time.traveltime_id and processing_site.psite_id are primary keys, which also function as indexes for performance. plot_gis doesn't exist in the BioSum tables.

@jsfried: You may want to look at these tables through MS Access/ODBC to see if you can work with them as needed. BioSum doesn't care much about the datatypes when loading.

lbross commented 8 months ago

This question is about handling plots that don't have an entry in the plot_gis table when updating gis_yarding_distance. Do we need to retain the original value for plots that are missing from the plot_gis table? The easiest way to accomplish this is to set the gis_yarding_distance to null before updating it from the plot_gis table. It's a simple query to find the plots that are null following the update so they can be written to an audit file. If we need to keep the original value, it will be more complicated because we need to do some additional queries and processing to find rows where there isn't a match.

sebbusby commented 8 months ago

@lbross Great question, my sense is that we would want to keep the original gis_yarding distance value associated with a plot if there is not a replacement value (plot record) in the plot_gis table. Otherwise we run the risk of plots dropping out from Processor if they have null yarding distance - better to use a potentially outdated yarding distance value than no value at all?

lbross commented 8 months ago

OK. I've got a query that records the biosum_plot_id, plot, gis_yard_dist_ft, and NEARDIST_FT if the biosum plot is missing from the master plot_gis table. Are there any other fields you'd like in the audit table? statecd? The gis_yard_dist_ft will be the value BEFORE the plot table is updated.

sebbusby commented 8 months ago

@lbross Awesome! statecd would be indeed nice to add, don't think we need any other fields.

lbross commented 8 months ago

We will not add primary keys at this time. If we encounter performance problems in the future, we may revisit this decision. The audit results will live in the ‘plot_distance_audit’ table in a new gis\db\gis_audit.db.

lbross commented 8 months ago

There were some questions about the plot.gis_yard_dist_ft and how Processor handles nulls and -1. I looked at the fiadb_fvs_variant table in the biosum_ref.accdb. There are records with both nulls and -1 for MoveDistance_ft. issue #212 indicated that @jsfried populated the fiadb_fvs_variant table. I'm not sure what the difference is between the nulls and -1. When the plots are loaded, gis_yard_dist_ft is set to MoveDistance_ft regardless of the value. I verified that Processor handles nulls and -1 the same. It drops them from the analysis, writes them to a log file, and pops a messageBox warning of the problem.

lbross commented 1 month ago

Items to validate:

jsfried commented 3 weeks ago

Audit table recreated with correct field names. BioSum only offered replacement option when REPLACEMENT attribute was in the DB BioSum faithfully updates the records-- even if there is an existing value and the replacement value is NULL-- a bit surprising but probably the right choice. We need to write docs that emphasize that the REPLACEMENT field needs to be fully populated with correct values-- nulls will NOT prevent an overwrite. So this is for an advanced analyst who is comfortable managing multiple versions of the GTTM database (since they will need to DELETE records that they don't want to have updated in master.plot The original value is retained if there is no matching records in the GTTM database plot_GIS table.