USFS-PNW / Fia-Biosum-Manager

User interface and main code repository for Biosum
http://biosum.info/
Other
3 stars 3 forks source link

OPTIMIZER: Update data types for tables in the optimizer_results.db #277

Closed lbross closed 2 years ago

lbross commented 2 years ago

The table schemas in optimizer_results.db use the standard SQLite datatypes: TEXT, INTEGER, and REAL. These data types cause issues when the tables are linked in MS Access using an ODBC driver. The FIA data mart uses MS Access data types even though it is an SQLite database. Olaf confirmed that the Tabling program can work with MS Access data types. The schemas need to be updated for these tables so that they use the same data types as the FIA data mart or BioSum. The easiest way to do this is to re-use the scripts that create these tables originally in MS Access, when applicable.

jsfried commented 2 years ago

test comment-- Lesley, can you add Sebastian to the distribution of when comments get entered for issues.

jsfried commented 2 years ago

Adding a 2nd test comment to see if Sebastian gets the emails now.

lbross commented 2 years ago

Sebastian hasn't accepted the invite yet, so he will probably not get an email.

jsfried commented 2 years ago

One more comment test?

jsfried commented 2 years ago

Several fields don't enter Access nicely when imported from SQLite 3 database via ODBC driver. This leads to their being unavailable for use in linking, indexing, etc. among tables, rendering analysis use in Access very problematic (can't even filter or sort on some of these the way they come into Access, I think). While it is true that one can gather up these fields and tables from various Access databases scattered across the project hierarchy, the point of the SQLite database is to provide one file that stores outputs and input assumptions and parameters as were in place at the time the analysis was run. Ideally, one would be able to link to this SQLite from Access to do analysis; however, linking has been problematic and it is necessary to actually import the tables into an Access database (which is workable). However the data types to which variables import depends on how they are specified in the SQLite 3 database. For import to work under current ODBC drivers, the types TEXT, MEMO, and BYTE need to be avoided whenever possible (there may be a field or two that can be longer than 255 characters where we may have to accept a MEMO type, that will transition as an ill behaved, from the standpoint of linking and searching and filtering etc. LONG TEXT in Access). The attached spreadsheet provides recommendations as to what all fields should show up in Access as (to be consistent with their Access representations elsewhere in the BioSum hierarchy and to ensure their usability by analysts) and a crosswalk page indicating what I think the types need to be in SQLite to accomplish this. In most cases, there are choices to be made that I am unaware of the implications of- for example, both CHAR (X) and VARCHAR (X) enter Access as SHORT TEXT (X) fields-- I don't know which of these is better or more compatible in terms of other purposes and needs, but TEXT is definitely not a happy conversion from SQLite to Access.

Altogether, there are 72 variables for which a type change is needed in the SQLite DB3. A few of these are changes to DATE-- not sure how practicable or critical those are-- would be nice if it came through as DATE in Access, but it would not surprise me if that turns out to be challenging and could live with DATES being represented as SHORT TEXT in Access (perhaps VARCHAR (50) in SQLite).

Please see attached EXCEL workbook for details. DataTypeChangeNeedsForDB3export_20220413.xlsx

lbross commented 2 years ago

I started work on the spreadsheet and didn't quite make it all the way through. But I will post what I have so you can review before tomorrow if desired. For background, the data types in v5.9.0 were copied from the MS Access types of the source tables, but it seems that some of these remain problematic. In column 5 I suggest the SQLite data type that we should change to for those columns that are problematic. Since I'm not an expert on database schemas, here is the logic I followed:

  1. For fields that exist in FIADB, I suggest the FIADB data type for consistency. Most of these appear to be in line with Jeremy's suggestions
  2. For fields that are in BioSum only, I looked for the same field in other BioSum tables that are working. For example, biosum_cond_id A couple of areas are in question. For the pop tables in the SQLite database, I copied the data types from FIADB. I want to make sure that the POP tables you are looking at have the most current table structure. Can you look at the tables through an SQLite client and make sure the text fields are VARCHAR rather than TEXT? It looks like most of the text fields in these tables are varchar and maybe that is a problem? Maybe check with Olaf about varchar vs char? We can try changing them to char to see if that helps. I didn't go beyond the pop_estn_unit table because all 4 tables will have similar issues. The processing_site table had several columns that are not in the official BioSum table schema. This is an example of what I was talking about last week where we are not sophisticated enough to tell the exact text type so we just used the generic text data type. And I'm not sure what to do about date. We are already setting these fields to date on the SQLite side. I don't know what more we can do. This may be a limitation of the ODBC driver? DataTypeChangeNeedsForDB3export_20220413.xlsx Maybe another Olaf question?
jsfried commented 2 years ago

Logic seems sound to me. I have surprise presentation in an hour, then meetings, so can't do anything today. I looked at POP tables in project/db/master.db -- makes sense to pattern off of how these are configured in the FIADB.DB distribution, probably. Looks like some more experimentation with the ODBC driver is needed to nail down some of these questions-- the empirical approach may be best. I wonder if there is more than one kind of date format in SQLite. Also does the ODBC driver work in both directions? If so, perhaps we should export Access data to DB and see what it becomes.

jsfried commented 2 years ago

I tried rerunning the optimizer scenario to get a fresh Access optimizer results database and then exporting that and I am still getting TEXT in the POP tables and NUM type for the date attribute. I am also getting a fail after exporting the SQLite file in that I can't then run optimizer to load the same or another scenario. The Optimizer Scenario selection window just sits there and won't respond to any button- a task manager closeout is required.

lbross commented 2 years ago

I found that the SQLite call we were using the create the POP tables in the context.db takes it upon itself to change the data types to SQLite native data types when it copies the table. This needs to be reworked to create the table with the desired data types and then to copy the data over. Fixing this may delay the release of the .msi. I'm going to hold off on looking at the post-export fail until I get the data types fixed.

jsfried commented 2 years ago

Understood (about the delay). Better to take it slow and get a fully useable export. I will try to connect with Olaf about the questions you posed that you thought he might be able to address.

lbross commented 2 years ago

We may not need Olaf's help. When we start producing the pop tables with varchar rather than text data types, that problem may go away. Date may still be outstanding. You can try linking to the pop tables in master.db with the date fields and see if they come across okay in Access. If they do, then we should be okay once the pop tables in context.db have our desired data types.

lbross commented 2 years ago

Here is the completed spreadsheet if you want to review. I did not go through the POP tables as I think those will self-correct once we update how they are created. DataTypeChangeNeedsForDB3export_20220413.xlsx

jsfried commented 2 years ago

Looks good to me. I understand that any extra columns in tables that are not BioSum native will come as they come and we'll live with that.

lbross commented 2 years ago

I fixed the pop tables and uploaded a sample optimizer_results.db3 to the Temp30Days folder. This is the context.accdb only, to keep it small and is for BlueDemo. If you're able to work with these tables through MS Access, then our questions about VARCHAR and DATE fields are resolved. Still working on all the other tables.

jsfried commented 2 years ago

These all look great! DATE came through with flying colors (though there is no data in the POP tables so I can't completely confirm until there is).

lbross commented 2 years ago

You are too efficient! There is data in the POP tables now. I noticed that the data was missing and tried to repost before you pulled them down, but I was too slow.

jsfried commented 2 years ago

Well now there is data in the tables, but the date columns are all null. I guess I'll assume that they'd be OK if they were not null. They are not null in the FIADB version of the POP tables but the copy I have of the Bluedemo project currently has no content in the pop tables in its .db

lbross commented 2 years ago

The data for the pop tables was derived from the master_pop_PNW.db because the BlueDemo POP tables were rebuilt using this source. I traced it through and the date columns are null in master_pop_PNW.db. Hence they are null in master.db and ultimately context.db. When we have more recent projects with recent pop tables, we should be able to take a closer look at the dates. I just posted a new version of context .db to Box containing both optimizer_results and context data. This version should have the requested/updated data types. Once @jsfried approves, I will generate a new .msi with this and the other fixes.

jsfried commented 2 years ago

I was already good with the data types, so let's consider this complete.

lbross commented 2 years ago

OK. I will generate a new .msi this afternoon.