VERPAT output structure violates VisionEval architecture

jrawbits commented 3 years ago

VisionEval's query and extraction functions expect every "Name" vector (Dataset) within a Datastore Group/Table to have the same length. That length is the number of rows set by the Table's key field, such as HhId or VehId.

The VERPAT model, however, does not respect that requirement and puts vectors of different lengths into the same tables. That causes two failures when we try to read the model results as a table or data.frame:

1. The Vehicle table in the future year contains two series with different lengths: one keyed on HhId and VehId, and one keyed on HhIdFuture and VehIdFuture. Consequently, we cannot query or extract that table in a single operation. Instead, the Table has to be split in two according to the length of the datasets (the extract tool function in VE 3.0 has been adjusted to handle that).
2. The Model table in the Global group generally holds single values (derived from model_parameters.json), except that CostsPolicy and CostsIdPolicy are 5-element vectors generated in the CalculatePolicyVmt module. When the Model table is extracted, each single value is recycled to fill all 5 rows. That's harmless, but misleading.
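Both failures follow from how base R assembles a table from a list of vectors, so they can be reproduced without a Datastore. A minimal sketch with invented lengths and values (ParamA is a hypothetical stand-in for any single-valued parameter), not real VERPAT output:

```r
# Illustrative only: lengths and values are invented, not read from a Datastore.

# Failure 1: two vehicle series of unequal length in one table.
# 6 is not a multiple of 4, so data.frame() refuses to build the table.
vehicle_ls <- list(
  VehId       = 1:6,  # base vehicle series
  VehIdFuture = 1:4   # future vehicle series
)
vehicle_df <- tryCatch(
  as.data.frame(vehicle_ls),
  error = function(e) conditionMessage(e)
)
print(vehicle_df)  # the error message text, since no data frame was built

# Failure 2: a single value stored next to a 5-element vector.
# data.frame() recycles the single value, so it silently appears in all 5 rows.
model_ls <- list(
  ParamA      = 0.5,                   # hypothetical single-valued parameter
  CostsPolicy = c(10, 20, 30, 40, 50)  # invented 5-element vector
)
model_df <- as.data.frame(model_ls)
print(model_df)
nrow(model_df)  # 5
```

The first case stops with an error; the second succeeds, which is why it is only misleading rather than fatal.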

The correct solution is to ensure that, within VERPAT and its model-specific modules, each Dataset is written into a Table whose Datasets all share one row count, set by the Table's key field(s).

Two fixes are required in the module code:

1. The future Vehicle datasets (the series keyed on HhIdFuture and VehIdFuture) should be in their own Table, and
2. The CostsPolicy vector and its key (CostsIdPolicy) should be in their own Table, not in "Model". There is precedent for creating new tables in Global.
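The invariant behind both fixes can be checked mechanically: within each Table, every Dataset should have the same length. A minimal base-R sketch, assuming each Table is represented as a named list of vectors. The helper tablesAreRectangular, the table names VehicleFuture and CostsPolicy, and all values are hypothetical, not VisionEval's API or VERPAT's eventual naming; tables from a year group and from Global are mixed in one list purely for brevity:

```r
# Hypothetical helper: TRUE for each Table whose Datasets all share one length.
tablesAreRectangular <- function(tables_ls) {
  vapply(tables_ls, function(table_ls) {
    length(unique(lengths(table_ls))) == 1L
  }, logical(1))
}

# Current layout (invented sizes): mixed lengths inside Vehicle and Model.
current_ls <- list(
  Vehicle = list(VehId = 1:6, HhId = c(1, 1, 2, 2, 3, 3),
                 VehIdFuture = 1:4, HhIdFuture = c(1, 2, 2, 3)),
  Model   = list(ParamA = 0.5, CostsIdPolicy = 1:5,
                 CostsPolicy = c(10, 20, 30, 40, 50))
)

# Proposed layout: each differently sized series moved to its own Table.
proposed_ls <- list(
  Vehicle       = list(VehId = 1:6, HhId = c(1, 1, 2, 2, 3, 3)),
  VehicleFuture = list(VehIdFuture = 1:4, HhIdFuture = c(1, 2, 2, 3)),
  Model         = list(ParamA = 0.5),
  CostsPolicy   = list(CostsIdPolicy = 1:5,
                       CostsPolicy = c(10, 20, 30, 40, 50))
)

print(tablesAreRectangular(current_ls))   # Vehicle and Model are FALSE
print(tablesAreRectangular(proposed_ls))  # every Table is TRUE
```

A check like this could run over a finished Datastore to catch any module that writes mismatched lengths into a Table.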

m-mcqueen commented 3 years ago

Looking forward to seeing this issue resolved, as I'm planning to use VERPAT soon. Just did a run-through with the default data and encountered this error. What is a workaround for the time being? Thanks!

dflynn-volpe commented 3 years ago

There are some workarounds. First of all, you can use the extract() functionality for Azones (counties), Bzones (place types), Households, and the Marea (metropolitan area). Then you can manually query the Global group and the Vehicle table as needed.

Let's run the default VERPAT model:

rpat <- openModel('VERPAT')
rpat$run() 

# Select Azone, Bzone, Household, and Marea geographies.
rpat$tablesSelected <- c('Azone', 'Bzone', 'Household', 'Marea')

# Extract the selected tables to CSV files
rpat$extract()

To get the Vehicle outputs, you can use readDatastoreTables to produce just the outputs you need. The extract() functionality above is very convenient because it writes every available output to CSV (or returns them as internal R objects), but readDatastoreTables may actually be more useful because it extracts only the outputs of interest. The trade-off is that you have to define which outputs you want.

First, you can query the entire datastore to list all the variables available for output. Assuming you have just run the demo VERPAT model, do the following (thanks to @gregorbj for building this functionality and sharing example code!):

setwd('models/VERPAT')

QPrep_ls <- prepareForDatastoreQuery(
  DstoreLocs_ = "Datastore", 
  DstoreType = "RD")

Then, make an inventory of the datastore.

This creates a zip archive that documents all the datasets in the datastore. The archive is organized by group. Within each group folder is a set of CSV files, one for each table in the group. Each CSV file lists the datasets included in the table, giving the dataset name, data type, units, and description.

documentDatastoreTables(
  SaveArchiveName = "DatastoreDocumentation", 
  QueryPrep_ls = QPrep_ls)

Now look at the documentation in the archive.

The Vehicle.csv file lists the datasets available in the Vehicle table. Choose the ones you want to extract from it.

First, write a list named TablesRequest_ls:

The named components are tables. Each component is a named vector whose names are dataset names and whose values are the units in which to retrieve the data. An empty string ("") means retrieve the data in the units used in the datastore.

TablesRequest_ls <- list(
  Vehicle = c(
    Azone = "",
    HhId = "",
    Mileage = "",
    Dvmt = "",
    Powertrain = ""))
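Because every unit here is "", the same request can be built from a plain character vector of dataset names, for example names copied from Vehicle.csv. A minimal sketch; makeUnitsRequest is a hypothetical helper, not part of VisionEval:

```r
# Hypothetical helper: name each dataset and request it in its stored units ("").
makeUnitsRequest <- function(datasetNames) {
  setNames(rep("", length(datasetNames)), datasetNames)
}

# Produces the same request list as the one written out by hand above.
TablesRequest_ls <- list(
  Vehicle = makeUnitsRequest(c("Azone", "HhId", "Mileage", "Dvmt", "Powertrain"))
)
str(TablesRequest_ls)
```

To request a dataset in other units, overwrite that element afterwards, e.g. by assigning to TablesRequest_ls$Vehicle["Dvmt"].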

Then call the readDatastoreTables function using the list of requested tables and datasets:

TableResults_ls <-
  readDatastoreTables(
    Tables_ls = TablesRequest_ls,
    Group = "2035",
    QueryPrep_ls = QPrep_ls
  )

The readDatastoreTables function returns a list with two named components, "Data" and "Missing". The "Data" component is a named list in which each component corresponds to a requested table and holds a data frame containing the requested datasets from that table.

lapply(TableResults_ls$Data, head)

This prints the first six of the 599,198 rows in the Vehicle table.
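To save the query results in the same CSV form that extract() produces, each data frame in "Data" can be written to its own file. A minimal sketch; writeTableResults is a hypothetical helper, and mock_ls is an invented stand-in with the same two components as TableResults_ls (the exact structure of "Missing" is not documented in this thread, so it is only printed with str()):

```r
# Hypothetical helper: write each data frame in "Data" to <dir>/<Table>.csv.
writeTableResults <- function(results_ls, dir = ".") {
  paths <- file.path(dir, paste0(names(results_ls$Data), ".csv"))
  for (i in seq_along(paths)) {
    write.csv(results_ls$Data[[i]], paths[i], row.names = FALSE)
  }
  # Show what was requested but not found, whatever its structure.
  str(results_ls$Missing)
  invisible(paths)
}

# Invented stand-in for TableResults_ls, with made-up values.
mock_ls <- list(
  Data = list(Vehicle = data.frame(HhId = c(1, 1, 2), Dvmt = c(20.5, 12.0, 33.1))),
  Missing = list()
)
out <- writeTableResults(mock_ls, tempdir())
read.csv(out[1])
```

With real results, writeTableResults(TableResults_ls, ".") would write Vehicle.csv to the working directory.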

jrawbits commented 2 years ago

Note that the VE 3.0 version of extract for model results successfully creates two Vehicle tables for VERPAT (based on the different number of rows in the alternate futures). So currently, everything "works" for extraction. I expect the integrated query system will fail for any query that draws datasets from both "halves" of the Vehicle table.
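The splitting that extract performs can be sketched in base R: group a Table's Datasets by length and build one data frame per group. This is an illustrative sketch with invented data, not VisionEval's actual extract code; the helper splitTableByLength and the "<Table>_<rows>" naming of the pieces are assumptions:

```r
# Hypothetical helper: one data frame per distinct Dataset length.
splitTableByLength <- function(table_ls, tableName) {
  lens <- lengths(table_ls)
  pieces <- split(names(table_ls), lens)  # dataset names grouped by length
  out <- lapply(pieces, function(nms) as.data.frame(table_ls[nms]))
  names(out) <- paste0(tableName, "_", names(pieces))  # e.g. "Vehicle_6"
  out
}

# Invented Vehicle table holding a base series (6 rows) and a future series (4 rows).
vehicle_ls <- list(
  VehId = 1:6, HhId = c(1, 1, 2, 2, 3, 3),
  VehIdFuture = 1:4, HhIdFuture = c(1, 2, 2, 3)
)
tables <- splitTableByLength(vehicle_ls, "Vehicle")
sapply(tables, nrow)
```

Each piece is a valid table on its own, but a query that needs columns from both pieces has no shared key to join on, which is why the integrated query system is expected to fail there.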

The deeper fix of restructuring VERPAT's vehicle outputs is very much still on the table.

VisionEval / VisionEval-Dev, issue #142