Open joecastiglione opened 1 year ago
One of the main problems in estimation mode is that the estimation data bundle (EDB) files written out by ActivitySim are not efficient. For example, rather than constructing a destination choice sample in which only sampled alternatives are in the choice set, ActivitySim creates an alternatives table with every destination listed, with missing values for unsampled alternatives. Another issue is that the chooser data written to the EDBs is limited to the fields used in the existing utility equations. Instead, all potential household, person, and land-use data fields should be written to the EDBs by default, which would give the model estimator access to all potential data items during estimation. The data formats used are also inefficient: CSV files are slow to read, and replacing them with a binary format should greatly speed up the estimation process. It also seems pretty straightforward (?) to write a function that would write out the utility equations that the analyst specifies in Larch to a revised model spec file. These seem like relatively simple fixes that would address a lot of the problems we are experiencing.
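To make the sampled-alternatives point concrete, here is a minimal sketch of the compaction step, using only the Python standard library. The function name `compact_alternatives` and the toy data layout are hypothetical and are not part of ActivitySim. The sketch assumes the dense table marks unsampled alternatives with NaN or None.

```python
import math


def compact_alternatives(dense_rows, alt_ids):
    """Convert a dense chooser-by-alternative table to long format.

    dense_rows: dict mapping chooser_id -> list of values, one per
                alternative, with NaN or None marking unsampled alternatives.
    alt_ids:    list of alternative ids aligned with each value list.

    Returns a list of (chooser_id, alt_id, value) tuples containing only
    the alternatives that were actually sampled.
    """
    records = []
    for chooser_id, values in dense_rows.items():
        for alt_id, value in zip(alt_ids, values):
            is_missing = value is None or (
                isinstance(value, float) and math.isnan(value)
            )
            if is_missing:
                continue  # skip unsampled alternatives entirely
            records.append((chooser_id, alt_id, value))
    return records


# Hypothetical toy data: 2 choosers, 5 destinations, 2 sampled per chooser.
alt_ids = [101, 102, 103, 104, 105]
dense = {
    1: [math.nan, 2.5, math.nan, math.nan, 7.1],
    2: [0.9, math.nan, math.nan, 3.3, math.nan],
}
sampled = compact_alternatives(dense, alt_ids)
```

In this toy case the dense table holds 10 cells while the long format holds 4 records. With thousands of zones and a few dozen sampled alternatives per chooser, the reduction would be proportionally much larger.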
Distilling the above conversations down into a "wish list" of improvements:
Estimation Mode Refinement
The current Estimation Mode features of ActivitySim are very much built in the spirit of similar functionality in DaySim: updated observed (survey) data can be fed into the process, and the combination of ActivitySim and Larch can work together somewhat automatically to generate updated parameter estimates. The tools that have been built allow Larch to construct a model that exactly mirrors the defined model in ActivitySim, re-estimate model parameters, and output ActivitySim coefficient files with new parameter estimates that can be used as a drop-in replacement for the existing coefficient files.
However, this tight integration breaks down when the user wants to update not only the coefficient values but also the functional form of the utility equations. This leaves the user with two choices: (a) returning to ActivitySim for every tiny change to the specification files and re-running the entire estimation mode process, or (b) editing the utility equations in Larch while exploring different functional forms, and then reconstructing matching specification files in ActivitySim once the desired functional form is selected. The former is tedious and slow, while the latter is error-prone and requires a fairly expert-level understanding of both ActivitySim and Larch.
The goals of this task would be to integrate Larch and ActivitySim more tightly, in order to (1) allow users to move between these tools using a common utility specification format, (2) speed up the generation of data to support revisions to utility functional forms, especially for large data bundles (i.e. destination and scheduling components, possibly by sampling), (3) extend and enhance the documentation of the estimation process, and (4) improve error handling.
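As a rough illustration of goal (1), the sketch below writes a set of utility terms to a spec-style CSV. It is loosely modeled on the Label / Description / Expression layout with one column per alternative; treat that layout, the function name `write_spec`, and the example terms as assumptions to check against the actual component spec files. It does not call any Larch or ActivitySim API.

```python
import csv
import io


def write_spec(utility_terms, alternatives, out):
    """Write utility terms to a spec-style CSV.

    utility_terms: list of (label, description, expression, coefs) tuples,
                   where coefs maps alternative name -> coefficient name.
    alternatives:  ordered list of alternative names (one column each).
    out:           any writable text file-like object.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Label", "Description", "Expression", *alternatives])
    for label, description, expression, coefs in utility_terms:
        # Alternatives that do not use this term get an empty cell.
        cells = [coefs.get(alt, "") for alt in alternatives]
        writer.writerow([label, description, expression, *cells])


# Hypothetical terms, as an analyst might settle on after exploring
# functional forms in the estimation tool.
terms = [
    ("util_travel_time", "Travel time", "travel_time",
     {"DRIVE": "coef_time", "WALK": "coef_time"}),
    ("util_walk_asc", "Walk constant", "1",
     {"WALK": "coef_walk_asc"}),
]

buffer = io.StringIO()
write_spec(terms, ["DRIVE", "WALK"], buffer)
spec_lines = buffer.getvalue().splitlines()
```

Keeping the utility terms in one plain data structure that both tools can read and write is the design idea here. The specific tuple layout is only one possible choice.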
Improve Estimation Functionality
ActivitySim currently uses the Larch software to estimate models, which allows estimation results to be used directly by the simulation, dramatically reducing the errors common in translating utility expressions into the ActivitySim specification. In version 1.5, numerous improvements are made to the estimation procedures, including reducing the size of the estimation data bundles, increasing the speed at which models can be estimated, improving the reporting and error messaging capabilities of Larch, and improving the usability of the coefficient files created by the ActivitySim procedures. Further, an audit will be done to confirm that the estimation procedures for each ActivitySim component are working as expected.
Complete Estimation Mode for Trip Models
Fully implement estimation mode for all submodels. It seems like we are going to get close in phase 5, but the work may not be totally complete.
Additional Description: