ActivitySim / activitysim

An Open Platform for Activity-Based Travel Modeling
https://activitysim.github.io
BSD 3-Clause "New" or "Revised" License

[Feature] Input Checker #678

Open dhensle opened 1 year ago

dhensle commented 1 year ago

RSG is developing an input checker as part of the phase 8 work.

Input checker features

The input checker is a series of checks run on ActivitySim inputs (synthetic population, land use, skims) to catch problems in the input data that might otherwise crash ActivitySim downstream or lead to bad model results. The input checker will run as the first "model" in an ActivitySim run and should finish quickly, before any subsequent ActivitySim sub-models start.

Original Design

The original design of the input checker aligns with the current paradigm of ActivitySim configuration files: a CSV file contains a list of Python expressions, each of which must evaluate to True for the check to pass. (For more details, see the presentations on Feb 16 and April 20.)
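
To make the CSV-of-expressions idea concrete, here is a minimal sketch, not the actual RSG implementation. The spec columns (`description`, `expression`), the table names, and the `run_checks` helper are all hypothetical, and plain Python lists stand in for the pandas tables ActivitySim would actually pass around:

```python
import csv
import io

# Hypothetical spec file contents: one row per check.
# Each expression must evaluate to True for the check to pass.
SPEC_CSV = """description,expression
household sizes are positive,all(h['hhsize'] > 0 for h in households)
household ids are unique,len({h['household_id'] for h in households}) == len(households)
every household zone exists in land use,all(h['home_zone'] in zone_ids for h in households)
"""


def run_checks(spec_text, tables):
    """Evaluate every expression in the spec against the given tables."""
    results = []
    for row in csv.DictReader(io.StringIO(spec_text)):
        # eval is tolerable here only because the spec is trusted configuration.
        # The tables are passed as globals so generator expressions can see them.
        passed = bool(eval(row["expression"], dict(tables)))
        results.append((row["description"], passed))
    return results


# Toy inputs (assumed shapes, for illustration only).
tables = {
    "households": [
        {"household_id": 1, "hhsize": 2, "home_zone": 10},
        {"household_id": 2, "hhsize": 0, "home_zone": 11},
    ],
    "zone_ids": {10, 11},
}

results = run_checks(SPEC_CSV, tables)
for description, passed in results:
    print("PASS" if passed else "FAIL", description)
```

In this sketch each check is a single boolean over a whole table, so a failure tells the user which check failed but not which rows caused it.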

A new proposal

The ActivitySim consortium has been discussing the possibility of adding a data model to the ActivitySim ecosystem. This data model (see #617) would leverage the pydantic and pandera packages to enumerate allowed values and document what each data field represents. Instead of building an input checker in line with the original design, a different approach would be to leverage the data model. The input checker code would then simply validate the input data against what is defined in the data model.

If we move towards the data model approach, RSG would implement the input checker and focus on the input side of the data model integration. This would include writing validator functions and checks in the data model, as opposed to in the CSV "spec" file of the original approach. Both approaches would be fundamentally similar in function -- the data model would still validate the input data via a series of checks defined by the user.

The pros and cons of the original design and the new approach were discussed at the May 4 and May 11 meetings. Please see the meeting notes and slides on those meeting pages for more details and in-depth discussion.

Deadline for decision

As decided in today's meeting, we ask that further discussion and questions be submitted before a decision is made at the May 18 meeting.

bettinardi commented 1 year ago

I agree with moving forward with the new proposal. Features that we might need to consider in this phase of the work and throughout ActivitySim's further development:

Side note: I just finished "Immune", which is amazing and should be required reading for the entire population. Writing about the input checker feels like building the immune memory of the adaptive immune system, so we will never get sick from those errors again:

https://www.amazon.com/Immune-Kurzgesagt-gorgeously-illustrated-immune-ebook/dp/B08YR8FNCP