Urban-Analytics-Technology-Platform / acbm

activity-based modelling pipeline (for transport demand models)
https://github.com/Urban-Analytics-Technology-Platform/acbm/wiki
Apache License 2.0

Validation framework for model #17

Open · sgreenbury opened 7 months ago

sgreenbury commented 7 months ago

Aim: define metrics to be used at different parts of the modelling pipeline to validate the model against data, e.g. flows from the QUANT model. See the section in the wiki.
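
For concreteness, here is a minimal sketch (not part of acbm; the function name and choice of statistics are illustrative) of scoring a modelled origin-destination flow matrix against reference flows such as QUANT's:

```python
import numpy as np

def flow_fit_metrics(modelled: np.ndarray, observed: np.ndarray) -> dict:
    """Goodness-of-fit summaries for two OD flow matrices of equal shape."""
    m, o = modelled.ravel(), observed.ravel()
    rmse = np.sqrt(np.mean((m - o) ** 2))
    # Standardised RMSE: normalising by the mean observed flow makes the
    # score comparable across study areas of different sizes.
    srmse = rmse / o.mean()
    corr = np.corrcoef(m, o)[0, 1]
    return {"rmse": rmse, "srmse": srmse, "pearson_r": corr}

# Hypothetical usage with two small synthetic flow matrices:
rng = np.random.default_rng(0)
observed = rng.poisson(100, size=(3, 3)).astype(float)
modelled = observed + rng.normal(0, 10, size=(3, 3))
print(flow_fit_metrics(modelled, observed))
```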

sgreenbury commented 7 months ago

Some measures to consider:
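
One illustrative example of the kind of measure that could belong here (an assumption on my part, not a proposal from this thread): comparing modelled and observed trip-length distributions with a two-sample Kolmogorov-Smirnov statistic.

```python
import numpy as np
from scipy.stats import ks_2samp

# Placeholder data standing in for survey and model outputs.
rng = np.random.default_rng(1)
observed_trip_km = rng.lognormal(mean=1.5, sigma=0.8, size=5000)
modelled_trip_km = rng.lognormal(mean=1.6, sigma=0.8, size=5000)

res = ks_2samp(modelled_trip_km, observed_trip_km)
print(f"KS statistic: {res.statistic:.3f} (smaller = closer distributions)")
```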

Hussein-Mahfouz commented 7 months ago

Guidelines from the DfT on activity- and agent-based models: TAG unit M5-4

Hussein-Mahfouz commented 7 months ago

Putting this here for now, but it could become a separate issue on calibration later:

"We need data sets to be recognized as components on the same level as models. Such data components can then enter the integrated frameworks at various places, not only at the top, as input to drive the whole integrated model, and at the bottom, to compare with the output and to calibrate the model. Data components can be also used between components to test, adjust, and correct the data flows inside the integrated model. This can substantially increase the efficiency and accuracy of the integration process, and reduce the overall complexity of the calibration task for the whole integrated model." - ‘Integronsters’, integral and integrated modeling

A useful exercise would be to identify the datasets that could be used for calibration at intermediate points of the pipeline.
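
A minimal sketch of how that could be wired up (stage names and the validator interface are hypothetical): each stage carries its own validator comparing the stage output against a stage-specific reference dataset, so errors are caught and corrected where they arise rather than only at the final output.

```python
from typing import Any, Callable

# (name, transform, validator); the validator returns an error score
# computed against that stage's reference dataset.
Stage = tuple[str, Callable[[Any], Any], Callable[[Any], float]]

def run_pipeline(data: Any, stages: list[Stage], tolerance: float = 0.1) -> Any:
    for name, run, validate in stages:
        data = run(data)
        error = validate(data)
        print(f"{name}: validation error = {error:.3f}")
        if error > tolerance:
            raise ValueError(f"stage '{name}' exceeds tolerance; recalibrate here")
    return data

# Toy usage: real validators would compare against actual reference data.
stages = [
    ("synthesise population", lambda d: d, lambda d: 0.05),
    ("assign activity chains", lambda d: d, lambda d: 0.08),
]
run_pipeline(None, stages)
```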

sgreenbury commented 6 months ago

Notes on tasks:

sgreenbury commented 6 months ago

We also need to determine a set of metrics for measuring the quality of matching between the two datasets as part of task 1.

BZ-BowenZhang commented 6 months ago

> We also need to determine a set of metrics for measuring the quality of matching between the two datasets as part of task 1.

I completely agree with @sgreenbury, and I think calculating the metrics for comparison is straightforward. But after generating the metrics, I have a question: how do we judge 'goodness'? In other words, how do we set a threshold value as an acceptable standard? We do not have another candidate dataset to compare against, but perhaps we can compare with other synthetic populations published in previous papers.
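
One possible way to sidestep an absolute threshold (an assumption, not something agreed in this thread) is to report each metric relative to a null baseline, e.g. the same matching metric computed on randomly permuted matches, so quality is judged against chance rather than an arbitrary cutoff:

```python
import numpy as np

def match_score(a: np.ndarray, b: np.ndarray) -> float:
    """Fraction of matched rows whose categorical attributes agree exactly."""
    return float(np.mean(np.all(a == b, axis=1)))

# Placeholder data: 1000 individuals with 3 categorical attributes
# (e.g. age band, sex, NS-SEC), matched with 10% cell-level noise.
rng = np.random.default_rng(2)
dataset_a = rng.integers(0, 4, size=(1000, 3))
matched_b = dataset_a.copy()
noise = rng.random(dataset_a.shape) < 0.1
matched_b[noise] = rng.integers(0, 4, size=int(noise.sum()))

observed = match_score(dataset_a, matched_b)
# Null baseline: the same score on a random permutation of the matches.
baseline = match_score(dataset_a, rng.permutation(matched_b))
print(f"observed = {observed:.3f}, random baseline = {baseline:.3f}")
```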

sgreenbury commented 6 months ago

Adding a reference with validation methods from @stuartlynn.

BZ-BowenZhang commented 2 months ago

As discussed on 20th Sep: