Open smmaurer opened 5 years ago
A variant of this that's probably nicer is to generate the dictionary inside choicemodels. We'd replace the so-called "raw results" of fitting the model with a tidy dictionary containing all of the estimated parameters, test statistics, and other metadata.

On the UrbanSim side, the template would save the entire dict into its yaml. The advantage here is that a user could then recreate the choicemodels object at a later time in order to do additional analysis. (For example, generate optional output like relative risk ratios, or reprint the summary table without it being saved in the yaml file.)
The interface would look something like this:
m # choicemodels.MultinomialLogit
results = m.fit() # choicemodels.MultinomialLogitResults
print(results) # summary table
results.raw_results # NEW, this would provide a tidy dict
choicemodels.MultinomialLogitResults(model_expression, results) # rebuild from dict
(Historical note: The current "raw results" are already a dictionary, but with a confusing key structure that we don't want to retain. It's generated by the core estimation code, and we can't change it directly because other code relies on it. So I think we'd implement the new dict as a translation layer inside choicemodels.MultinomialLogitResults: mnl.py#L228.)
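To make that concrete, here is a minimal sketch of what the translation layer might return, assuming a legacy estimation dict with the keys shown; both the legacy keys and the tidy keys below are placeholders for illustration, not the actual choicemodels structures.

def tidy_results(legacy, model_expression):
    """Translate the legacy estimation dict into a flat, tidy dict that a
    template could write straight to yaml."""
    return {
        'model_expression': model_expression,
        'fitted_parameters': legacy['coefficients'],    # assumed legacy key
        'standard_errors': legacy['std_errors'],        # assumed legacy key
        't_scores': legacy['t_scores'],                  # assumed legacy key
        'log_likelihood': legacy['log_likelihood'],      # assumed legacy key
    }

# Rebuilding later would then look roughly like:
#   saved = yaml.safe_load(open('model.yaml'))
#   results = choicemodels.MultinomialLogitResults(saved['model_expression'], saved)
#   print(results)   # reprint the summary table without storing it in the yaml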
Per the potential questions: I'd keep fitted_parameters separate, although they are also included in the fit_statistics object we will create. Regarding the choicemodels variant, I think that would be better. As you mentioned, results.results would be a tidier dict, defined here: https://github.com/UDST/choicemodels/blob/master/choicemodels/mnl.py#L262
This is a proposal to update the statistical templates to store additional model fit statistics as individually accessible values, rather than just embedded in the summary table. This would make downstream analysis easier.
How it works now
Using Large MNL as an example, these are the current class attributes related to model fit:
- fitted_parameters: list, saved to yaml
- summary_table: str, saved to yaml
- model: choicemodels.MultinomialLogitResults, not persisted beyond the session

Proposed changes
I'm thinking we should add a class attribute called fit_statistics and use it to store a flexible dictionary of values -- standard errors, t-stats, p-values, and model-level statistics like r^2 or log-likelihood. For large MNL, we can include everything we have access to from ChoiceModels: mnl.py#L344-L360. We'd build the dict at the end of the fit() method (large_multinomial_logit.py#L490), and include it in the constructor and dict i/o methods alongside summary_table and fitted_parameters.
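A rough sketch of how this might look follows. It assumes a fitted results object that exposes the values we want to save; every attribute name read off results here is an assumption for illustration, not the actual ChoiceModels API.

class LargeMultinomialLogitStepSketch:
    """Hypothetical stand-in for the template class, showing only the new dict."""

    def fit(self, results):
        # ...the existing estimation code would produce `results`...
        self.fitted_parameters = list(results.fitted_parameters)   # assumed attribute
        self.summary_table = str(results)
        # NEW: flexible dict of fit statistics, saved to yaml with the rest
        self.fit_statistics = {
            'standard_errors': list(results.std_errors),    # assumed attribute
            't_scores': list(results.t_scores),              # assumed attribute
            'p_values': list(results.p_values),              # assumed attribute
            'log_likelihood': results.log_likelihood,        # assumed attribute
            'rho_squared': results.rho_squared,              # assumed attribute
        }

    def to_dict(self):
        # fit_statistics travels alongside the values we already persist
        return {
            'fitted_parameters': self.fitted_parameters,
            'summary_table': self.summary_table,
            'fit_statistics': self.fit_statistics,
        }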
It looks like we can do this without any changes to choicemodels, because these statistics are part of the results object that's available immediately after fitting the model.

Large MNL is the first priority here -- the other templates could probably wait until we refactor them for issue #54.
Potential questions
Should we store these as top-level attributes of the template class, rather than inside a dictionary? That would be more Pythonic, and would make it more obvious if info was missing or formats changed. But it would make the class definitions a lot longer, and it would vary between model types -- especially once we expand into multi-stage models and machine learning approaches. So I think it's better to just have a dictionary whose name is the same across models but whose content varies (see the small illustration after these questions).
Should we keep fitted_parameters as its own attribute, or include it in the dict? I'd lean toward keeping it separate. It's the most important piece of info because it feeds directly into the simulations, and leaving it where it is means less downstream code to change.

Should we stop saving summary tables and generate them on the fly instead? This would make the yaml files tidier. But I'm thinking we should keep them for now -- they come from multiple sources (ChoiceModels, Statsmodels, PyLogit) and we don't have a standard procedure for generating them yet. It would be nice to store them in a tidier way, though, per issue #48.
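To make the first two questions concrete, here's a rough sketch of the arrangement described above: one fit_statistics dict whose name is stable across model types, with fitted_parameters kept as its own attribute. The stand-in object, key names, and placeholder values are hypothetical.

from types import SimpleNamespace

# Hypothetical stand-in for a fitted template step; values are placeholders.
step = SimpleNamespace(
    fitted_parameters=[-1.2, 0.5],           # kept as its own attribute; feeds simulation
    summary_table='(summary table string)',  # still saved as-is for now
    fit_statistics={                          # one dict with a stable name,
        'standard_errors': [0.1, 0.2],        # whose contents can vary by model type
        't_scores': [-12.0, 2.5],
        'log_likelihood': -980.0,
    },
)

# Downstream analysis reads individual values instead of parsing the table:
print(step.fit_statistics['log_likelihood'])
print(step.fitted_parameters)   # unchanged, so simulation code keeps working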
Tagging @alephcero @cvanoli for feedback