Closed. smmaurer closed this issue 5 years ago.
@smmaurer: 1.) Great, thanks! The idea you propose of getting probs without having to use choicemodels is useful.
3.) The change here was made because the predicted price column doesn't exist prior to simulation (estimated off of a separate dataset).
5.) This was for estimating a development project location choice model, with buildings as choosers and parcels as alternatives (MRCOG wanted an LCM-based backup to the pro forma model). The issue, I think, was that explanatory variables with the same name existed on both the buildings table and the parcels table, so there was a column-name conflict in the merged table. I could suggest that they name the variables differently. Any other ideas come to mind?
Sounds good about #2 and #4 being already taken care of in master! I'll create an issue in consulting-planning to test the various project simulations using the latest templates code.
Thanks @janowicz! This all makes sense, and I've edited the task list above accordingly. Will start a new branch with these updates.
I'm not sure about part 5. It's possible this will have been fixed with 0.1.dev21 as well. But from the ej-test code, it looks like the duplicated column name is used in the model expression, which is more of a challenge.

What I've been thinking about as a solution for cases like this is to create a derived variable with an unambiguous name (for now in variables.py, soon with a template call).
Example: suppose you have columns households.tract_id and jobs.tract_id, and you don't want to change the data schema, but you need to estimate a job choice model where both ids appear in the choice table (e.g. in order to calculate distance as an interaction term). You could create derived variables households.household_tract_id and jobs.employment_tract_id that are just copies of the original columns, but can be used unambiguously in the model expression.

This seems better to me than having a conflict-resolution rule where the columns are renamed on the fly (or where one or the other gets dropped), because it's easier to track. Do you think this would be a good strategy?
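A minimal pandas sketch of the aliasing idea, using the column names from the example above (toy data; in practice these would be orca-registered tables defined in variables.py, so treat the setup here as illustrative):

```python
import pandas as pd

# Toy tables that both carry a 'tract_id' column (schema unchanged)
households = pd.DataFrame({'tract_id': [10, 20]},
                          index=pd.Index([1, 2], name='household_id'))
jobs = pd.DataFrame({'tract_id': [20, 30]},
                    index=pd.Index([7, 8], name='job_id'))

# Derived variables: unambiguous copies of the conflicting columns
households['household_tract_id'] = households['tract_id']
jobs['employment_tract_id'] = jobs['tract_id']

# In a merged choice table, the raw 'tract_id' columns collide and get
# auto-suffixed (_x/_y), while the derived names stay stable and can be
# referenced unambiguously in a model expression
merged = households.reset_index().merge(jobs.reset_index(), how='cross')
```

The merged table keeps both `household_tract_id` and `employment_tract_id` under their explicit names, while the colliding raw columns are renamed on the fly by pandas, which is exactly the implicit behavior the derived-variable approach avoids relying on.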
@smmaurer Sounds good! And to keep things explicit, I agree completely about giving variables in the model expression a new name (e.g. an alias) rather than implementing a name-conflict-resolution rule.
@janowicz About part 1 of the initial comment, getting probabilities for a representative chooser:

No problem making the choicemodels.MultinomialLogitResults object a persistent attribute of a fitted model step. That's taken care of.

But I'm not sure what the best approach is for getting probabilities across all the alternatives. This is what I had suggested initially:
alts = orca.get_table(m.alternatives).to_frame()
probs = m.model.probs(alts) # not allowed yet; requires a MergedChoiceTable
We really do need some kind of info about the choosers, though, because probabilities are relative to other alternatives in the same choice scenario. mnl_simulate() asks for a numalts and requires rows to be consecutive; probs() instead requires id columns for the chooser and the alternative.
So this would be a more realistic code snippet:
alts = orca.get_table(m.alternatives).to_frame()
obs = orca.get_table(m.choosers).to_frame().sample(n=1)
probs = m.model.probs(MergedChoiceTable(obs, alts))
Would it be reasonable to do something like that in your diagnostics code whenever you need these probabilities? I could also make some kind of probs_for_representative_chooser() function to automate it -- but we'd want to accept arguments for custom filters, for which chooser to use (if there are characteristics of the chooser in the model expression), and so on, and in the end it might not be much easier for the user than coding it directly.
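For illustration, a self-contained sketch of what such a helper might look like. The function name, signature, and the softmax-over-alternative-attributes logic are assumptions here; a real implementation would delegate to choicemodels and a MergedChoiceTable, and would need to handle chooser attributes in the expression:

```python
import numpy as np
import pandas as pd

def probs_for_representative_chooser(coefs, alts, filters=None):
    """Hypothetical helper: MNL probabilities over all alternatives for
    one representative chooser, using only attributes of the alternatives.
    `coefs` maps alternative column names to fitted coefficients."""
    if filters is not None:
        # Custom filters restrict which alternatives enter the scenario
        alts = alts.query(filters)
    # Linear utility of each alternative from its own attributes
    utility = sum(alts[col] * coef for col, coef in coefs.items())
    # Softmax: probabilities are relative to the other alternatives
    # in the same choice scenario
    exp_u = np.exp(utility - utility.max())
    return exp_u / exp_u.sum()

alts = pd.DataFrame({'price': [1.0, 2.0, 3.0]}, index=['a', 'b', 'c'])
p = probs_for_representative_chooser({'price': -0.5}, alts)
```

With a negative price coefficient, cheaper alternatives get higher probabilities, and the probabilities always sum to one within the scenario.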
The work outlined in this issue is now complete.
With help from @cvanoli, we tested the LCOG model code using the updated master branch of urbansim_templates. After a couple of small fixes in PRs #82 and #85, everything is now working.
This issue is to plan out integrating the changes from the ej-test branch into the main codebase, so that models using that code can more easily take advantage of other bug fixes and updates. I can do the implementation, with input from @janowicz! (cc @cvanoli)
Here's the code diff. The ej-test branch is based on v0.1.dev15.

1. probas() method added to LargeMultinomialLogitStep
See large_multinomial_logit.py in the diff.
After a model is fit, this provides choice probabilities for a single representative chooser and the full list of alternatives. It looks like it won't work if there are attributes of the chooser in the model expression.

The same thing can be accomplished with built-in functionality now, I think, but to make it easier I'll retain the fitted MNL model as an attribute of the model step.
With a fitted step m, there's currently code that gives you a copy of the underlying MNL model. I'll update it so you can get the same thing directly from the step, and then use it to get probabilities.
This won't quite work at the moment -- probs() expects observation data as well as alternatives -- but it will be a quick fix.

2. Choice column deleted from the observations table for MNL simulation
See large_multinomial_logit.py#L479 in the diff. This looks like a bug fix?
The MNL data-loading code was refactored in v0.1.dev21, and I don't think it pulls the choice column any more: large_multinomial_logit.py#L497-L501
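The idea in this item can be sketched in a few lines of pandas (table and column names here are illustrative, not the actual template code):

```python
import pandas as pd

# Illustrative chooser table: 'choice' holds the observed outcome used
# during estimation
obs = pd.DataFrame({'income': [50, 60], 'choice': [3, 9]})

# For simulation, the observed choice must not be carried along as an
# explanatory column, so it's dropped if present (errors='ignore' makes
# this a no-op when the column was never loaded in the first place)
sim_obs = obs.drop(columns=['choice'], errors='ignore')
```

Refactoring the data-loading code so the choice column is never pulled for simulation (as in v0.1.dev21) makes the explicit drop unnecessary, which matches the comment above.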
3. In OLS simulation, predicted values are added as a new column
See regression.py in the diff.
Not sure whether this is to fix a problem with indexing, missing values, or the column not existing yet. @janowicz, let's discuss!
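For reference, a small sketch of the two behaviors being compared -- creating the column versus updating an existing one -- with hypothetical column names (this matches the "predicted price column doesn't exist prior to simulation" situation described earlier in the thread):

```python
import pandas as pd

# Illustrative table: the predicted-price column does not exist yet,
# because the model was estimated on a separate dataset
df = pd.DataFrame({'sqft': [1000, 2000]}, index=[1, 2])
predicted = pd.Series([150.0, 290.0], index=df.index)

out_col = 'predicted_price'  # hypothetical output column name
if out_col in df.columns:
    # Column exists: update only the simulated rows, preserving others
    df.loc[predicted.index, out_col] = predicted
else:
    # Column doesn't exist prior to simulation: add it as a new column,
    # aligned on the index (unsimulated rows would get NaN)
    df[out_col] = predicted
```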
4. In the shared _get_df() method, table is filtered to drop non-relevant columns
See shared.py in the diff.
Same thing is implemented in 0.1.dev21.
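A minimal sketch of the filtering idea (the function and argument names are illustrative, not the actual _get_df() API):

```python
import pandas as pd

def filter_relevant_columns(df, expression_cols, extra_cols=()):
    """Keep only the columns the model expression (plus any explicitly
    requested extras) actually references, so merged tables stay small
    and unrelated data isn't dragged along."""
    keep = set(expression_cols) | set(extra_cols)
    return df[[c for c in df.columns if c in keep]]

df = pd.DataFrame({'a': [1], 'b': [2], 'c': [3]})
small = filter_relevant_columns(df, ['a'], extra_cols=['c'])
```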
5. Special case where buildings are choosers
See shared.py#L200-L202 in the diff. Let's discuss this one too, @janowicz!
Tasks

- LargeMultinomialLogitStep.model attribute that provides fitted model object
- fix choicemodels.MultinomialLogitResults.probs() to accept df of alternatives only
- ej-test branch