UDST / choicemodels

Python library for discrete choice modeling
https://udst.github.io/choicemodels
BSD 3-Clause "New" or "Revised" License

TypeError when fitting a MultinomialLogit model #12

Closed: smmaurer closed this issue 7 years ago

smmaurer commented 7 years ago

Arezoo (@arezoo-bz) is getting an unusual error trying to fit MultinomialLogit models. The same notebook runs on other machines without a problem. We'll update this issue when we find a solution or work-around.

[Two screenshots attached, captured 2017-07-26 at 10:56:09 and 10:56:19]

gboeing commented 7 years ago

Is there a NaN somewhere that's being cast to int?

Arezoo-bz commented 7 years ago

@gboeing No NaN in the dataset. @smmaurer I updated Anaconda and all the other requirements, but I still get the same error. I also deleted the %%time from my code, but it didn't change anything. Weird problem!

waddell commented 7 years ago

I just replicated the error on my iMac. Conda is up to date, and I did a fresh install of pylogit and choicemodels and updated urbansim. NumPy is 1.12.1.

waddell commented 7 years ago

pandas is 0.20.1, and Python is 3.5.3. Followed the install instructions in choicemodels.

smmaurer commented 7 years ago

Interesting. How much of the destination choice notebook will run before triggering this error? Here's a shortcut to download the data files.

I'll try to replicate the error in a virtual environment so I can troubleshoot things.

waddell commented 7 years ago

It runs all the way to the estimation cell without error.

waddell commented 7 years ago

And it fails even if I drop all but one variable, regardless of which one remains.

smmaurer commented 7 years ago

Ok, I tracked down the cause and this should be fixed in the latest PR (#13). Full explanation below for the curious!

@Arezoo-bz, when you have a chance, do a git pull on choicemodels, restart the Jupyter kernel, and try running the code again. Let me know if it works.


The culprit was this division operator: https://github.com/UDST/choicemodels/pull/13/files

In Python 3, division returns a float even if both operands are integers (see PEP 238).
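
For anyone who wants to see the difference directly, here is a minimal sketch in plain Python 3. The variable names are made up for illustration and are not taken from the choicemodels code.

```python
# Hypothetical counts, chosen only so the division comes out even.
n_rows = 12
n_alts = 4

true_div = n_rows / n_alts    # true division: always a float in Python 3
floor_div = n_rows // n_alts  # floor division: stays an int for int operands

print(type(true_div).__name__, true_div)
print(type(floor_div).__name__, floor_div)
```

Both results are numerically equal, which is why the problem is easy to miss until something downstream insists on a real int.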

Numpy used to silently accept floats as indexes if they were "close" in value to ints, but as of v1.12 it now raises a TypeError (release notes).

So the combination of these things caused a crash on machines running Python 3 and NumPy 1.12 or later, but now it's fixed by using the integer (floor) division operator, //.
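
Here is a rough reproduction of that kind of failure, assuming the float ends up being passed to NumPy as an array dimension. This is only a sketch: `reshape_by_alternatives` and its arguments are hypothetical and are not the actual code changed in the PR.

```python
import numpy as np


def reshape_by_alternatives(values, n_alts, use_floor_division=True):
    """Reshape a flat array to (observations, alternatives).

    Hypothetical helper for illustration; not part of choicemodels.
    """
    values = np.asarray(values)
    if use_floor_division:
        n_obs = values.size // n_alts  # int, accepted by NumPy
    else:
        n_obs = values.size / n_alts   # float under Python 3
    return values.reshape(n_obs, n_alts)


flat = np.arange(12)

# Floor division yields an int dimension, so the reshape succeeds.
print(reshape_by_alternatives(flat, 4).shape)

# True division yields 3.0, which NumPy 1.12+ rejects with a TypeError.
try:
    reshape_by_alternatives(flat, 4, use_floor_division=False)
except TypeError as exc:
    print("TypeError:", exc)
```

On Python 2 the same `/` expression would have produced an int, which would explain why the notebook ran without trouble on some machines.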

waddell commented 7 years ago

Confirmed, this fixed the problem. Nice detective work!