Open lisalan520 opened 9 years ago
I think (I'm not totally sure) that you need a building_id in the households table. Do you have that column? The vacant_residential_units requires there to be a building_id.
Thanks for replying! I have a 'building_id' column in both my households and jobs data. When I did the step-by-step test, there was no problem calculating vacant_residential_units...
OK - yeah I've definitely seen this before but am having a hard time remembering the problem.
I would try 2 things - first, try naming the index of your buildings table -
buildings.index.name = 'building_id'
then I would double check the index for duplicates
pd.Series(buildings.index).value_counts()
and see if the top row has a value > 1.
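The two checks suggested above can be sketched as follows. This is only an illustration on a made-up buildings table; the column name and values are assumptions, not data from this thread.

```python
import pandas as pd

# Hypothetical stand-in for the real buildings table.
buildings = pd.DataFrame(
    {"residential_units": [2, 0, 5]},
    index=[1, 2, 5],
)

# 1. Name the index so a later reset_index() yields a 'building_id' column.
buildings.index.name = "building_id"

# 2. Count how often each index value occurs; the most frequent comes first.
counts = pd.Series(buildings.index).value_counts()
has_duplicates = bool(counts.iloc[0] > 1)

# Shortcut for the same question: is_unique is False when any value repeats.
same_answer = has_duplicates == (not buildings.index.is_unique)
print(counts.head())
```

If `has_duplicates` is true, the first rows of `counts` show which ids are repeated.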
I've checked my buildings data with the method you suggested. There are no duplicates in buildings.index, and I got the same error message...
And you tried changing the name of the index?
Yes I did. Could it be related to data type?
Is building_id a float because there are NaNs? If so, that is very likely it - it should be an int column.
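To see why missing values matter here, a minimal sketch on a made-up households table: a single NaN is enough to make pandas store the column as float. Dropping the unplaced rows before casting is just one possible cleanup and is an assumption here; whether it is appropriate depends on the model.

```python
import numpy as np
import pandas as pd

# Hypothetical households table with one unplaced household.
households = pd.DataFrame({"building_id": [1, 2, np.nan, 5]})

# The NaN forces the whole column to a float dtype.
print(households["building_id"].dtype)  # float64

# Count the missing ids before deciding how to handle them.
n_missing = int(households["building_id"].isnull().sum())

# One option (an assumption, not advice from the thread): keep only placed
# households and cast the ids back to integers.
placed = households.dropna(subset=["building_id"]).copy()
placed["building_id"] = placed["building_id"].astype("int64")
print(placed["building_id"].dtype)  # int64
```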
The building_id is an int column. Should it be consecutive integers? I have something like this [1,2,3,5,9,10] will this be an issue?
It definitely does NOT have to be consecutive.
Are you sure you're talking about the right table? The error is occurring here:
C:\Users\xzhang\Documents\PythonScripts\Marion_urbansim_test_0514_with_building_ids\utils.pyc in lcm_simulate(cfg, choosers, buildings, nodes, out_fname, supply_fname, vacant_fname)
198
199 # go from units back to buildings
--> 200 new_buildings = pd.Series(units.ix[new_units.values][out_fname].values,
201 index=new_units.index)
202
And I think the most likely way to get a KeyError there is if the units DataFrame doesn't have a 'building_id' column. Which table is units?
Units comes from here:
https://github.com/synthicity/urbansim_defaults/blob/master/urbansim_defaults/utils.py#L358
It's an expansion of the original buildings table and it needs a building_id to get back to the buildings.
I really think the building_id comes from the call to .reset_index() right there and that the index has to be named building_id to get the building_id column there. If the index is named building_id, I'm not sure why it wouldn't have the column after that.
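The behavior described here is easy to check in isolation. In this small sketch (both tables are made up), reset_index() turns a named index into a column of that name, while an unnamed index becomes a column called 'index'.

```python
import pandas as pd

# Same data, once with a named index and once without.
named = pd.DataFrame(
    {"units": [2, 1]},
    index=pd.Index([10, 20], name="building_id"),
)
unnamed = pd.DataFrame({"units": [2, 1]}, index=[10, 20])

named_cols = list(named.reset_index().columns)
unnamed_cols = list(unnamed.reset_index().columns)

print(named_cols)    # ['building_id', 'units']
print(unnamed_cols)  # ['index', 'units']
```

So if the index name is lost anywhere before reset_index(), a later lookup of 'building_id' raises a KeyError.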
From my step-by-step test, the unit table looks like this:
and it does have building_id.
It looks like @lisalan520 is not using the same version of lcm_simulate that @fscottfoti linked to. Any reason to think that could be a problem?
It seems like it's a lot different - the line number has gone from 200 to 437. @lisalan520 what version are you using?
I don't know for sure, but it's definitely possible the new version would fix the problem. I have made some small changes in the function in the past 2-3 months. If we know what version @lisalan520 is running maybe we can diff them?
The lcm_simulate I used comes from here:
https://github.com/synthicity/sanfran_urbansim/blob/master/utils.py
I have the same code as in the link. I'm using UrbanSim 1.3. I'll try to update my urbansim to see whether it solves the problem.
Seems the two 2.0 versions both use discrete choice model, which should not solve the problem here. I'll try to run discrete choice model again and hope my computer can afford it this time. Many thanks!
So just to be clear, when you print out units, building_id is there, and we're looking at the expression units.loc[new_units.values][out_fname], where out_fname is equal to building_id. So can you print out units.loc[new_units.values]? Somehow building_id is missing from the result? What is the expression equal to?
'units' is a disaggregated table of 'buildings', expanded according to the vacant_units value. 'new_units' comes from the LCM model's predict step. 'new_units.values' is used to pick the rows of 'units' where 'units.index == new_units.values'.
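The lookup step described above can be sketched on toy data. The tables and ids below are made up for illustration; only the final expression mirrors the one discussed in this thread.

```python
import pandas as pd

# Hypothetical units table: one row per vacant unit, default integer index.
units = pd.DataFrame({"building_id": [1, 1, 9], "zone": ["a", "a", "c"]})

# Hypothetical model output: chooser id -> label of the chosen row in units.
new_units = pd.Series([2, 0], index=pd.Index([101, 102], name="household_id"))

# A string naming the column to read back from units.
out_fname = "building_id"

# Pick the chosen rows by label, read the building id of each,
# and index the result by chooser.
new_buildings = pd.Series(
    units.loc[new_units.values][out_fname].values,
    index=new_units.index,
)
print(new_buildings)
```

Here household 101 chose unit row 2 (building 9) and household 102 chose unit row 0 (building 1).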
Here is a capture:
So can you then run units.loc[new_units.values][out_fname]? What am I missing?
I was able to run units.loc[new_units.values]["building_id"] to get the results. But when I define out_fname= building_id, I cannot run units.loc[new_units.values][out_fname] here.
Can you print units.columns? Grasping at straws here...
Here it is:
Interesting - and you put building_id in quotes above so that it's a string? Not sure what's going on here, but it's definitely a Pandas issue - there's no UrbanSim happening here that I can see.
Note that the code @lisalan520 is using calls .ix, not .loc. Wonder if that's making a difference.
I tried both .loc and .ix and they have the same problem with using building_id without quotes.
You're not going to be able to use building_id without quotes; it has to be a string or a variable that refers to a string.
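A short sketch of the difference, on a made-up table: a quoted name (or a variable holding that string) selects the column, while the bare name is treated as an undefined Python variable and fails before pandas is even involved.

```python
import pandas as pd

units = pd.DataFrame({"building_id": [1, 1, 9]})

# A variable that refers to a string works as a column key.
out_fname = "building_id"
selected = units[out_fname]

# The bare name is looked up as a Python variable, which does not exist here.
error_name = None
try:
    units[building_id]
except NameError as err:
    error_name = type(err).__name__
    print(err)
```

Note that this fails with a NameError, not the KeyError reported in this issue.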
@lisalan520 You're not in SF, are you? I wish I could debug this in person. We might also be able to use a Google Hangout, I think I can drive your computer from those.
Sorry for some reason I thought it was building_id in my models.py. I just checked it and it was "building_id" when I got the error...
I'm in Indianapolis. I'll check if I can use Google Hangout on this computer. Thanks!
Hi @jiffyclub I think we can try Google Hangouts. So how do I connect with you?
You can join me here: https://plus.google.com/hangouts/_/gvttqyhgmmnclmstprvgwmrhwma
Just had a call with @lisalan520 and for some reason for her the expression
units = locations_df.loc[np.repeat(vacant_units.index.values,
vacant_units.values.astype('int'))].reset_index()
is resulting in the 'building_id' label on locations_df.index being dropped. She's using Pandas 0.14.1 and is going to try updating to 0.16.1 to see if that has been fixed (I suspect it has been fixed, since @fscottfoti hasn't run into the same problem).
Many thanks @jiffyclub !
I could only update my Pandas to 0.16.0 due to our firewall. The problem was still there. At this moment I don't think the error comes from pandas but I will continue to update it to 0.16.1.
Meanwhile, with the problem we've found, I changed out_fname to 'index' in the code:
new_buildings = pd.Series(units.loc[new_units.values]['index'].values, index=new_units.index)
and the model works fine after this change, though I still don't understand why "building_id" turns into "index" after the .loc lookup.
But it seems the problem is solved for now. Thank you very much! I really appreciate your help!
So weird that locations_df.reset_index() preserves the name, but locations_df.loc[].reset_index() doesn't! But glad you have something working.
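A way to avoid depending on whether the .loc lookup keeps the index name is to set the name explicitly just before reset_index(). The sketch below is a hypothetical helper (expand_to_units is not part of UrbanSim) built around the expression quoted in this thread, run on made-up data.

```python
import numpy as np
import pandas as pd


def expand_to_units(locations_df, vacant_units, id_name="building_id"):
    """Hypothetical helper: one row per vacant unit, with an explicit id column."""
    # Repeat each location id once per vacant unit (zero repeats drop it).
    repeated = np.repeat(
        vacant_units.index.values, vacant_units.values.astype("int")
    )
    # Re-apply the index name in case the lookup dropped it, so that
    # reset_index() always produces a column called id_name.
    return locations_df.loc[repeated].rename_axis(id_name).reset_index()


# Made-up buildings table whose index deliberately has no name.
buildings = pd.DataFrame({"zone": ["a", "b", "c"]}, index=[1, 5, 9])
vacant_units = pd.Series([2, 0, 1], index=buildings.index)

units = expand_to_units(buildings, vacant_units)
print(units)
```

With this in place, out_fname can stay 'building_id' instead of being switched to 'index'.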
Hi,
I also have a problem running the 'hlcm_simulate' and 'elcm_simulate' models using my own data. Both models raise KeyError: 'building_id'. I've checked my data and found nothing weird. I've also managed to break the model into individual steps and run them one by one. Do you have any idea what could be wrong? Thank you!
Here is my error message: