Open stefancoe opened 1 year ago
Yeah, that's a bug. When it's used, zero-based encoding should get applied immediately on load (as soon as possible after pd.read_csv) to any file that has MAZ or TAZ ids in it. I will fix it.
@jpn-- Thank you!
@jpn-- Any update on this? Thanks!
Yes, I've found the fix in #643 is not robust across multiple use cases, including multiprocessing and large-value MAZ IDs. The problem also interacts with / is related to #652. I'm trying to get to a solution that works for all, but it looks like it will require duplicating the TAZ table, so one copy can be sliced for multiprocessing and the other not.
@jpn-- Thank you!
@jpn-- it looks like self.maz_ceiling can be either a pandas object or plain int. If the latter, the line below fails because the .astype method/function is not available.
https://github.com/camsys/activitysim/blob/generic-whale/activitysim/core/los.py#L313
@jpn-- For two-zone systems, it seems like self.maz_ceiling is always going to be an int: https://github.com/camsys/activitysim/blob/generic-whale/activitysim/core/los.py#L286
So self.maz_ceiling.astype() won't work.
I'm not sure, but I think the problem with maz_ceiling being a plain integer is a "sometimes" problem, maybe platform (win/mac/linux) related, or possibly due to differences in dependency versions (which could also be indirectly platform related).
In any case, I think this should be addressed by https://github.com/camsys/activitysim/commit/4dfbbeb17a69c3adbcc03bdfb49aaba19398cfef which changes from x.astype(np.int64) to np.int64(x), which should now evaluate correctly regardless of whether maz_ceiling is a plain Python integer or a numpy scalar.
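To illustrate why the change from x.astype(np.int64) to np.int64(x) helps, here is a minimal sketch. The variable names are made up for the example and are not taken from los.py; the only claim is that a plain Python int has no .astype attribute, while np.int64(...) accepts both kinds of input.

```python
import numpy as np

plain_ceiling = 558                           # plain Python int (illustrative value)
numpy_ceiling = np.array([12, 557]).max() + 1  # numpy scalar, as returned by .max()

# Only the numpy scalar exposes .astype; calling it on a plain int
# raises AttributeError.
has_astype_plain = hasattr(plain_ceiling, "astype")
has_astype_numpy = hasattr(numpy_ceiling, "astype")

# np.int64(x) works for either input and always yields an np.int64.
a = np.int64(plain_ceiling)
b = np.int64(numpy_ceiling)
```

Both a and b end up as the same np.int64 value, so downstream arithmetic does not depend on which kind of scalar was produced upstream.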
Thanks! I see now. It does seem like it should have worked the way it was coded, since .max should return a scalar. Anyhow, thanks for the fix.
https://pandas.pydata.org/pandas-docs/version/1.5/reference/api/pandas.Series.max.html
This error can happen here when using MAZ IDs in the maz_to_maz tables that are higher than the maximum zero-based MAZ ID. This code is used to calculate the index of these files:
df["i"] = df.OMAZ * self.maz_ceiling + df.DMAZ
For example, our data has zone pairs 48-4903 & 55-997, with a max ceiling of 558, which makes df['i'] = 31687 in both cases.
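The collision can be reproduced with a small sketch that uses only the two zone pairs and the ceiling quoted above. The DataFrame here is built by hand for illustration and is not read from an actual maz_to_maz file.

```python
import pandas as pd

maz_ceiling = 558  # ceiling quoted above

# The two origin/destination pairs from the example.
df = pd.DataFrame({"OMAZ": [48, 55], "DMAZ": [4903, 997]})

# Same index formula as in the snippet above.
df["i"] = df.OMAZ * maz_ceiling + df.DMAZ

# Two distinct pairs map to the same index value.
collides = df["i"].duplicated().any()

# The formula is only collision-free when every ID is below the ceiling;
# flag rows where that assumption is violated.
out_of_range = (df[["OMAZ", "DMAZ"]] >= maz_ceiling).any(axis=1)
```

Both rows are flagged as out of range because their DMAZ values (4903 and 997) exceed the ceiling, which is what allows the two pairs to share one index.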
The reason our MAZ labels are so much higher than the zero-based maximum is that I am using a subset of our region for a testing dataset. However, this still happens in our full-sized data, presumably because zones without emp/hhs are removed in the psrc_crop script. So we have IDs in the maz_to_maz_walk file, for example, that are higher than the zero-based max.
It seems like the maz_to_maz tables need to get the zero-based treatment as well?
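As a rough sketch of what that treatment might look like (this is not ActivitySim's implementation; remap_maz_pairs, the zone labels, and the choice to drop pairs that reference unknown zones are all assumptions made for illustration):

```python
import pandas as pd

def remap_maz_pairs(maz_to_maz, maz_ids):
    """Hypothetical helper: recode OMAZ/DMAZ labels to zero-based positions.

    Pairs that reference a zone missing from maz_ids are dropped, which is
    an assumption about the desired behavior, not a confirmed design.
    """
    lookup = pd.Series(range(len(maz_ids)), index=maz_ids)
    out = maz_to_maz.copy()
    out["OMAZ"] = out["OMAZ"].map(lookup)
    out["DMAZ"] = out["DMAZ"].map(lookup)
    out = out.dropna(subset=["OMAZ", "DMAZ"])
    return out.astype({"OMAZ": "int64", "DMAZ": "int64"})

maz_ids = [48, 55, 997, 4903]  # made-up zone labels kept after cropping
pairs = pd.DataFrame(
    {"OMAZ": [48, 55, 55], "DMAZ": [4903, 997, 6000], "dist": [0.4, 0.7, 1.2]}
)

recoded = remap_maz_pairs(pairs, maz_ids)
maz_ceiling = len(maz_ids)  # every recoded ID is now below the ceiling
recoded["i"] = recoded.OMAZ * maz_ceiling + recoded.DMAZ
```

With every recoded ID below the ceiling, distinct pairs get distinct index values; the pair that points at zone 6000, which is not in maz_ids, is removed rather than indexed.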