Supporting Pandas v1.0

These notes are from March – June 2020, originally compiled in Notion.

Background

Pandas v1.0 was released at the end of January 2020. It removes a wide range of syntax that has been deprecated over the years, and will require minor updates across many of our codebases.

What's been removed: https://pandas.pydata.org/docs/whatsnew/v1.0.0.html#whatsnew-100-prior-deprecations

Strategies for restoring compatibility

If code raises errors in Pandas v1.0, try switching to Pandas v0.25. This should restore earlier functionality and also provide deprecation warnings describing what needs to be changed. (Versions prior to v0.25 may not include all the deprecation warnings.)
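
One way to make sure no deprecation warning slips by unnoticed under v0.25 is to turn the warnings into errors while the code or its tests run. This is a minimal sketch using only the standard library. It assumes the deprecations surface as FutureWarning or DeprecationWarning. run_strict and legacy_call are hypothetical names for illustration, and legacy_call only stands in for code that touches a deprecated pandas API.

```python
import warnings


def run_strict(func, *args, **kwargs):
    """Call func with FutureWarning and DeprecationWarning raised as errors."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", FutureWarning)
        warnings.simplefilter("error", DeprecationWarning)
        return func(*args, **kwargs)


def legacy_call():
    # Stand-in for code that uses a deprecated pandas API (hypothetical).
    warnings.warn("'.ix' is deprecated", FutureWarning)
    return "ok"


try:
    run_strict(legacy_call)
except FutureWarning as exc:
    print("needs updating:", exc)
```

The filters are restored when the with block exits, so the rest of the program keeps its normal warning behavior.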

As a temporary fix, you can require pandas < 1.0 in the setup files of a library or project.
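
A sketch of what that temporary pin might look like in a setup.py, assuming the project uses setuptools. The package name in the comment is hypothetical.

```python
# setup.py (fragment)
install_requires = [
    "pandas < 1.0",  # temporary cap until the code is updated for v1.0
]

# Passed to setuptools in the usual way, e.g.:
# setup(name="my_package", install_requires=install_requires)
```

The same specifier string should also work in a requirements.txt file.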

How risky are these changes?

Some of the compatibility fixes require judgment about what exactly the code is doing, but as far as I can tell if the updated code runs, it's very likely to do the same thing as the old code.

So I don't expect any of these changes to affect software logic -- but for codebases without unit tests we should be extra careful and try to do whatever testing is feasible.

What to update

Here are things that have come up for us so far:

DataFrame.as_matrix() and Series.as_matrix() are removed

These can be directly replaced with Series.values and DataFrame.values.
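
A small illustration of the replacement, using a made-up DataFrame. The old calls are shown in comments only, since they no longer run in v1.0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]})

arr = df.values           # was: df.as_matrix()
col = df["x"].values      # was: df["x"].as_matrix()
sub = df[["y"]].values    # was: df.as_matrix(columns=["y"])

# .to_numpy() (available since pandas 0.24) returns the same array here.
assert np.array_equal(arr, df.to_numpy())
```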

DataFrame.ix[] and Series.ix[] are removed

In most cases you can use .loc[] in its place, with identical arguments.
- One exception is if .ix[] was being used for implicit positional indexing, which happens if the DataFrame or Series's index contains non-integer values but you pass integers to .ix[]. In this case, replacing it with .loc[] will raise an error and you should use .iloc[] instead.
- Another exception is if a list of index labels is passed to .ix[] and some of those labels may be missing from the index, as described in the next section. Because .loc[] no longer supports this usage, you'll need to replace df.ix[list] with df.reindex(list).
- Documentation of old .ix[] behavior
- Discussion of removal
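
The label-based and positional cases can be sketched as follows, on a made-up DataFrame with a string index. The old .ix[] calls appear only in comments.

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30]}, index=["a", "b", "c"])

# Label-based lookup: same arguments as before.
by_label = df.loc["b", "value"]         # was: df.ix["b", "value"]

# Implicit positional lookup (integers on a non-integer index): use .iloc[].
by_position = df.iloc[1]                # was: df.ix[1]

# Position for the row, label for the column: translate the position first.
mixed = df.loc[df.index[1], "value"]    # was: df.ix[1, "value"]
```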

DataFrame.loc[] and Series.loc[] no longer accept a list containing missing index labels

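For example, to get the rows with ids 30, 50, and 40 when some of those ids may be missing from the index, you can no longer use df.loc[[30,50,40]]: it now raises a KeyError instead of returning NaN rows for the missing ids. Instead, replace this with df.reindex([30,50,40]), which should behave the way .loc[] used to. If every id is guaranteed to be in the index, df.loc[[30,50,40]] still works.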

(But I feel like this makes the code less readable, because it's not intuitive that "reindex" is going to yield a reordered subset of the DataFrame. If you have better ideas for what to replace this with, let me know!)
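
A sketch of the options, on a made-up DataFrame. In pandas 1.0 and later, .loc[] with a list raises a KeyError only when at least one label is absent from the index, so a plain .loc[] remains usable when all labels are known to exist. Filtering the list first is one possible alternative to reindex when rows for missing ids are not wanted.

```python
import pandas as pd

df = pd.DataFrame({"value": [1.0, 2.0, 3.0]}, index=[30, 40, 50])

ids = [30, 50, 60]  # 60 is not in the index

# Keeps the requested order and adds a NaN row for the missing id.
padded = df.reindex(ids)

# Alternative: keep only the ids that exist, then select with .loc[].
present = [i for i in ids if i in df.index]
strict = df.loc[present]

# When every label exists, .loc[] with a list still works and reorders rows.
reordered = df.loc[[30, 50, 40]]
```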

DataFrame.get_value() and DataFrame.set_value() are removed

The same applies to Series.get_value() and Series.set_value().

You can replace df.set_value(row, col, val) with df.at[row, col] = val, and df.get_value(row, col) with df.at[row, col].
- Discussion: https://github.com/pandas-dev/pandas/issues/15269#issuecomment-322571210
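
A short illustration with made-up data. The old calls appear in comments. The positional variant with .iat[] is included on the assumption that some code used the takeable=True option.

```python
import pandas as pd

df = pd.DataFrame({"value": [1, 2]}, index=["a", "b"])

df.at["a", "value"] = 10     # was: df.set_value("a", "value", 10)
v = df.at["a", "value"]      # was: df.get_value("a", "value")

s = pd.Series([1, 2], index=["a", "b"])
s.at["b"] = 5                # was: s.set_value("b", 5)
w = s.at["b"]                # was: s.get_value("b")

# Positional access uses .iat[] instead.
first = df.iat[0, 0]         # was: df.get_value(0, 0, takeable=True)
```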

Support status

Which UDST libraries are currently compatible with Pandas v1.0?

Last updated Sep 30, 2020.

ChoiceModels v0.2.2: compatible
Orca v1.5.3: compatible
Orca Test v0.1: compatible
OSMNet v0.1.5: compatible
Pandana v0.4.4: compatible
Spandex v0.1dev: not sure, tests are failing for other reasons
Synthpop v0.1.1: not sure, tests are failing
UrbanAccess v0.2: compatible
UrbanSim v3.2: compatible, but earlier versions are not (see https://github.com/UDST/urbansim/pull/222)
UrbanSim Defaults v0.2: not sure, still looking into it (no unit tests)
UrbanSim Templates v0.1.3: compatible

UDST / udst-planning

Supporting Pandas v1.0 #1
