UDST / udst-planning

UDST-wide issues and planning

Supporting Pandas v1.0 #1

Open smmaurer opened 4 years ago

smmaurer commented 4 years ago

These notes are from March – June 2020, originally compiled in Notion.

Background

Pandas v1.0 was released at the end of January 2020. It removes a wide range of syntax that's been deprecated over the years, and will require minor updates across many of our codebases.

What's been removed: https://pandas.pydata.org/docs/whatsnew/v1.0.0.html#whatsnew-100-prior-deprecations

Strategies for restoring compatibility

If code raises errors in Pandas v1.0, try switching to Pandas v0.25. This should restore earlier functionality and also provide deprecation warnings describing what needs to be changed. (Versions prior to v0.25 may not include all the deprecation warnings.)
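One way to gather those warnings systematically is to record them while running a piece of project code under Pandas v0.25. This is only a sketch: `collect_deprecations` and `legacy_step` are hypothetical names, and `legacy_step` is a stand-in that emits a warning itself so the example runs without any particular Pandas version. It assumes the deprecations surface as `FutureWarning` or `DeprecationWarning`, which is typical but worth confirming for your case.

```python
import warnings


def collect_deprecations(func, *args, **kwargs):
    """Call func and return (result, list of deprecation-style warnings)."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones Python would normally show only once.
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    flagged = [
        w for w in caught
        if issubclass(w.category, (FutureWarning, DeprecationWarning))
    ]
    return result, flagged


def legacy_step():
    """Hypothetical stand-in for project code that uses deprecated syntax."""
    warnings.warn(".ix is deprecated", FutureWarning)
    return 42


result, flagged = collect_deprecations(legacy_step)
for w in flagged:
    # Each record carries the file and line that triggered the warning.
    print(f"{w.filename}:{w.lineno}: {w.message}")
```

Running a test suite or a model step through a wrapper like this gives a checklist of call sites to update before moving to v1.0.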

As a temporary fix, you can require pandas < 1.0 in the setup files of a library or project.
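For instance, the pin might look like the following fragment of a setup.py. This is a sketch only: the variable name and the project name in the comment are hypothetical, and a real setup file will list other dependencies too.

```python
# Fragment of a setup.py (sketch). Only the pandas line matters here.
install_requires = [
    "pandas < 1.0",  # temporary cap until the code is updated for v1.0
]

# Pass the list to setuptools in the usual way, for example:
#   setup(name="my_udst_project", install_requires=install_requires)
```

The same `pandas < 1.0` specifier can go in a requirements.txt file. Remember to lift the cap once the code has been updated.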

How risky are these changes?

Some of the compatibility fixes require judgment about what exactly the code is doing, but as far as I can tell, if the updated code runs, it's very likely to do the same thing as the old code.

So I don't expect any of these changes to affect software logic -- but for codebases without unit tests we should be extra careful and try to do whatever testing is feasible.

What to update

Here are things that have come up for us so far:

  1. DataFrame.as_matrix() and Series.as_matrix() are removed

    These can be directly replaced with DataFrame.values and Series.values. If a list of columns was passed, as in df.as_matrix(cols), use df[cols].values.

  2. DataFrame.ix[] and Series.ix[] are removed

    In most cases you can use .loc[] in its place, with identical arguments.

    • One exception is if .ix[] was being used for implicit positional indexing, which happens if the DataFrame or Series's index contains non-integer values but you pass integers to .ix[]. In this case, replacing it with .loc[] will raise an error and you should use .iloc[] instead.
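    • Another exception is if a list of index values is passed to .ix[] and some of those values may be missing from the index, as described in the next section. Because .loc[] no longer supports this usage, you'll need to replace df.ix[list] with df.reindex(list).
    • Documentation of old .ix[] behavior
    • Discussion of removal
  3. DataFrame.loc[] and Series.loc[] no longer accept a list of index values if any of them are missing from the index

    For example, to get the rows with ids 30, 50, and 40 when some of those ids might not be in the index, you can no longer use df.loc[[30,50,40]]; it now raises a KeyError. Instead, replace this with df.reindex([30,50,40]), which fills the rows for missing ids with NaN, as .loc[] used to. The behavior should be identical. (If every id is guaranteed to be present, df.loc[[30,50,40]] still works.)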

    (But I feel like this makes the code less readable, because it's not intuitive that "reindex" is going to yield a reordered subset of the DataFrame. If you have better ideas for what to replace this with, let me know!)

  4. DataFrame.get_value() and DataFrame.set_value() are removed

    The same applies to Series.get_value() and Series.set_value().

    You can replace df.set_value(row, col, val) with df.at[row, col] = val, and df.get_value(row, col) with df.at[row, col].
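The four replacements above can be sketched side by side as follows. The data and variable names are hypothetical, chosen only for illustration, and the old calls appear in comments because they no longer run under Pandas v1.0.

```python
import pandas as pd

# Hypothetical example data: a small table indexed by integer ids.
df = pd.DataFrame(
    {"x": [1.0, 2.0, 3.0], "y": [10, 20, 30]},
    index=pd.Index([30, 40, 50], name="id"),
)

# 1. as_matrix() -> .values
arr = df.values                  # was: df.as_matrix()
col = df["x"].values             # was: df["x"].as_matrix()

# 2. .ix[] -> .loc[] for labels, .iloc[] for positions
by_label = df.loc[40, "x"]       # was: df.ix[40, "x"]
s = pd.Series([1, 2, 3], index=["a", "b", "c"])
by_position = s.iloc[0]          # was: s.ix[0] (implicit positional lookup)

# 3. .loc[] with a list that may include missing labels -> reindex()
subset = df.reindex([30, 50, 40])     # reordered subset
with_missing = df.reindex([30, 99])   # id 99 is absent, so its row is NaN

# 4. get_value() / set_value() -> .at[]
updated = df.copy()              # copy so the original table stays unchanged
updated.at[40, "x"] = 2.5        # was: updated.set_value(40, "x", 2.5)
value = updated.at[40, "x"]      # was: updated.get_value(40, "x")
```

Note that reindex() silently returns NaN rows for missing labels, so if missing ids would indicate a bug, an explicit check before the call may be worth adding.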

Support status

Which UDST libraries are currently compatible with Pandas v1.0?

Last updated Sep 30, 2020.

smmaurer commented 3 years ago

Another deprecation relevant to us in Pandas 1.2+: pandas.Index.to_native_types()

https://github.com/UDST/urbansim/issues/230
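As far as I know, the Pandas 1.2 release notes suggest .astype(str) as the replacement. A minimal sketch, with a hypothetical index, assuming the default string formatting is all that's needed (any formatting options that were being passed to to_native_types() would have to be handled separately):

```python
import pandas as pd

idx = pd.Index([1, 2, 3])       # hypothetical example index
as_strings = idx.astype(str)    # was: idx.to_native_types()
```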