These notes are from March – June 2020, originally compiled in Notion.
Background
Pandas v1.0 was released at the end of January 2020. It removes a wide range of functionality that's been deprecated over the years, and will require minor updates across many of our codebases.
Strategies for restoring compatibility
If code raises errors in Pandas v1.0, try switching to Pandas v0.25. This should restore earlier functionality and also provide deprecation warnings describing what needs to be changed. (Versions prior to v0.25 may not include all the deprecation warnings.)
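One way to gather those warnings in bulk under v0.25 is to record them while running a piece of code. This is only a sketch: `collect_deprecations` and `legacy_call` are hypothetical names, and it assumes the deprecations are emitted as `FutureWarning` or `DeprecationWarning`.

```python
import warnings


def collect_deprecations(func, *args, **kwargs):
    """Run func and return the deprecation-style warnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" keeps repeated warnings from being suppressed
        warnings.simplefilter("always")
        func(*args, **kwargs)
    return [
        w for w in caught
        if issubclass(w.category, (FutureWarning, DeprecationWarning))
    ]


def legacy_call():
    # Stand-in for code that uses a deprecated pandas feature
    warnings.warn("as_matrix is deprecated", FutureWarning)


for w in collect_deprecations(legacy_call):
    print(w.category.__name__, w.message)
```

In practice you would pass in a function from the codebase being updated (or run its test suite with warnings enabled) rather than the stand-in above.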
As a temporary fix, you can require pandas < 1.0 in the setup files of a library or project.
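For a library packaged with setuptools, the pin might look roughly like this. The package name and the `INSTALL_REQUIRES` variable are made up for illustration; the only part taken from these notes is the `pandas < 1.0` specifier.

```python
# Fragment of a hypothetical setup.py

INSTALL_REQUIRES = [
    "pandas < 1.0",  # temporary pin until the code is updated for v1.0
]

# Passed to setuptools in the usual way, e.g.:
# setup(name="my_library", install_requires=INSTALL_REQUIRES)
```

The same specifier can go on its own line in a requirements.txt file.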
How risky are these changes?
Some of the compatibility fixes require judgment about what exactly the code is doing, but as far as I can tell, if the updated code runs, it's very likely to do the same thing as the old code.
So I don't expect any of these changes to affect software logic -- but for codebases without unit tests we should be extra careful and try to do whatever testing is feasible.
What to update
Here are things that have come up for us so far:
DataFrame.as_matrix() and Series.as_matrix() are removed
These can be directly replaced with Series.values and DataFrame.values.
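A minimal sketch of the swap, using a made-up DataFrame. The `.to_numpy()` line is an addition of mine rather than something from these notes; it is the accessor the pandas docs suggest for new code, and it should return the same array here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Before (removed in v1.0): arr = df.as_matrix()
arr = df.values            # 2-D NumPy array, one row per DataFrame row

# Before (removed in v1.0): col = df["a"].as_matrix()
col = df["a"].values       # 1-D NumPy array

# Alternative spelling (assumption: pandas >= 0.24)
arr_alt = df.to_numpy()
```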
DataFrame.ix[] and Series.ix[] are removed
In most cases you can use .loc[] in its place, with identical arguments.
One exception is if .ix[] was being used for implicit positional indexing, which happens if the DataFrame or Series's index contains non-integer values but you pass integers to .ix[]. In this case, replacing it with .loc[] will raise an error and you should use .iloc[] instead.
Another exception is if a list passed to .ix[] contains index values that are missing from the index, as described in the next section. Because .loc[] no longer supports this usage, you'll need to replace df.ix[list] with df.reindex(list).
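To make the label-based and positional cases concrete, here is a small sketch with a made-up DataFrame whose index holds strings. The old `.ix[]` calls appear only in comments, since they no longer run in v1.0.

```python
import pandas as pd

df = pd.DataFrame({"x": [10, 20, 30]}, index=["a", "b", "c"])

# Label-based lookup. Before: df.ix["b", "x"]
by_label = df.loc["b", "x"]

# Implicit positional lookup: the index holds strings, so the integer 1
# was treated as a position. Before: df.ix[1]
by_position = df.iloc[1]   # second row, returned as a Series
```

If the index holds integers, `.ix[]` treated integers as labels, so `.loc[]` is the matching replacement in that case.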
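DataFrame.loc[] and Series.loc[] no longer accept a list containing missing index values
For example, to get the rows with ids 30, 50, and 40 when some of those ids might not be in the index, you can no longer use df.loc[[30,50,40]]: it raises a KeyError if any id is missing, where it used to return a row of NaNs for it. Instead, replace this with df.reindex([30,50,40]), which keeps the old behavior. (If every id is guaranteed to be present, df.loc[[30,50,40]] still works.)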
(But I feel like this makes the code less readable, because it's not intuitive that "reindex" is going to yield a reordered subset of the DataFrame. If you have better ideas for what to replace this with, let me know!)
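A small sketch of the replacement, using a made-up DataFrame. As far as I can tell, the KeyError from `.loc[]` shows up when one of the requested ids is missing from the index, which is the case shown at the end.

```python
import pandas as pd

df = pd.DataFrame({"value": [1.0, 2.0, 3.0]}, index=[30, 40, 50])

# Reordered subset: rows come back in the order requested
reordered = df.reindex([30, 50, 40])

# Id 60 is not in the index: .loc[] raises, reindex() fills the row with NaN
try:
    df.loc[[30, 60]]
    loc_raised = False
except KeyError:
    loc_raised = True

filled = df.reindex([30, 60])
```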
DataFrame.get_value() and DataFrame.set_value() are removed
And the equivalent for Series.
You can replace df.set_value(row, col, val) with df.at[row, col] = val, and df.get_value(row, col) with df.at[row, col].
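A short sketch of the `.at[]` replacement on a made-up DataFrame and Series. The removed calls are kept in comments for comparison.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["r1", "r2"])

# Before: df.set_value("r1", "y", 99)
df.at["r1", "y"] = 99

# Before: val = df.get_value("r1", "y")
val = df.at["r1", "y"]

# Series version: a single label instead of a (row, col) pair
s = pd.Series([1, 2], index=["a", "b"])
s.at["a"] = 5
```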
For the full list of what's been removed, see the Pandas v1.0 release notes: https://pandas.pydata.org/docs/whatsnew/v1.0.0.html#whatsnew-100-prior-deprecations
Support status
Which UDST libraries are currently compatible with Pandas v1.0?
Last updated Sep 30, 2020.