This is a bugfix release for problems found in 1.13.0. The major changes are
fixes for the new memory overlap detection and temporary elision as well as
reversion of the removal of the boolean binary - operator. Users of 1.13.0
should upgrade.
The Python versions supported are 2.7 and 3.4 - 3.6. Note that the Python 3.6
wheels available from PyPI are built against 3.6.1, hence will not work when
used with 3.6.0 due to Python bug 29943_. NumPy 1.13.2 will be released shortly
after Python 3.6.2 is out to fix that problem. If you are using 3.6.0 the
workaround is to upgrade to 3.6.1 or use an earlier Python version.
A total of 19 pull requests were merged for this release.
9240 DOC: BLD: fix lots of Sphinx warnings/errors.
9255 Revert "DEP: Raise TypeError for subtract(bool, bool)."
9261 BUG: don't elide into readonly and updateifcopy temporaries for...
9262 BUG: fix missing keyword rename for common block in numpy.f2py
9263 BUG: handle resize of 0d array
9267 DOC: update f2py front page and some doc build metadata.
9299 BUG: Fix Intel compilation on Unix.
9317 BUG: fix wrong ndim used in empty where check
9319 BUG: Make extensions compilable with MinGW on Py2.7
9339 BUG: Prevent crash if ufunc doc string is null
9340 BUG: umath: un-break ufunc where= when no out= is given
9371 DOC: Add isnat/positive ufunc to documentation
9372 BUG: Fix error in fromstring function from numpy.core.records...
9373 BUG: ')' is printed at the end pointer of the buffer in numpy.f2py.
9374 DOC: Create NumPy 1.13.1 release notes.
9376 BUG: Prevent hang traversing ufunc userloop linked list
9377 DOC: Use x1 and x2 in the heaviside docstring.
9378 DOC: Add $PARAMS to the isnat docstring
9379 DOC: Update the 1.13.1 release notes
Contributors
A total of 12 people contributed to this release. People with a "+" by their
names contributed a patch for the first time.
Andras Deak +
Bob Eldering +
Charles Harris
Daniel Hrisca +
Eric Wieser
Joshua Leahy +
Julian Taylor
Michael Seifert
Pauli Virtanen
Ralf Gommers
Roland Kaufmann
Warren Weckesser
==========================
1.13.0
==========================
This release supports Python 2.7 and 3.4 - 3.6.
Highlights
Operations like a + b + c will reuse temporaries on some platforms,
resulting in less memory use and faster execution.
Inplace operations check if inputs overlap outputs and create temporaries
to avoid problems.
New __array_ufunc__ attribute provides improved ability for classes to
override default ufunc behavior.
New np.block function for creating blocked arrays.
New functions
New np.positive ufunc.
New np.divmod ufunc provides more efficient divmod.
New np.isnat ufunc tests for NaT special values.
New np.heaviside ufunc computes the Heaviside function.
New np.isin function, improves on in1d.
New np.block function for creating blocked arrays.
New PyArray_MapIterArrayCopyIfOverlap added to NumPy C-API.
See below for details.
Deprecations
Calling np.fix, np.isposinf, and np.isneginf with f(x, y=out)
is deprecated - the argument should be passed as f(x, out=out), which
matches other ufunc-like interfaces.
Use of the C-API NPY_CHAR type number deprecated since version 1.7 will
now raise deprecation warnings at runtime. Extensions built with older f2py
versions need to be recompiled to remove the warning.
np.ma.argsort, np.ma.minimum.reduce, and np.ma.maximum.reduce
should be called with an explicit axis argument when applied to arrays with
more than 2 dimensions, as the default value of this argument (None) is
inconsistent with the rest of numpy (-1, 0, and 0, respectively).
np.ma.MaskedArray.mini is deprecated, as it almost duplicates the
functionality of np.ma.MaskedArray.min. Exactly equivalent behaviour
can be obtained with np.ma.minimum.reduce.
The single-argument form of np.ma.minimum and np.ma.maximum is
deprecated. np.ma.minimum(x) should now be spelt
np.ma.minimum.reduce(x), which is consistent with how this would be done
with np.minimum.
Calling ndarray.conjugate on non-numeric dtypes is deprecated (it
should match the behavior of np.conjugate, which throws an error).
Calling expand_dims when the axis keyword does not satisfy
-a.ndim - 1 <= axis <= a.ndim, where a is the array being reshaped,
is deprecated.
Future Changes
Assignment between structured arrays with different field names will change
in NumPy 1.14. Previously, fields in the dst would be set to the value of the
identically-named field in the src. In numpy 1.14 fields will instead be
assigned 'by position': The n-th field of the dst will be set to the n-th
field of the src array. Note that the FutureWarning raised in NumPy 1.12
incorrectly reported this change as scheduled for NumPy 1.13 rather than
NumPy 1.14.
Build System Changes
numpy.distutils now automatically determines C-file dependencies with
GCC compatible compilers.
Compatibility notes
Error type changes
numpy.hstack() now throws ValueError instead of IndexError when
input is empty.
Functions taking an axis argument, when that argument is out of range, now
throw np.AxisError instead of a mixture of IndexError and
ValueError. For backwards compatibility, AxisError subclasses both of
these.
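As a minimal sketch of the backwards-compatible behaviour described above, the snippet below triggers an out-of-range ``axis`` and catches it once as ``IndexError`` and once as ``ValueError``. The helper name ``caught_as`` is hypothetical and only used for illustration.

```python
import numpy as np

a = np.zeros((2, 3))


def caught_as(exc_type):
    """Return True if an out-of-range axis error is caught as exc_type."""
    try:
        np.sum(a, axis=5)  # axis 5 does not exist for a 2-d array
    except exc_type:
        return True
    return False


# The same error can be handled by code written for either exception type.
caught_as_index_error = caught_as(IndexError)
caught_as_value_error = caught_as(ValueError)
print(caught_as_index_error, caught_as_value_error)
```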
Tuple object dtypes
Support has been removed for certain obscure dtypes that were unintentionally
allowed, of the form (old_dtype, new_dtype), where either of the dtypes
is or contains the object dtype. As an exception, dtypes of the form
(object, [('name', object)]) are still supported due to evidence of
existing use.
DeprecationWarning to error
See Changes section for more detail.
partition, TypeError when non-integer partition index is used.
NpyIter_AdvancedNew, ValueError when oa_ndim == 0 and op_axes is NULL
negative(bool_), TypeError when negative applied to booleans.
subtract(bool_, bool_), TypeError when subtracting boolean from boolean.
dtypes are now always true
Previously bool(dtype) would fall back to the default python
implementation, which checked if len(dtype) > 0. Since dtype objects
implement __len__ as the number of record fields, bool of scalar dtypes
would evaluate to False, which was unintuitive. Now bool(dtype) == True
for all dtypes.
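A short illustration of the point above, assuming a NumPy version with this change: a scalar dtype has no record fields, so its length is zero, yet it is still truthy.

```python
import numpy as np

scalar_dtype = np.dtype(np.float64)
record_dtype = np.dtype([('x', np.int32), ('y', np.float64)])

# len() still reports the number of record fields...
field_counts = (len(scalar_dtype), len(record_dtype))

# ...but truthiness no longer depends on it.
scalar_truth = bool(scalar_dtype)
record_truth = bool(record_dtype)
print(field_counts, scalar_truth, record_truth)
```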
__getslice__ and __setslice__ are no longer needed in ndarray subclasses
When subclassing np.ndarray in Python 2.7, it is no longer necessary to
implement __*slice__ on the derived class, as __*item__ will intercept
these calls correctly.
Any code that did implement these will work exactly as before. Code that
invokes ndarray.__getslice__ (e.g. through super(...).__getslice__) will
now issue a DeprecationWarning - .__getitem__(slice(start, end)) should be
used instead.
Indexing MaskedArrays/Constants with ... (ellipsis) now returns MaskedArray
This behavior mirrors that of np.ndarray, and accounts for nested arrays in
MaskedArrays of object dtype, and ellipsis combined with other forms of
indexing.
C API changes
GUfuncs on empty arrays and NpyIter axis removal
It is now allowed to remove a zero-sized axis from NpyIter, which may mean
that code removing axes from NpyIter has to add an additional check when
accessing the removed dimensions later on.
The largest followup change is that gufuncs are now allowed to have zero-sized
inner dimensions. This means that a gufunc now has to anticipate an empty inner
dimension, whereas previously this was not possible and an error was raised instead.
For most gufuncs no change should be necessary. However, it is now possible
for gufuncs with a signature such as (..., N, M) -> (..., M) to return
a valid result if N=0 without further wrapping code.
PyArray_MapIterArrayCopyIfOverlap added to NumPy C-API
Similar to PyArray_MapIterArray but with an additional copy_if_overlap
argument. If copy_if_overlap != 0, it checks whether the input has memory overlap
with any of the other arrays and makes copies as appropriate to avoid problems if
the input is modified during the iteration. See the documentation for more
complete details.
New Features
__array_ufunc__ added
This is the renamed and redesigned __numpy_ufunc__. Any class, ndarray
subclass or not, can define this method or set it to None in order to
override the behavior of NumPy's ufuncs. This works quite similarly to Python's
__mul__ and other binary operation routines. The API is provisional: we do not
yet guarantee backward compatibility, as modifications may be made pending
feedback. See the NEP and documentation for a more detailed description of the
implementation and behavior of this new option.
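The following is a minimal sketch of both uses mentioned above, not the reference implementation. The classes ``Wrapped`` and ``OptOut`` are hypothetical names invented for this example: the first intercepts ufunc calls and re-wraps the result, the second sets the attribute to ``None`` to opt out of ufuncs entirely.

```python
import numpy as np


class Wrapped:
    """Hypothetical container that intercepts ufunc calls on its data."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap Wrapped operands, defer to the real ufunc, re-wrap the result.
        raw = [x.data if isinstance(x, Wrapped) else x for x in inputs]
        result = getattr(ufunc, method)(*raw, **kwargs)
        return Wrapped(result)


class OptOut:
    """Setting the attribute to None opts the class out of ufuncs."""

    __array_ufunc__ = None


doubled = np.multiply(Wrapped([1, 2, 3]), 2)

try:
    np.add(np.arange(3), OptOut())
    opted_out = False
except TypeError:
    opted_out = True

print(type(doubled).__name__, doubled.data, opted_out)
```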
New positive ufunc
This ufunc corresponds to unary +, but unlike + on an ndarray it will raise
an error if array values do not support numeric operations.
New divmod ufunc
This ufunc corresponds to the Python builtin divmod, and is used to implement
divmod when called on numpy arrays. np.divmod(x, y) calculates a result
equivalent to (np.floor_divide(x, y), np.remainder(x, y)) but is
approximately twice as fast as calling the functions separately.
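A small check of the equivalence stated above, using arbitrary sample values (the speed claim is not measured here):

```python
import numpy as np

x = np.array([7.0, -7.0, 7.5, -7.5])
y = np.array([2.0, 2.0, -2.0, -2.0])

quotient, remainder = np.divmod(x, y)

# Both halves agree with the separate ufuncs.
same_quotient = np.array_equal(quotient, np.floor_divide(x, y))
same_remainder = np.array_equal(remainder, np.remainder(x, y))
print(quotient, remainder)
```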
np.isnat ufunc tests for NaT special datetime and timedelta values
The new ufunc np.isnat finds the positions of special NaT values
within datetime and timedelta arrays. This is analogous to np.isnan.
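For illustration, a sketch with made-up dates showing ``np.isnat`` on both a datetime array and the timedelta array derived from it:

```python
import numpy as np

dates = np.array(['2017-06-07', 'NaT', '2017-07-06'], dtype='datetime64[D]')
deltas = dates - dates[0]  # NaT propagates into the timedelta result

date_mask = np.isnat(dates)
delta_mask = np.isnat(deltas)
print(date_mask, delta_mask)
```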
np.heaviside ufunc computes the Heaviside function
The new function np.heaviside(x, h0) (a ufunc) computes the Heaviside
function:
.. code::

                          { 0   if x < 0,
       heaviside(x, h0) = { h0  if x == 0,
                          { 1   if x > 0.

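The three cases of the definition can be checked directly; the sample points and the choice ``h0 = 0.5`` are arbitrary.

```python
import numpy as np

x = np.array([-1.5, 0.0, 2.0])
step = np.heaviside(x, 0.5)  # second argument is the value used where x == 0
print(step)
```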
np.block function for creating blocked arrays
Add a new block function to the current stacking functions vstack,
hstack, and stack. This allows concatenation across multiple axes
simultaneously, with a similar syntax to array creation, but where elements
can themselves be arrays. For instance::
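A minimal sketch of such a call, with arbitrarily chosen blocks: two diagonal blocks are combined with a block of zeros and a block of ones into one 5x5 array.

```python
import numpy as np

A = np.eye(2) * 2
B = np.eye(3) * 3

# Each inner list is one block row; blocks in a row are joined horizontally.
M = np.block([
    [A,               np.zeros((2, 3))],
    [np.ones((3, 2)), B],
])
print(M.shape)
```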
While primarily useful for block matrices, this works for arbitrary dimensions
of arrays.
It is similar to Matlab's square bracket notation for creating block matrices.
isin function, improving on in1d
The new function isin tests whether each element of an N-dimensional
array is present anywhere within a second array. It is an enhancement
of in1d that preserves the shape of the first array.
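A short example with arbitrary values, showing that the result has the shape of the first argument:

```python
import numpy as np

element = np.array([[0, 2], [4, 6]])
test_elements = [1, 2, 4, 8]

mask = np.isin(element, test_elements)
print(mask)
```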
Temporary elision
On platforms providing the backtrace function, NumPy will try to avoid
creating temporaries in expressions involving basic numeric types.
For example d = a + b + c is transformed to d = a + b; d += c which can
improve performance for large arrays as less memory bandwidth is required to
perform the operation.
axis argument for unique
In an N-dimensional array, the user can now choose the axis along which to look
for duplicate N-1-dimensional elements using numpy.unique. The original
behaviour is recovered if axis=None (default).
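As an illustration with a small made-up array, ``axis=0`` removes duplicate rows, while the default flattens the input as before:

```python
import numpy as np

a = np.array([[1, 0, 0],
              [1, 0, 0],
              [2, 3, 4]])

unique_rows = np.unique(a, axis=0)  # duplicate 1-d rows are removed
unique_values = np.unique(a)        # default axis=None works on the flattened array
print(unique_rows)
print(unique_values)
```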
np.gradient now supports unevenly spaced data
Users can now specify a non-constant spacing for data.
In particular np.gradient can now take:

1. A single scalar to specify a sample distance for all dimensions.
2. N scalars to specify a constant sample distance for each dimension,
   i.e. dx, dy, dz, ...
3. N arrays to specify the coordinates of the values along each dimension of F.
   The length of the array must match the size of the corresponding dimension.
4. Any combination of N scalars/arrays with the meaning of 2. and 3.

This means that, e.g., it is now possible to do the following::
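A sketch of such a mixed call, with illustrative data: a scalar spacing is used along the first axis and an array of coordinates along the second.

```python
import numpy as np

f = np.array([[1, 2, 6], [3, 4, 5]], dtype=float)
dx = 2.0             # constant spacing along axis 0
y = [1.0, 1.5, 3.5]  # unevenly spaced coordinates along axis 1

grad_axis0, grad_axis1 = np.gradient(f, dx, y)
print(grad_axis0)
print(grad_axis1)
```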
Support for returning arrays of arbitrary dimensions in apply_along_axis
Previously, only scalars or 1D arrays could be returned by the function passed
to apply_along_axis. Now, it can return an array of any dimensionality
(including 0D), and the shape of this array replaces the axis of the array
being iterated over.
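For example, in the sketch below the applied function (``np.diag``, chosen only for illustration) turns each length-3 row into a 3x3 array, so the iterated axis of length 3 is replaced by a (3, 3) block:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# np.diag maps each 1-d slice of length 3 to a 3x3 array.
result = np.apply_along_axis(np.diag, 1, a)
print(result.shape)
```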
.ndim property added to dtype to complement .shape
For consistency with ndarray and broadcast, d.ndim is a shorthand
for len(d.shape).
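A brief illustration, using a scalar dtype and a subarray dtype picked for the example:

```python
import numpy as np

scalar = np.dtype(np.float64)
subarray = np.dtype((np.float64, (2, 3)))  # subarray dtype with shape (2, 3)

dims = (scalar.ndim, subarray.ndim)
print(dims, subarray.shape)
```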
Support for tracemalloc in Python 3.6
NumPy now supports memory tracing with the tracemalloc_ module of Python 3.6 or
newer. Memory allocations from NumPy are placed into the domain defined by
numpy.lib.tracemalloc_domain.
Note that NumPy allocations will not show up in tracemalloc_ of earlier Python
versions.
NumPy may be built with relaxed stride checking debugging
Setting NPY_RELAXED_STRIDES_DEBUG=1 in the environment when relaxed stride
checking is enabled will cause NumPy to be compiled with the affected strides
set to the maximum value of npy_intp in order to help detect invalid usage of
the strides in downstream projects. When enabled, invalid usage often results
in an error being raised, but the exact type of error depends on the details of
the code. TypeError and OverflowError have been observed in the wild.
It was previously the case that this option was disabled for releases and
enabled in master and changing between the two required editing the code. It is
now disabled by default but can be enabled for test builds.
Improvements
Ufunc behavior for overlapping inputs
Operations where ufunc input and output operands have memory overlap
produced undefined results in previous NumPy versions, due to data
dependency issues. In NumPy 1.13.0, results from such operations are
now defined to be the same as for equivalent operations where there is
no memory overlap.
Operations affected now make temporary copies, as needed to eliminate
data dependency. As detecting these cases is computationally
expensive, a heuristic is used, which may in rare cases result in
needless temporary copies. For operations where the data dependency
is simple enough for the heuristic to analyze, temporary copies will
not be made even if the arrays overlap, if it can be deduced copies
are not necessary. As an example, np.add(a, b, out=a) will not
involve copies.
To illustrate a previously undefined operation::

    >>> x = np.arange(16).astype(float)
    >>> np.add(x[1:], x[:-1], out=x[1:])

In NumPy 1.13.0 the last line is guaranteed to be equivalent to::
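One way to spell out that guarantee, as a sketch rather than the exact statement from the original notes, is to compare the overlapping call against the same call on explicit copies of the inputs:

```python
import numpy as np

x = np.arange(16).astype(float)
expected = x.copy()

# Reference: the inputs are explicit copies, so they cannot overlap the output.
np.add(expected[1:].copy(), expected[:-1].copy(), out=expected[1:])

# Overlapping inputs and output; defined to give the same result.
np.add(x[1:], x[:-1], out=x[1:])

results_match = np.array_equal(x, expected)
print(results_match)
```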
A similar operation with simple non-problematic data dependence is::

    >>> x = np.arange(16).astype(float)
    >>> np.add(x[1:], x[:-1], out=x[:-1])

It will continue to produce the same results as in previous NumPy
versions, and will not involve unnecessary temporary copies.
The change applies also to in-place binary operations, for example::

    >>> x = np.random.rand(500, 500)
    >>> x += x.T

This statement is now guaranteed to be equivalent to x[...] = x + x.T,
whereas in previous NumPy versions the results were undefined.
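This can be checked with a small sketch; the array size and the seeded ``RandomState`` are arbitrary choices made here for reproducibility.

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.rand(50, 50)

reference = x + x.T  # out-of-place result, computed before x is modified
x += x.T             # in-place operation on overlapping memory

matches_reference = np.array_equal(x, reference)
is_symmetric = np.array_equal(x, x.T)
print(matches_reference, is_symmetric)
```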
Partial support for 64-bit f2py extensions with MinGW
Extensions that incorporate Fortran libraries can now be built using the free
MinGW toolset, also under Python 3.5. This works best for extensions that only
do calculations and use the runtime modestly (reading and writing from files,
for instance). Note that this does not remove the need for Mingwpy; if you make
extensive use of the runtime, you will most likely run into issues. Instead,
it should be regarded as a band-aid until Mingwpy is fully functional.
Extensions can also be compiled using the MinGW toolset using the runtime
library from the (moveable) WinPython 3.4 distribution, which can be useful for
programs with a PySide1/Qt4 front-end.
Performance improvements for packbits and unpackbits
The functions numpy.packbits with boolean input and numpy.unpackbits have
been optimized to be significantly faster for contiguous data.
Fix for PPC long double floating point information
In previous versions of NumPy, the finfo function returned invalid
information about the double double_ format of the longdouble float type
on Power PC (PPC). The invalid values resulted from the failure of the NumPy
algorithm to deal with the variable number of digits in the significand
that are a feature of PPC long doubles. This release by-passes the failing
algorithm by using heuristics to detect the presence of the PPC double double
format. A side-effect of using these heuristics is that the finfo
function is faster than in previous releases.
Better default repr for ndarray subclasses
Subclasses of ndarray with no repr specialization now correctly indent
their data and type lines.
More reliable comparisons of masked arrays
Comparisons of masked arrays were buggy for masked scalars and failed for
structured arrays with dimension higher than one. Both problems are now
solved. In the process, it was ensured that in getting the result for a
structured array, masked fields are properly ignored, i.e., the result is equal
if all fields that are non-masked in both are equal, thus making the behaviour
identical to what one gets by comparing an unstructured masked array and then
doing .all() over some axis.
np.matrix with boolean elements can now be created using the string syntax
np.matrix failed whenever one attempted to use it with booleans, e.g.,
np.matrix('True'). Now, this works as expected.
More linalg operations now accept empty vectors and matrices
All of the following functions in np.linalg now work when given input
arrays with a 0 in the last two dimensions: det, slogdet, pinv,
eigvals, eigvalsh, eig, eigh.
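A minimal sketch with a 0x0 matrix, covering three of the listed functions. The results follow the usual conventions for an empty matrix (determinant 1, no eigenvalues).

```python
import numpy as np

empty = np.zeros((0, 0))

det = np.linalg.det(empty)
sign, logdet = np.linalg.slogdet(empty)
eigenvalues = np.linalg.eigvals(empty)
print(det, sign, logdet, eigenvalues.shape)
```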
Bundled version of LAPACK is now 3.2.2
NumPy comes bundled with a minimal implementation of lapack for systems without
a lapack library installed, under the name of lapack_lite. This has been
upgraded from LAPACK 3.0.0 (June 30, 1999) to LAPACK 3.2.2 (June 30, 2010). See
the LAPACK changelogs_ for details on all the changes this entails.
While no new features are exposed through numpy, this fixes some bugs
regarding "workspace" sizes, and in some places may use faster algorithms.
reduce of np.hypot and np.logical_xor allowed in more cases
This now works on empty arrays, returning 0, and can reduce over multiple axes.
Previously, a ValueError was thrown in these cases.
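Both cases can be exercised with a short sketch; the sample grid is chosen so that the combined hypotenuse is a round number.

```python
import numpy as np

# Reductions over empty input now return the identity instead of raising.
hypot_of_nothing = np.hypot.reduce(np.array([], dtype=float))
xor_of_nothing = np.logical_xor.reduce(np.array([], dtype=bool))

# Reduction over several axes at once.
grid = np.array([[3.0, 4.0], [12.0, 84.0]])
norm = np.hypot.reduce(grid, axis=(0, 1))
print(hypot_of_nothing, xor_of_nothing, norm)
```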
Better repr of object arrays
Object arrays that contain themselves no longer cause a recursion error.
Object arrays that contain list objects are now printed in a way that makes
clear the difference between a 2d object array, and a 1d object array of lists.
Changes
argsort on masked arrays takes the same default arguments as sort
By default, argsort now places the masked values at the end of the sorted
array, in the same way that sort already did. Additionally, the
endwith argument is added to argsort, for consistency with sort.
Note that this argument is not added at the end, so it breaks any code that
passed fill_value as a positional argument.
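A small sketch of the new default, using a made-up masked array: the masked entry is ordered last, and ``fill_value`` is passed by keyword so the call does not depend on argument position.

```python
import numpy as np

m = np.ma.array([3, 1, 2], mask=[True, False, False])

order = np.ma.argsort(m)                       # masked entry (index 0) sorts last
order_filled = np.ma.argsort(m, fill_value=0)  # keyword form; masked entry treated as 0
print(order, order_filled)
```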
average now preserves subclasses
For ndarray subclasses, numpy.average will now return an instance of the
subclass, matching the behavior of most other NumPy functions such as mean.
As a consequence, calls that previously returned a scalar may now also return
a subclass array scalar.
array == None and array != None do element-wise comparison
Previously these operations returned scalars False and True respectively.
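For example, with an object array that contains ``None`` (values chosen for illustration):

```python
import numpy as np

a = np.array([1, None, 3], dtype=object)

is_none = (a == None)   # noqa: E711 - element-wise comparison is the point here
not_none = (a != None)  # noqa: E711
print(is_none, not_none)
```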
np.equal, np.not_equal for object arrays ignores object identity
Previously, these functions always treated identical objects as equal. This had
the effect of overriding comparison failures, comparison of objects that did
not return booleans, such as np.arrays, and comparison of objects where the
results differed from object identity, such as NaNs.
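The NaN case can be sketched as follows: the very same NaN object is compared with itself, and the result now follows ``==`` rather than object identity.

```python
import numpy as np

nan = float('nan')
a = np.array([nan, 1.0], dtype=object)

# Both operands hold the identical NaN object in position 0.
same_object_equal = np.equal(a, a)
print(same_object_equal)
```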
Boolean indexing changes
Boolean array-likes (such as lists of python bools) are always treated as
boolean indexes.
Boolean scalars (including python True) are legal boolean indexes and
never treated as integers.
Boolean indexes must match the dimension of the axis that they index.
Boolean indexes used on the lhs of an assignment must match the dimensions of
the rhs.
Boolean indexing into scalar arrays returns a new 1-d array. This means that
array(1)[array(True)] gives array([1]) and not the original array.
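A few of these rules in a short sketch, with arbitrary sample values:

```python
import numpy as np

# A list of Python bools is treated as a boolean mask, not as integers 1, 0, 1.
picked = np.arange(3)[[True, False, True]]

# Boolean indexing into a 0-d array yields a new 1-d array.
scalar_array = np.array(1)
selected = scalar_array[np.array(True)]
print(picked, selected, selected.shape)
```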
np.random.multivariate_normal behavior with bad covariance matrix
It is now possible to adjust the behavior the function will have when dealing
with the covariance matrix by using two new keyword arguments:
tol can be used to specify a tolerance to use when checking that
the covariance matrix is positive semidefinite.
check_valid can be used to configure what the function will do in the
presence of a matrix that is not positive semidefinite. Valid options are
ignore, warn and raise. The default value, warn, keeps the
behavior used in previous releases.
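A sketch of two of the ``check_valid`` options, using a deliberately invalid covariance matrix made up for this example:

```python
import numpy as np

mean = [0.0, 0.0]
bad_cov = [[1.0, 2.0], [2.0, 1.0]]  # eigenvalues 3 and -1: not positive semidefinite

try:
    np.random.multivariate_normal(mean, bad_cov, check_valid='raise')
    raised = False
except ValueError:
    raised = True

# 'ignore' skips the check and still draws a sample.
sample = np.random.multivariate_normal(mean, bad_cov, check_valid='ignore')
print(raised, sample.shape)
```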
assert_array_less compares np.inf and -np.inf now
Previously, np.testing.assert_array_less ignored all infinite values. This
was not the expected behavior, either according to the documentation or intuitively.
Now, -inf < x < inf is considered True for any real number x and all
other cases fail.
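The rule can be exercised with a small sketch; the helper ``fails`` is a hypothetical convenience defined only for this example.

```python
import numpy as np

# These comparisons hold and pass silently.
np.testing.assert_array_less(-np.inf, 1.0)
np.testing.assert_array_less(1.0, np.inf)


def fails(x, y):
    """Return True if assert_array_less(x, y) raises AssertionError."""
    try:
        np.testing.assert_array_less(x, y)
    except AssertionError:
        return True
    return False


inf_vs_inf_fails = fails(np.inf, np.inf)
inf_vs_one_fails = fails(np.inf, 1.0)
print(inf_vs_inf_fails, inf_vs_one_fails)
```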
assert_array_ and masked arrays assert_equal hide fewer warnings
Some warnings that were previously hidden by the assert_array_
functions are not hidden anymore. In most cases the warnings should be
correct and, should they occur, will require changes to the tests using
these functions.
For the masked array assert_equal version, warnings may occur when
comparing NaT. The function presently does not handle NaT or NaN
specifically and it may be best to avoid it at this time should a warning
show up due to this change.
offset attribute value in memmap objects
The offset attribute in a memmap object is now set to the
offset into the file. This is a behaviour change only for offsets
greater than mmap.ALLOCATIONGRANULARITY.
np.real and np.imag return scalars for scalar inputs
Previously, np.real and np.imag returned array objects when
provided a scalar input, which was inconsistent with other functions like
np.angle and np.conj.
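For a Python complex scalar (value chosen arbitrarily), this looks as follows:

```python
import numpy as np

z = 3.0 + 4.0j
re, im = np.real(z), np.imag(z)

# Neither result is an ndarray any more.
returns_arrays = isinstance(re, np.ndarray) or isinstance(im, np.ndarray)
print(re, im, returns_arrays)
```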
The polynomial convenience classes cannot be passed to ufuncs
The ABCPolyBase class, from which the convenience classes are derived, sets
__array_ufunc__ = None in order to opt out of ufuncs. If a polynomial
convenience class instance is passed as an argument to a ufunc, a TypeError
will now be raised.
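A brief sketch of the effect, with an arbitrary polynomial: the ufunc call is rejected, while the ordinary arithmetic operators keep working.

```python
import numpy as np
from numpy.polynomial import Polynomial

p = Polynomial([1, 2, 3])

try:
    np.add(p, 1)
    rejected = False
except TypeError:
    rejected = True

shifted = p + 1  # operators are implemented by the class itself
print(rejected, shifted.coef)
```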
Output arguments to ufuncs can be tuples also for ufunc methods
For calls to ufuncs, it was already possible, and recommended, to use an
out argument with a tuple for ufuncs with multiple outputs. This has now
been extended to output arguments in the reduce, accumulate, and
reduceat methods. This is mostly for compatibility with __array_ufunc__;
there are no ufuncs yet that have more than one output.
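For instance, a one-element tuple can be passed as ``out`` to ``reduce``. This is a sketch with an arbitrary input array:

```python
import numpy as np

a = np.arange(6.0).reshape(2, 3)
out = np.empty(3)

# The tuple form mirrors how multiple outputs are passed to a ufunc call.
result = np.add.reduce(a, axis=0, out=(out,))
print(result)
```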
==========================
pandas 0.20.1 -> 0.21.0
0.21.0
This is a major release from 0.20.3 and includes a number of API changes, deprecations, new features,
enhancements, and performance improvements along with a large number of bug fixes. We recommend that all
users upgrade to this version.
Highlights include:
- Integration with `Apache Parquet <https://parquet.apache.org/>`__, including a new top-level :func:`read_parquet` function and :meth:`DataFrame.to_parquet` method, see :ref:`here <whatsnew_0210.enhancements.parquet>`.
- New user-facing :class:`pandas.api.types.CategoricalDtype` for specifying
  categoricals independent of the data, see :ref:`here <whatsnew_0210.enhancements.categorical_dtype>`.
- The behavior of ``sum`` and ``prod`` on all-NaN Series/DataFrames is now consistent and no longer depends on whether `bottleneck <http://berkeleyanalytics.com/bottleneck>`__ is installed, see :ref:`here <whatsnew_0210.api_breaking.bottleneck>`.
- Compatibility fixes for pypy, see :ref:`here <whatsnew_0210.pypy>`.
- Additions to the ``drop``, ``reindex`` and ``rename`` API to make them more consistent, see :ref:`here <whatsnew_0210.enhancements.drop_api>`.
- Addition of the new methods ``DataFrame.infer_objects`` (see :ref:`here <whatsnew_0210.enhancements.infer_objects>`) and ``GroupBy.pipe`` (see :ref:`here <whatsnew_0210.enhancements.GroupBy_pipe>`).
- Indexing with a list of labels, where one or more of the labels is missing, is deprecated and will raise a KeyError in a future version, see :ref:`here <whatsnew_0210.api_breaking.loc>`.

Check the :ref:`API Changes <whatsnew_0210.api_breaking>` and :ref:`deprecations <whatsnew_0210.deprecations>` before updating.

.. contents:: What's new in v0.21.0
    :local:
    :backlinks: none
    :depth: 2

.. _whatsnew_0210.enhancements:
New features
.. _whatsnew_0210.enhancements.parquet:
Integration with Apache Parquet file format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Integration with `Apache Parquet <https://parquet.apache.org/>`__, including a new top-level :func:`read_parquet` and :func:`DataFrame.to_parquet` method, see :ref:`here <io.parquet>` (:issue:`15838`, :issue:`17438`).
`Apache Parquet <https://parquet.apache.org/>`__ provides a cross-language, binary file format for reading and writing data frames efficiently.
Parquet is designed to faithfully serialize and de-serialize ``DataFrame`` s, supporting all of the pandas
dtypes, including extension dtypes such as datetime with timezones.
This functionality depends on either the `pyarrow <http://arrow.apache.org/docs/python/>`__ or `fastparquet <https://fastparquet.readthedocs.io/en/latest/>`__ library.
For more details, see :ref:`the IO docs on Parquet <io.parquet>`.
.. _whatsnew_0210.enhancements.infer_objects:
``infer_objects`` type conversion
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The :meth:`DataFrame.infer_objects` and :meth:`Series.infer_objects`
methods have been added to perform dtype inference on object columns, replacing
some of the functionality of the deprecated ``convert_objects``
method. See the documentation :ref:`here <basics.object_conversion>`
for more details. (:issue:`11221`)
This method only performs soft conversions on object columns, converting Python objects
to native types, but not any coercive conversions. For example:
.. ipython:: python

   df = pd.DataFrame({'A': [1, 2, 3],
                      'B': np.array([1, 2, 3], dtype='object'),
                      'C': ['1', '2', '3']})
   df.dtypes
   df.infer_objects().dtypes

Note that column ``'C'`` was not converted - only scalar numeric types
will be converted to a new type. Other types of conversion should be accomplished
using the :func:`to_numeric` function (or :func:`to_datetime`, :func:`to_timedelta`).
.. ipython:: python

   df = df.infer_objects()
   df['C'] = pd.to_numeric(df['C'], errors='coerce')
   df.dtypes

.. _whatsnew_0210.enhancements.attribute_access:
Improved warnings when attempting to create columns
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
New users are often puzzled by the relationship between column operations and
attribute access on ``DataFrame`` instances (:issue:`7175`). One specific
instance of this confusion is attempting to create a new column by setting an
attribute on the ``DataFrame``:
.. code-block:: ipython

   In[1]: df = pd.DataFrame({'one': [1., 2., 3.]})
   In[2]: df.two = [4, 5, 6]

This does not raise any obvious exceptions, but also does not create a new column:

.. code-block:: ipython

   In[3]: df
   Out[3]:
      one
   0  1.0
   1  2.0
   2  3.0

Setting a list-like data structure into a new attribute now raises a ``UserWarning`` about the potential for unexpected behavior. See :ref:`Attribute Access <indexing.attribute_access>`.
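A minimal sketch of observing this warning programmatically, assuming a pandas version that includes this change. The warning is recorded with the standard ``warnings`` module rather than shown on screen.

```python
import warnings

import pandas as pd

df = pd.DataFrame({'one': [1., 2., 3.]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    df.two = [4, 5, 6]  # sets a plain attribute, not a column

user_warnings = [w for w in caught if issubclass(w.category, UserWarning)]
column_created = 'two' in df.columns
print(len(user_warnings), column_created)
```

Using ``df['two'] = [4, 5, 6]`` instead creates the column and does not warn.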
.. _whatsnew_0210.enhancements.drop_api:
``drop`` now also accepts index/columns keywords
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The :meth:`~DataFrame.drop` method has gained ``index``/``columns`` keywords as an
alternative to specifying the ``axis``. This is similar to the behavior of ``reindex``
(:issue:`12392`).
For example:
.. ipython:: python

   df = pd.DataFrame(np.arange(8).reshape(2, 4),
                     columns=['A', 'B', 'C', 'D'])
   df
   df.drop(['B', 'C'], axis=1)
   # the following is now equivalent
   df.drop(columns=['B', 'C'])

.. _whatsnew_0210.enhancements.rename_reindex_axis:
``rename``, ``reindex`` now also accept axis keyword
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The :meth:`DataFrame.rename` and :meth:`DataFrame.reindex` methods have gained
the ``axis`` keyword to specify the axis to target with the operation
(:issue:`12392`).
Here's ``rename``:
.. ipython:: python

   df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
   df.rename(str.lower, axis='columns')
   df.rename(id, axis='index')

And ``reindex``:

.. ipython:: python

   df.reindex(['A', 'B', 'C'], axis='columns')
   df.reindex([0, 1, 3], axis='index')

The "index, columns" style continues to work as before.

.. ipython:: python

   df.rename(index=id, columns=str.lower)
   df.reindex(index=[0, 1, 3], columns=['A', 'B', 'C'])

We *highly* encourage using named arguments to avoid confusion when using either
style.
.. _whatsnew_0210.enhancements.categorical_dtype:
``CategoricalDtype`` for specifying categoricals
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:class:`pandas.api.types.CategoricalDtype` has been added to the public API and
expanded to include the ``categories`` and ``ordered`` attributes. A
``CategoricalDtype`` can be used to specify the set of categories and
orderedness of an array, independent of the data. This can be useful for example,
when converting string data to a ``Categorical`` (:issue:`14711`,
:issue:`15078`, :issue:`16015`, :issue:`17643`):
.. ipython:: python

   from pandas.api.types import CategoricalDtype

   s = pd.Series(['a', 'b', 'c', 'a'])  # strings
   dtype = CategoricalDtype(categories=['a', 'b', 'c', 'd'], ordered=True)
   s.astype(dtype)

One place that deserves special mention is in :meth:`read_csv`. Previously, with
``dtype={'col': 'category'}``, the returned values and categories would always
be strings.
.. ipython:: python
   :suppress:

   from pandas.compat import StringIO

.. ipython:: python

   data = 'A,B\na,1\nb,2\nc,3'
   pd.read_csv(StringIO(data), dtype={'B': 'category'}).B.cat.categories

Notice the "object" dtype.
With a ``CategoricalDtype`` of all numerics, datetimes, or
timedeltas, we can automatically convert to the correct type
.. ipython:: python

   dtype = {'B': CategoricalDtype([1, 2, 3])}
   pd.read_csv(StringIO(data), dtype=dtype).B.cat.categories

The values have been correctly interpreted as integers.
The ``.dtype`` property of a ``Categorical``, ``CategoricalIndex`` or a
``Series`` with categorical type will now return an instance of
``CategoricalDtype``. While the repr has changed, ``str(CategoricalDtype())`` is
still the string ``'category'``. We'll take this moment to remind users that the
*preferred* way to detect categorical data is to use
:func:`pandas.api.types.is_categorical_dtype`, and not ``str(dtype) == 'category'``.
See the :ref:`CategoricalDtype docs <categorical.categoricaldtype>` for more.
.. _whatsnew_0210.enhancements.GroupBy_pipe:
``GroupBy`` objects now have a ``pipe`` method
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``GroupBy`` objects now have a ``pipe`` method, similar to the one on
``DataFrame`` and ``Series``, that allows functions that take a
``GroupBy`` to be composed in a clean, readable syntax. (:issue:`17871`)
For a concrete example on combining ``.groupby`` and ``.pipe`` , imagine having a
DataFrame with columns for stores, products, revenue and sold quantity. We'd like to
do a groupwise calculation of *prices* (i.e. revenue/quantity) per store and per product.
We could do this in a multi-step operation, but expressing it in terms of piping can make the
code more readable.
First we set the data:
.. ipython:: python

   import numpy as np
   n = 1000
   df = pd.DataFrame({'Store': np.random.choice(['Store_1', 'Store_2'], n),
                      'Product': np.random.choice(['Product_1', 'Product_2', 'Product_3'], n),
                      'Revenue': (np.random.random(n)*50+10).round(2),
                      'Quantity': np.random.randint(1, 10, size=n)})
   df.head(2)

Now, to find prices per store/product, we can simply do:
.. ipython:: python

   (df.groupby(['Store', 'Product'])
      .pipe(lambda grp: grp.Revenue.sum()/grp.Quantity.sum())
      .unstack().round(2))

See the :ref:`documentation <groupby.pipe>` for more.
.. _whatsnew_0210.enhancements.reanme_categories:
``Categorical.rename_categories`` accepts a dict-like
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:meth:`~Series.cat.rename_categories` now accepts a dict-like argument for
``new_categories``. The previous categories are looked up in the dictionary's
keys and replaced if found. The behavior of missing and extra keys is the same
as in :meth:`DataFrame.rename`.
.. ipython:: python

   c = pd.Categorical(['a', 'a', 'b'])
   c.rename_categories({"a": "eh", "b": "bee"})

.. warning::
To assist with upgrading pandas, ``rename_categories`` treats ``Series`` as
list-like. Typically, Series are considered to be dict-like (e.g. in
``.rename``, ``.map``). In a future version of pandas ``rename_categories``
will change to treat them as dict-like. Follow the warning message's
recommendations for writing future-proof code.
.. code-block:: ipython
In [33]: c.rename_categories(pd.Series([0, 1], index=['a', 'c']))
FutureWarning: Treating Series 'new_categories' as a list-like and using the values.
In a future version, 'rename_categories' will treat Series like a dictionary.
For dict-like, use 'new_categories.to_dict()'
For list-like, use 'new_categories.values'.
Out[33]:
[0, 0, 1]
Categories (2, int64): [0, 1]
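One way to follow the warning's recommendation is to state the intent explicitly, so the result does not depend on how a ``Series`` argument is interpreted. This is a minimal sketch; the variable names are illustrative only.

```python
import pandas as pd

c = pd.Categorical(['a', 'a', 'b'])
new = pd.Series([0, 1], index=['a', 'c'])

# Dict-like intent: look up old categories in the keys.
# 'a' is renamed to 0, 'b' has no key and is left alone, 'c' is an unused extra key.
as_mapping = c.rename_categories(new.to_dict())

# List-like intent: replace categories by position using the values.
as_list = c.rename_categories(new.values)
```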
.. _whatsnew_0210.enhancements.other:
Other Enhancements
^^^^^^^^^^^^^^^^^^
New functions or methods
""""""""""""""""""""""""
- :meth:`~pandas.core.resample.Resampler.nearest` is added to support nearest-neighbor upsampling (:issue:`17496`).
- :class:`~pandas.Index` has added support for a ``to_frame`` method (:issue:`15230`).
New keywords
""""""""""""
- Added a ``skipna`` parameter to :func:`~pandas.api.types.infer_dtype` to
support type inference in the presence of missing values (:issue:`17059`).
- :func:`Series.to_dict` and :func:`DataFrame.to_dict` now support an ``into`` keyword which allows you to specify the ``collections.Mapping`` subclass that you would like returned. The default is ``dict``, which is backwards compatible. (:issue:`16122`)
- :func:`Series.set_axis` and :func:`DataFrame.set_axis` now support the ``inplace`` parameter. (:issue:`14636`)
- :func:`Series.to_pickle` and :func:`DataFrame.to_pickle` have gained a ``protocol`` parameter (:issue:`16252`). By default, this parameter is set to `HIGHEST_PROTOCOL <https://docs.python.org/3/library/pickle.html#data-stream-format>`__.
- :func:`read_feather` has gained the ``nthreads`` parameter for multi-threaded operations (:issue:`16359`)
- :func:`DataFrame.clip()` and :func:`Series.clip()` have gained an ``inplace`` argument. (:issue:`15388`)
- :func:`crosstab` has gained a ``margins_name`` parameter to define the name of the row / column that will contain the totals when ``margins=True``. (:issue:`15972`)
- :func:`read_json` now accepts a ``chunksize`` parameter that can be used when ``lines=True``. If ``chunksize`` is passed, read_json now returns an iterator which reads in ``chunksize`` lines with each iteration. (:issue:`17048`)
- :func:`read_json` and :func:`~DataFrame.to_json` now accept a ``compression`` argument which allows them to transparently handle compressed files. (:issue:`17798`)
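A few of the new keywords listed above can be tried with small in-memory inputs. The data below is made up for illustration, and the sketch assumes a pandas version that includes these keywords.

```python
import io
from collections import OrderedDict

import pandas as pd

# ``into`` selects the mapping type returned by to_dict.
s = pd.Series([1, 2], index=['a', 'b'])
ordered = s.to_dict(into=OrderedDict)

# ``margins_name`` names the totals row/column produced by margins=True.
df = pd.DataFrame({'x': ['a', 'a', 'b'], 'y': ['u', 'v', 'u']})
table = pd.crosstab(df['x'], df['y'], margins=True, margins_name='Total')

# ``chunksize`` with lines=True yields DataFrames of at most that many lines.
data = '{"a": 1}\n{"a": 2}\n{"a": 3}\n'
reader = pd.read_json(io.StringIO(data), lines=True, chunksize=2)
chunk_sizes = [len(chunk) for chunk in reader]
```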
Various enhancements
""""""""""""""""""""
- Improved the import time of pandas by about 2.25x. (:issue:`16764`)
- Support for `PEP 519 -- Adding a file system path protocol
<https://www.python.org/dev/peps/pep-0519/>`_ on most readers (e.g.
:func:`read_csv`) and writers (e.g. :meth:`DataFrame.to_csv`) (:issue:`13823`).
- Added a ``__fspath__`` method to ``pd.HDFStore``, ``pd.ExcelFile``,
and ``pd.ExcelWriter`` to work properly with the file system path protocol (:issue:`13823`).
- The ``validate`` argument for :func:`merge` now checks whether a merge is one-to-one, one-to-many, many-to-one, or many-to-many. If a merge is found not to be an example of the specified merge type, an exception of type ``MergeError`` will be raised. For more, see :ref:`here <merging.validation>` (:issue:`16270`)
- Added support for `PEP 518 <https://www.python.org/dev/peps/pep-0518/>`_ (``pyproject.toml``) to the build system (:issue:`16745`)
- :func:`RangeIndex.append` now returns a ``RangeIndex`` object when possible (:issue:`16212`)
- :func:`Series.rename_axis` and :func:`DataFrame.rename_axis` with ``inplace=True`` now return ``None`` while renaming the axis inplace. (:issue:`15704`)
- :func:`api.types.infer_dtype` now infers decimals. (:issue:`15690`)
- :func:`DataFrame.select_dtypes` now accepts scalar values for include/exclude as well as list-like. (:issue:`16855`)
- :func:`date_range` now accepts 'YS' in addition to 'AS' as an alias for start of year. (:issue:`9313`)
- :func:`date_range` now accepts 'Y' in addition to 'A' as an alias for end of year. (:issue:`9313`)
- :func:`DataFrame.add_prefix` and :func:`DataFrame.add_suffix` now accept strings containing the '%' character. (:issue:`17151`)
- Read/write methods that infer compression (:func:`read_csv`, :func:`read_table`, :func:`read_pickle`, and :meth:`~DataFrame.to_pickle`) can now infer from path-like objects, such as ``pathlib.Path``. (:issue:`17206`)
- :func:`read_sas` now recognizes much more of the most frequently used date (datetime) formats in SAS7BDAT files. (:issue:`15871`)
- :func:`DataFrame.items` and :func:`Series.items` are now present in both Python 2 and 3 and are lazy in all cases. (:issue:`13918`, :issue:`17213`)
- :meth:`pandas.io.formats.style.Styler.where` has been implemented as a convenience for :meth:`pandas.io.formats.style.Styler.applymap`. (:issue:`17474`)
- :func:`MultiIndex.is_monotonic_decreasing` has been implemented. Previously, it returned ``False`` in all cases. (:issue:`16554`)
- :func:`read_excel` raises ``ImportError`` with a better message if ``xlrd`` is not installed. (:issue:`17613`)
- :meth:`DataFrame.assign` will preserve the original order of ``**kwargs`` for Python 3.6+ users instead of sorting the column names. (:issue:`14207`)
- :func:`Series.reindex`, :func:`DataFrame.reindex`, :func:`Index.get_indexer` now support list-like argument for ``tolerance``. (:issue:`17367`)
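Two of the enhancements above, sketched with made-up data: ``validate`` on :func:`merge` and a list-like ``tolerance`` on ``reindex``. The example catches ``ValueError`` on the assumption that ``MergeError`` is a subclass of it.

```python
import pandas as pd

left = pd.DataFrame({'key': [1, 2, 2], 'l': ['a', 'b', 'c']})
right = pd.DataFrame({'key': [1, 2], 'r': ['x', 'y']})

# Keys are unique on the right side only, so many-to-one validates.
merged = pd.merge(left, right, on='key', validate='many_to_one')

# The same merge is not one-to-one, so validation fails.
try:
    pd.merge(left, right, on='key', validate='one_to_one')
    validation_failed = False
except ValueError:  # MergeError is assumed to derive from ValueError
    validation_failed = True

# A list-like tolerance gives one tolerance per target label.
s = pd.Series([10, 20, 30], index=[0.0, 1.0, 2.0])
near = s.reindex([0.1, 0.9, 2.5], method='nearest',
                 tolerance=[0.2, 0.05, 0.6])
# 0.9 is about 0.1 away from its nearest label, which exceeds its 0.05 tolerance.
```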
.. _whatsnew_0210.api_breaking:
Backwards incompatible API changes
.. _whatsnew_0210.api_breaking.deps:
Dependencies have increased minimum versions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We have updated our minimum supported versions of dependencies (:issue:`15206`, :issue:`15543`, :issue:`15214`).
If installed, we now require:
Additionally, support has been dropped for Python 3.4 (:issue:`15251`).
.. _whatsnew_0210.api_breaking.bottleneck:
Sum/Prod of all-NaN Series/DataFrames is now consistently NaN
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The behavior of ``sum`` and ``prod`` on all-NaN Series/DataFrames no longer depends on
whether `bottleneck <http://berkeleyanalytics.com/bottleneck>`__ is installed (:issue:`9422`, :issue:`15507`).
Calling ``sum`` or ``prod`` on an empty or all-``NaN`` ``Series``, or columns of a ``DataFrame``, will result in ``NaN``. See the :ref:`docs <missing_data.numeric_sum>`.
.. ipython:: python
s = pd.Series([np.nan])
Previously, without bottleneck installed:
.. code-block:: ipython
In [2]: s.sum()
Out[2]: np.nan
Previously, with bottleneck installed:
.. code-block:: ipython
In [2]: s.sum()
Out[2]: 0.0
New behavior, regardless of whether bottleneck is installed:
.. ipython:: python
s.sum()
Note that this also changes the sum of an empty ``Series``.
Previously, regardless of whether bottleneck was installed:
.. code-block:: ipython
In [1]: pd.Series([]).sum()
Out[1]: 0
.. ipython:: python
pd.Series([]).sum()
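Code that has to give the same answer regardless of the installed pandas or bottleneck version can spell out the rule described above. The helper ``nan_sum`` below is hypothetical (it is not a pandas function) and is only a sketch of that rule.

```python
import numpy as np
import pandas as pd


def nan_sum(series):
    """Return NaN for an empty or all-NaN Series, otherwise the sum of the valid values.

    Hypothetical helper, not part of pandas.
    """
    valid = series.dropna()
    return valid.sum() if len(valid) else np.nan


all_nan = nan_sum(pd.Series([np.nan]))
empty = nan_sum(pd.Series([], dtype=float))
mixed = nan_sum(pd.Series([1.0, np.nan, 2.0]))
```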
.. _whatsnew_0210.api_breaking.loc:
Indexing with a list with missing labels is Deprecated
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Previously, selecting with a list of labels where one or more labels were missing would always succeed, returning ``NaN`` for the missing labels.
This will now show a ``FutureWarning``. In the future this will raise a ``KeyError`` (:issue:`15747`).
This warning will trigger on a ``DataFrame`` or a ``Series`` when using ``.loc[]`` or ``[[]]`` with a list of labels that has at least one missing label.
See the :ref:`deprecation docs <indexing.deprecate_loc_reindex_listlike>`.
.. ipython:: python
s = pd.Series([1, 2, 3])
s
Previous Behavior
.. code-block:: ipython
In [4]: s.loc[[1, 2, 3]]
Out[4]:
1 2.0
2 3.0
3 NaN
dtype: float64
Current Behavior
.. code-block:: ipython
In [4]: s.loc[[1, 2, 3]]
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.
The idiomatic way to select labels that may be missing is via ``.reindex()``:
.. ipython:: python
s.reindex([1, 2, 3])
Selection with all keys found is unchanged.
.. ipython:: python
s.loc[[1, 2]]
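The two common intents behind such a selection can be written so that they neither warn nor raise. This is a sketch; using ``Index.intersection`` to keep only the labels that exist is one possible approach, offered here as an assumption rather than as the recommended replacement.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
labels = [1, 2, 3]  # label 3 is not in the index

# Keep every requested label; missing ones become NaN.
with_nan = s.reindex(labels)

# Keep only the requested labels that are actually present.
present_only = s.loc[s.index.intersection(labels)]
```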
.. _whatsnew_0210.api.na_changes:
NA naming Changes
^^^^^^^^^^^^^^^^^
In order to promote more consistency among the pandas API, we have added additional top-level
functions :func:`isna` and :func:`notna` that are aliases for :func:`isnull` and :func:`notnull`.
The naming scheme is now more consistent with methods like ``.dropna()`` and ``.fillna()``. Furthermore,
in all cases where ``.isnull()`` and ``.notnull()`` methods are defined, these have additional methods
named ``.isna()`` and ``.notna()``. These are included for classes ``Categorical``,
``Index``, ``Series``, and ``DataFrame`` (:issue:`15001`).
The configuration option ``pd.options.mode.use_inf_as_null`` is deprecated, and ``pd.options.mode.use_inf_as_na`` is added as a replacement.
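A short sketch of the aliases described above, using a made-up ``Series``:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan])

# Top-level functions: the new names are aliases for the old ones.
top_level_na = pd.isna(s)
top_level_null = pd.isnull(s)

# Method forms mirror .dropna() / .fillna() naming.
method_na = s.isna()
method_notna = s.notna()
```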
.. _whatsnew_0210.api_breaking.iteration_scalars:
Iteration of Series/Index will now return Python scalars
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Previously, when using certain iteration methods for a ``Series`` with dtype ``int`` or ``float``, you would receive a ``numpy`` scalar, e.g. a ``np.int64``, rather than a Python ``int``. Issue (:issue:`10904`) corrected this for ``Series.tolist()`` and ``list(Series)``. This change makes all iteration methods consistent, in particular ``__iter__()`` and ``.map()``. Note that this only affects int/float dtypes (:issue:`13236`, :issue:`13258`, :issue:`14216`).
.. ipython:: python
s = pd.Series([1, 2, 3])
s
Previously:
.. code-block:: ipython
In [2]: type(list(s)[0])
Out[2]: numpy.int64
New Behavior:
.. ipython:: python
type(list(s)[0])
Furthermore, this will now correctly box the results of iteration for :func:`DataFrame.to_dict` as well.
.. ipython:: python
d = {'a': [1], 'b': ['b']}
df = pd.DataFrame(d)
Previously:
.. code-block:: ipython
In [8]: type(df.to_dict()['a'][0])
Out[8]: numpy.int64
New Behavior:
.. ipython:: python
type(df.to_dict()['a'][0])
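The same check can be made for plain iteration, which is the case this change targets. A small sketch with made-up values:

```python
import pandas as pd

int_s = pd.Series([1, 2, 3])
float_s = pd.Series([1.5, 2.5])

# Iterating with __iter__ should yield Python scalars, not numpy scalars.
int_types = {type(x) for x in int_s}
float_types = {type(x) for x in float_s}
```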
.. _whatsnew_0210.api_breaking.loc_with_index:
Indexing with a Boolean Index
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Previously, when passing a boolean ``Index`` to ``.loc``, if the index of the ``Series/DataFrame`` had boolean labels,
you would get a label-based selection, potentially duplicating result labels, rather than a boolean indexing selection
(where ``True`` selects elements). This was inconsistent with how a boolean numpy array indexes. The new behavior is to
act like a boolean numpy array indexer. (:issue:`17738`)
Previous Behavior:
.. ipython:: python
s = pd.Series([1, 2, 3], index=[False, True, False])
s
Furthermore, previously if you had an index that was non-numeric (e.g. strings), then a boolean ``Index`` would raise a ``KeyError``.
This will now be treated as a boolean indexer.
Previous Behavior:
.. ipython:: python
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s
.. code-block:: ipython
In [39]: s.loc[pd.Index([True, False, True])]
KeyError: "None of [Index([True, False, True], dtype='object')] are in the [index]"
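Under the new behavior, the same selection is expected to act as a mask. The sketch below compares a boolean ``Index`` with the equivalent numpy array:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# A boolean Index is treated as a mask, like a boolean numpy array.
by_index = s.loc[pd.Index([True, False, True])]
by_array = s.loc[np.array([True, False, True])]
```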
In previous versions of pandas, resampling a ``Series``/``DataFrame`` indexed by a ``PeriodIndex`` returned a ``DatetimeIndex`` in some cases (:issue:`12884`). Resampling to a multiplied frequency now returns a ``PeriodIndex`` (:issue:`15944`). As a minor enhancement, resampling a ``PeriodIndex`` can now handle ``NaT`` values (:issue:`13224`).
Previous Behavior:
.. code-block:: ipython
In [1]: pi = pd.period_range('2017-01', periods=12, freq='M')
In [5]: resampled.index
Out[5]: DatetimeIndex(['2017-03-31', '2017-09-30', '2018-03-31'], dtype='datetime64[ns]', freq='2Q-DEC')
New Behavior:
.. ipython:: python
pi = pd.period_range('2017-01', periods=12, freq='M')
s = pd.Series(np.arange(12), index=pi)
resampled = s.resample('2Q').mean()
resampled
resampled.index
Upsampling and calling ``.ohlc()`` previously returned a ``Series``, basically identical to calling ``.asfreq()``. OHLC upsampling now returns a ``DataFrame`` with columns ``open``, ``high``, ``low`` and ``close`` (:issue:`13083`). This is consistent with downsampling and ``DatetimeIndex`` behavior.
Previous Behavior:
.. code-block:: ipython
In [1]: pi = pd.PeriodIndex(start='2000-01-01', freq='D', periods=10)
In [2]: s = pd.Series(np.arange(10), index=pi)
In [3]: s.resample('H').ohlc()
Out[3]:
2000-01-01 00:00 0.0
...
2000-01-10 23:00 NaN
Freq: H, Length: 240, dtype: float64
In [4]: s.resample('M').ohlc()
Out[4]:
open high low close
2000-01 0 9 0 9
New Behavior:
.. ipython:: python
pi = pd.PeriodIndex(start='2000-01-01', freq='D', periods=10)
s = pd.Series(np.arange(10), index=pi)
s.resample('H').ohlc()
s.resample('M').ohlc()
.. _whatsnew_0210.api_breaking.pandas_eval:
Improved error handling during item assignment in pd.eval
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:func:`eval` will now raise a ``ValueError`` when item assignment malfunctions, or when
inplace operations are specified but there is no item assignment in the expression (:issue:`16732`)
.. ipython:: python
arr = np.array([1, 2, 3])
Previously, if you attempted the following expression, you would get an unhelpful error message:
.. code-block:: ipython
In [3]: pd.eval("a = 1 + 2", target=arr, inplace=True)
...
IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None)
and integer or boolean arrays are valid indices
This is a very long way of saying numpy arrays don't support string-item indexing. With this
change, the error message is now this:
.. code-block:: python
In [3]: pd.eval("a = 1 + 2", target=arr, inplace=True)
...
ValueError: Cannot assign expression output to target
It also used to be possible to evaluate expressions inplace, even if there was no item assignment:
.. code-block:: ipython
In [4]: pd.eval("1 + 2", target=arr, inplace=True)
Out[4]: 3
However, this input does not make much sense because the output is not being assigned to
the target. Now, a ValueError will be raised when such an input is passed in:
.. code-block:: ipython
In [4]: pd.eval("1 + 2", target=arr, inplace=True)
...
ValueError: Cannot operate inplace if there is no assignment
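Both error cases can be handled with a single ``except ValueError``. The sketch below also shows a target that does support string-keyed assignment (a ``DataFrame``), for contrast; the column name ``a`` is illustrative.

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3])
errors = []

# 1) assignment into a target that cannot take string keys
# 2) inplace=True without any assignment in the expression
for expr in ("a = 1 + 2", "1 + 2"):
    try:
        pd.eval(expr, target=arr, inplace=True)
    except ValueError as exc:
        errors.append(str(exc))

# A DataFrame supports item assignment, so this adds a column named 'a'.
df = pd.DataFrame({'x': [1, 2]})
pd.eval("a = 1 + 2", target=df, inplace=True)
```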
.. _whatsnew_0210.api_breaking.dtype_conversions:
Dtype Conversions
^^^^^^^^^^^^^^^^^
Previously assignments, ``.where()`` and ``.fillna()`` with a ``bool`` assignment, would coerce to the same type (e.g. int / float), or raise for datetimelikes. These will now preserve the bools with ``object`` dtypes. (:issue:`16821`).
.. ipython:: python
s = pd.Series([1, 2, 3])
.. code-block:: python
In [5]: s[1] = True
In [6]: s
Out[6]:
0 1
1 1
2 3
dtype: int64
New Behavior
.. ipython:: python
s[1] = True
s
Previously, an assignment to a datetimelike with a non-datetimelike would coerce the
non-datetime-like item being assigned (:issue:`14145`).
.. ipython:: python
s = pd.Series([pd.Timestamp('2011-01-01'), pd.Timestamp('2012-01-01')])
.. code-block:: python
In [1]: s[1] = 1
In [2]: s
Out[2]:
0 2011-01-01 00:00:00.000000000
1 1970-01-01 00:00:00.000000001
dtype: datetime64[ns]
These now coerce to object dtype.
.. ipython:: python
s[1] = 1
s
- Inconsistent behavior in ``.where()`` with datetimelikes which would raise rather than coerce to ``object`` (:issue:`16402`)
- Bug in assignment against ``int64`` data with ``np.ndarray`` with ``float64`` dtype may keep ``int64`` dtype (:issue:`14001`)
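The ``.where()`` case described in this section can be sketched as follows. The example assumes a pandas version with the new coercion rules, in which the ``bool`` replacement is kept as a ``bool`` and the result falls back to ``object`` dtype.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
keep = pd.Series([True, False, True])

# Replace the masked-out element with a bool; the bool is preserved
# and the result dtype becomes object instead of staying int64.
replaced = s.where(keep, True)
```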
.. _whatsnew_210.api.multiindex_single:
MultiIndex Constructor with a Single Level
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The ``MultiIndex`` constructors no longer squeeze a MultiIndex with all
length-one levels down to a regular ``Index``. This affects all the
``MultiIndex`` constructors. (:issue:`17178`)
Previous behavior:
.. code-block:: ipython
In [2]: pd.MultiIndex.from_tuples([('a',), ('b',)])
Out[2]: Index(['a', 'b'], dtype='object')
Length 1 levels are no longer special-cased. They behave exactly as if you had
length 2+ levels, so a :class:`MultiIndex` is always returned from all of the
``MultiIndex`` constructors.
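A small sketch of the new behavior, including one way to get a flat ``Index`` back when that is what is wanted:

```python
import pandas as pd

mi = pd.MultiIndex.from_tuples([('a',), ('b',)])

# The result stays a MultiIndex with a single level.
is_multi = isinstance(mi, pd.MultiIndex)
n_levels = mi.nlevels

# Extract the only level explicitly to obtain a regular Index.
flat = mi.get_level_values(0)
```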
UTC Localization with Series
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Previously, :func:`to_datetime` did not localize datetime ``Series`` data when ``utc=True`` was passed. Now, :func:`to_datetime` will correctly localize ``Series`` with a ``datetime64[ns, UTC]`` dtype to be consistent with how list-like and ``Index`` data are handled (:issue:`6415`).
Additionally, DataFrames with datetime columns that were parsed by :func:`read_sql_table` and :func:`read_sql_query` will also be localized to UTC only if the original SQL columns were timezone aware datetime columns.
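As a sketch of the ``Series`` case, a timezone-naive datetime ``Series`` passed through :func:`to_datetime` with ``utc=True`` should come back timezone-aware:

```python
import pandas as pd

naive = pd.Series(pd.date_range('2017-01-01', periods=2, freq='D'))

# utc=True localizes the naive values to UTC.
localized = pd.to_datetime(naive, utc=True)
tz_name = str(localized.dt.tz)
```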
Consistency of Range Functions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In previous versions, there were some inconsistencies between the various range functions: :func:`date_range`, :func:`bdate_range`, :func:`period_range`, :func:`timedelta_range`, and :func:`interval_range` (:issue:`17471`).
One of the inconsistent behaviors occurred when the ``start``, ``end`` and ``periods`` parameters were all specified, potentially leading to ambiguous ranges. When all three parameters were passed, ``interval_range`` ignored the ``periods`` parameter, ``period_range`` ignored the ``end`` parameter, and the other range functions raised. To promote consistency among the range functions, and avoid potentially ambiguous ranges, ``interval_range`` and ``period_range`` will now raise when all three parameters are passed.
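The ``period_range`` case can be sketched as below; only ``period_range`` is shown, and the dates are made up. The example assumes the error raised is a ``ValueError``.

```python
import pandas as pd

# Two of start/end/periods: unambiguous.
ok = pd.period_range(start='2017-01', periods=3, freq='M')

# All three of start/end/periods: ambiguous, so this is rejected.
try:
    pd.period_range(start='2017-01', end='2017-06', periods=3, freq='M')
    rejected = False
except ValueError:
    rejected = True
```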
Updates
Here's a list of all the updates bundled in this pull request. I've added some links to make it easier for you to find all the information you need.
Changelogs
numpy 1.12.1 -> 1.13.3
pandas 0.20.1 -> 0.21.0