dssg / pgdedupe

A simple command line interface to the datamade/dedupe library.
https://pgdedupe.readthedocs.io

Scheduled weekly dependency update for week 51 #82

Closed pyup-bot closed 6 years ago

pyup-bot commented 6 years ago

Updates

Here's a list of all the updates bundled in this pull request. I've added some links to make it easier for you to find all the information you need.

numpy 1.12.1 » 1.13.3 PyPI | Changelog | Homepage
pandas 0.20.1 » 0.21.1 PyPI | Changelog | Homepage
psycopg2 2.7.1 » 2.7.3.2 PyPI | Changelog | Homepage
dedupe 1.6.13 » 1.8.1 PyPI | Changelog | Repo
fastcluster 1.1.23 » 1.1.24 PyPI | Changelog | Homepage
wheel 0.29.0 » 0.30.0 PyPI | Changelog | Repo
flake8 3.3.0 » 3.5.0 PyPI | Changelog | Repo
tox 2.7.0 » 2.9.1 PyPI | Changelog | Docs
coverage 4.4.1 » 4.4.2 PyPI | Changelog | Repo
Sphinx 1.6.1 » 1.6.5 PyPI | Changelog | Homepage
cryptography 1.8.1 » 2.1.4 PyPI | Changelog | Repo
pytest 3.0.7 » 3.3.1 PyPI | Changelog | Repo | Homepage
Faker 0.7.12 » 0.8.7 PyPI | Changelog | Repo
tqdm 4.11.2 » 4.19.5 PyPI | Changelog | Repo

Changelogs

numpy 1.12.1 -> 1.13.3

1.13.3

==========================

This is a bugfix release for some problems found since 1.13.1. The most important fixes are for CVE-2017-12852 and temporary elision. Users of earlier versions of 1.13 should upgrade.

The Python versions supported are 2.7 and 3.4 - 3.6. The Python 3.6 wheels available from PIP are built with Python 3.6.2 and should be compatible with all previous versions of Python 3.6. It was cythonized with Cython 0.26.1, which should be free of the bugs found in 0.27 while also being compatible with Python 3.7-dev. The Windows wheels were built with OpenBLAS instead of ATLAS, which should improve the performance of the linear algebra functions.

The NumPy 1.13.3 release is a re-release of 1.13.2, which suffered from a bug in Cython 0.27.0.

Contributors

A total of 12 people contributed to this release. People with a "+" by their names contributed a patch for the first time.

  • Allan Haldane
  • Brandon Carter
  • Charles Harris
  • Eric Wieser
  • Iryna Shcherbina +
  • James Bourbeau +
  • Jonathan Helmus
  • Julian Taylor
  • Matti Picus
  • Michael Lamparski +
  • Michael Seifert
  • Ralf Gommers

Pull requests merged

A total of 22 pull requests were merged for this release.

  • 9390 BUG: Return the poly1d coefficients array directly
  • 9555 BUG: Fix regression in 1.13.x in distutils.mingw32ccompiler.
  • 9556 BUG: Fix true_divide when dtype=np.float64 specified.
  • 9557 DOC: Fix some rst markup in numpy/doc/basics.py.
  • 9558 BLD: Remove -xhost flag from IntelFCompiler.
  • 9559 DOC: Removes broken docstring example (source code, png, pdf)...
  • 9580 BUG: Add hypot and cabs functions to WIN32 blacklist.
  • 9732 BUG: Make scalar function elision check if temp is writeable.
  • 9736 BUG: Various fixes to np.gradient
  • 9742 BUG: Fix np.pad for CVE-2017-12852
  • 9744 BUG: Check for exception in sort functions, add tests
  • 9745 DOC: Add whitespace after "versionadded::" directive so it actually...
  • 9746 BUG: Memory leak in np.dot of size 0
  • 9747 BUG: Adjust gfortran version search regex
  • 9757 BUG: Cython 0.27 breaks NumPy on Python 3.
  • 9764 BUG: Ensure _npy_scaled_cexp{,f,l} is defined when needed.
  • 9765 BUG: PyArray_CountNonzero does not check for exceptions
  • 9766 BUG: Fixes histogram monotonicity check for unsigned bin values
  • 9767 BUG: Ensure consistent result dtype of count_nonzero
  • 9771 BUG: MAINT: Fix mtrand for Cython 0.27.
  • 9772 DOC: Create the 1.13.2 release notes.
  • 9794 DOC: Create 1.13.3 release notes.

==========================

1.13.1

==========================

This is a bugfix release for problems found in 1.13.0. The major changes are fixes for the new memory overlap detection and temporary elision as well as reversion of the removal of the boolean binary - operator. Users of 1.13.0 should upgrade.

The Python versions supported are 2.7 and 3.4 - 3.6. Note that the Python 3.6 wheels available from PIP are built against 3.6.1, hence will not work when used with 3.6.0 due to Python bug 29943_. NumPy 1.13.2 will be released shortly after Python 3.6.2 is out to fix that problem. If you are using 3.6.0 the workaround is to upgrade to 3.6.1 or use an earlier Python version.

.. _29943: https://bugs.python.org/issue29943

Pull requests merged

A total of 19 pull requests were merged for this release.

  • 9240 DOC: BLD: fix lots of Sphinx warnings/errors.
  • 9255 Revert "DEP: Raise TypeError for subtract(bool, bool)."
  • 9261 BUG: don't elide into readonly and updateifcopy temporaries for...
  • 9262 BUG: fix missing keyword rename for common block in numpy.f2py
  • 9263 BUG: handle resize of 0d array
  • 9267 DOC: update f2py front page and some doc build metadata.
  • 9299 BUG: Fix Intel compilation on Unix.
  • 9317 BUG: fix wrong ndim used in empty where check
  • 9319 BUG: Make extensions compilable with MinGW on Py2.7
  • 9339 BUG: Prevent crash if ufunc doc string is null
  • 9340 BUG: umath: un-break ufunc where= when no out= is given
  • 9371 DOC: Add isnat/positive ufunc to documentation
  • 9372 BUG: Fix error in fromstring function from numpy.core.records...
  • 9373 BUG: ')' is printed at the end pointer of the buffer in numpy.f2py.
  • 9374 DOC: Create NumPy 1.13.1 release notes.
  • 9376 BUG: Prevent hang traversing ufunc userloop linked list
  • 9377 DOC: Use x1 and x2 in the heaviside docstring.
  • 9378 DOC: Add $PARAMS to the isnat docstring
  • 9379 DOC: Update the 1.13.1 release notes

Contributors

A total of 12 people contributed to this release. People with a "+" by their names contributed a patch for the first time.

  • Andras Deak +
  • Bob Eldering +
  • Charles Harris
  • Daniel Hrisca +
  • Eric Wieser
  • Joshua Leahy +
  • Julian Taylor
  • Michael Seifert
  • Pauli Virtanen
  • Ralf Gommers
  • Roland Kaufmann
  • Warren Weckesser

==========================

1.13.0

==========================

This release supports Python 2.7 and 3.4 - 3.6.

Highlights

  • Operations like a + b + c will reuse temporaries on some platforms, resulting in less memory use and faster execution.
  • Inplace operations check if inputs overlap outputs and create temporaries to avoid problems.
  • New __array_ufunc__ attribute provides improved ability for classes to override default ufunc behavior.
  • New np.block function for creating blocked arrays.

New functions

  • New np.positive ufunc.
  • New np.divmod ufunc provides more efficient divmod.
  • New np.isnat ufunc tests for NaT special values.
  • New np.heaviside ufunc computes the Heaviside function.
  • New np.isin function, improves on in1d.
  • New np.block function for creating blocked arrays.
  • New PyArray_MapIterArrayCopyIfOverlap added to NumPy C-API.

See below for details.

Deprecations

  • Calling np.fix, np.isposinf, and np.isneginf with f(x, y=out) is deprecated - the argument should be passed as f(x, out=out), which matches other ufunc-like interfaces.
  • Use of the C-API NPY_CHAR type number deprecated since version 1.7 will now raise deprecation warnings at runtime. Extensions built with older f2py versions need to be recompiled to remove the warning.
  • np.ma.argsort, np.ma.minimum.reduce, and np.ma.maximum.reduce should be called with an explicit axis argument when applied to arrays with more than 2 dimensions, as the default value of this argument (None) is inconsistent with the rest of numpy (-1, 0, and 0, respectively).
  • np.ma.MaskedArray.mini is deprecated, as it almost duplicates the functionality of np.ma.MaskedArray.min. Exactly equivalent behaviour can be obtained with np.ma.minimum.reduce.
  • The single-argument form of np.ma.minimum and np.ma.maximum is deprecated. np.ma.minimum(x) should now be spelt np.ma.minimum.reduce(x), which is consistent with how this would be done with np.minimum.
  • Calling ndarray.conjugate on non-numeric dtypes is deprecated (it should match the behavior of np.conjugate, which throws an error).
  • Calling expand_dims when the axis keyword does not satisfy -a.ndim - 1 <= axis <= a.ndim, where a is the array being reshaped, is deprecated.

Future Changes

  • Assignment between structured arrays with different field names will change in NumPy 1.14. Previously, fields in the dst would be set to the value of the identically-named field in the src. In numpy 1.14 fields will instead be assigned 'by position': The n-th field of the dst will be set to the n-th field of the src array. Note that the FutureWarning raised in NumPy 1.12 incorrectly reported this change as scheduled for NumPy 1.13 rather than NumPy 1.14.

Build System Changes

  • numpy.distutils now automatically determines C-file dependencies with GCC compatible compilers.

Compatibility notes

Error type changes

  • numpy.hstack() now throws ValueError instead of IndexError when input is empty.
  • Functions taking an axis argument, when that argument is out of range, now throw np.AxisError instead of a mixture of IndexError and ValueError. For backwards compatibility, AxisError subclasses both of these.
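The backwards-compatibility claim can be checked with a small sketch. It assumes that a reduction such as np.sum (chosen here only for illustration) reports an out-of-range axis through the new error type; the exception class is not imported by name, so the snippet does not depend on where NumPy exposes it.

```python
import numpy as np

a = np.zeros((2, 3))

try:
    np.sum(a, axis=5)  # axis 5 does not exist on a 2-D array
except Exception as exc:
    caught = exc

# Handlers written for either of the older error types still catch it.
is_value_error = isinstance(caught, ValueError)
is_index_error = isinstance(caught, IndexError)
print(type(caught).__name__, is_value_error, is_index_error)
```

Existing `except IndexError` or `except ValueError` blocks should therefore keep working unchanged.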

Tuple object dtypes

Support has been removed for certain obscure dtypes that were unintentionally allowed, of the form (old_dtype, new_dtype), where either of the dtypes is or contains the object dtype. As an exception, dtypes of the form (object, [('name', object)]) are still supported due to evidence of existing use.

DeprecationWarning to error

See Changes section for more detail.

  • partition, TypeError when non-integer partition index is used.
  • NpyIter_AdvancedNew, ValueError when oa_ndim == 0 and op_axes is NULL
  • negative(bool_), TypeError when negative applied to booleans.
  • subtract(bool_, bool_), TypeError when subtracting boolean from boolean.
  • np.equal, np.not_equal, object identity doesn't override failed comparison.
  • np.equal, np.not_equal, object identity doesn't override non-boolean comparison.
  • Deprecated boolean indexing behavior dropped. See Changes below for details.
  • Deprecated np.alterdot() and np.restoredot() removed.

FutureWarning to changed behavior

See Changes section for more detail.

  • numpy.average preserves subclasses
  • array == None and array != None do element-wise comparison.
  • np.equal, np.not_equal, object identity doesn't override comparison result.

dtypes are now always true

Previously bool(dtype) would fall back to the default python implementation, which checked if len(dtype) > 0. Since dtype objects implement __len__ as the number of record fields, bool of scalar dtypes would evaluate to False, which was unintuitive. Now bool(dtype) == True for all dtypes.
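A short illustration of the difference between length and truthiness, using a scalar dtype and a hypothetical two-field record dtype:

```python
import numpy as np

scalar_dtype = np.dtype(np.float64)
record_dtype = np.dtype([("x", np.int32), ("y", np.int32)])

# len() still reports the number of record fields ...
field_counts = (len(scalar_dtype), len(record_dtype))
# ... but truthiness no longer depends on it.
truth_values = (bool(scalar_dtype), bool(record_dtype))
print(field_counts, truth_values)
```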

__getslice__ and __setslice__ are no longer needed in ndarray subclasses

When subclassing np.ndarray in Python 2.7, it is no longer necessary to implement __*slice__ on the derived class, as __*item__ will intercept these calls correctly.

Any code that did implement these will work exactly as before. Code that invokes ndarray.__getslice__ (e.g. through super(...).__getslice__) will now issue a DeprecationWarning; .__getitem__(slice(start, end)) should be used instead.

Indexing MaskedArrays/Constants with ... (ellipsis) now returns MaskedArray

This behavior mirrors that of np.ndarray, and accounts for nested arrays in MaskedArrays of object dtype, and ellipsis combined with other forms of indexing.

C API changes

GUfuncs on empty arrays and NpyIter axis removal

It is now allowed to remove a zero-sized axis from NpyIter. This may mean that code removing axes from NpyIter has to add an additional check when accessing the removed dimensions later on.

The largest followup change is that gufuncs are now allowed to have zero-sized inner dimensions. This means that a gufunc now has to anticipate an empty inner dimension, whereas previously this was never possible and an error was raised instead.

For most gufuncs no change should be necessary. However, it is now possible for gufuncs with a signature such as (..., N, M) -> (..., M) to return a valid result if N=0 without further wrapping code.

PyArray_MapIterArrayCopyIfOverlap added to NumPy C-API

Similar to PyArray_MapIterArray but with an additional copy_if_overlap argument. If copy_if_overlap != 0, it checks whether the input has memory overlap with any of the other arrays and makes copies as appropriate to avoid problems if the input is modified during the iteration. See the documentation for a more complete description.

New Features

__array_ufunc__ added

This is the renamed and redesigned __numpy_ufunc__. Any class, ndarray subclass or not, can define this method or set it to None in order to override the behavior of NumPy's ufuncs. This works quite similarly to Python's __mul__ and other binary operation routines. See the documentation for a more detailed description of the implementation and behavior of this new option. The API is provisional, we do not yet guarantee backward compatibility as modifications may be made pending feedback. See the NEP and documentation for more details.

.. _NEP: https://github.com/numpy/numpy/blob/master/doc/neps/ufunc-overrides.rst
.. _documentation: https://github.com/charris/numpy/blob/master/doc/source/reference/arrays.classes.rst
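A minimal sketch of both uses of the attribute follows. The classes Tagged and OptOut are hypothetical names invented for this illustration, and the override handles only plain ufunc calls; it is not a complete implementation of the protocol.

```python
import numpy as np


class Tagged:
    """Hypothetical wrapper that remembers the last ufunc applied to it."""

    def __init__(self, value, last_ufunc=None):
        self.value = np.asarray(value)
        self.last_ufunc = last_ufunc

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method != "__call__":
            return NotImplemented  # reductions etc. are not handled here
        # Unwrap Tagged operands, apply the ufunc, and wrap the result again.
        raw = [x.value if isinstance(x, Tagged) else x for x in inputs]
        return Tagged(ufunc(*raw, **kwargs), last_ufunc=ufunc.__name__)


class OptOut:
    """Hypothetical class that refuses to take part in ufuncs."""

    __array_ufunc__ = None


tagged = np.add(Tagged([1, 2]), 1)

try:
    np.add(np.arange(3), OptOut())
    opt_out_raises = False
except TypeError:
    opt_out_raises = True

print(tagged.value, tagged.last_ufunc, opt_out_raises)
```

Because the API is described as provisional, details of a real override may need to change with later releases.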

New positive ufunc

This ufunc corresponds to unary +, but unlike + on an ndarray it will raise an error if array values do not support numeric operations.

New divmod ufunc

This ufunc corresponds to the Python builtin divmod, and is used to implement divmod when called on numpy arrays. np.divmod(x, y) calculates a result equivalent to (np.floor_divide(x, y), np.remainder(x, y)) but is approximately twice as fast as calling the functions separately.
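The stated equivalence can be checked on a small example (the input values are arbitrary):

```python
import numpy as np

x = np.array([7, -7, 9])
y = np.array([2, 2, 4])

# One call returns both the floored quotient and the remainder.
quotient, remainder = np.divmod(x, y)

same_quotient = np.array_equal(quotient, np.floor_divide(x, y))
same_remainder = np.array_equal(remainder, np.remainder(x, y))
print(quotient, remainder)
```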

np.isnat ufunc tests for NaT special datetime and timedelta values

The new ufunc np.isnat finds the positions of special NaT values within datetime and timedelta arrays. This is analogous to np.isnan.
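For example, with a small datetime array containing one NaT entry (the dates are arbitrary):

```python
import numpy as np

dates = np.array(["2017-12-18", "NaT", "2017-12-25"], dtype="datetime64[D]")
missing_dates = np.isnat(dates)

# Subtracting datetimes yields timedeltas; NaT propagates into the result.
deltas = dates - dates[0]
missing_deltas = np.isnat(deltas)
print(missing_dates, missing_deltas)
```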

np.heaviside ufunc computes the Heaviside function

The new function np.heaviside(x, h0) (a ufunc) computes the Heaviside function:

.. code::

                          { 0   if x < 0,
       heaviside(x, h0) = { h0  if x == 0,
                          { 1   if x > 0.
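Evaluating the three branches of this definition, with h0 = 0.5 chosen as an example value:

```python
import numpy as np

x = np.array([-1.5, 0.0, 2.0])

# The second argument is the value returned where x == 0.
steps = np.heaviside(x, 0.5)
print(steps)
```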

np.block function for creating blocked arrays

Add a new block function to the current stacking functions vstack, hstack, and stack. This allows concatenation across multiple axes simultaneously, with a similar syntax to array creation, but where elements can themselves be arrays. For instance::

    >>> A = np.eye(2) * 2
    >>> B = np.eye(3) * 3
    >>> np.block([
    ...     [A,               np.zeros((2, 3))],
    ...     [np.ones((3, 2)), B               ]
    ... ])
    array([[ 2.,  0.,  0.,  0.,  0.],
           [ 0.,  2.,  0.,  0.,  0.],
           [ 1.,  1.,  3.,  0.,  0.],
           [ 1.,  1.,  0.,  3.,  0.],
           [ 1.,  1.,  0.,  0.,  3.]])

While primarily useful for block matrices, this works for arbitrary dimensions of arrays.

It is similar to Matlab's square bracket notation for creating block matrices.

isin function, improving on in1d

The new function isin tests whether each element of an N-dimensional array is present anywhere within a second array. It is an enhancement of in1d that preserves the shape of the first array.
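A small sketch showing that the result keeps the shape of the first argument (the values are arbitrary):

```python
import numpy as np

grid = np.array([[1, 2], [3, 4]])

# Each element of grid is tested for membership in the second argument.
mask = np.isin(grid, [2, 4])
print(mask)
```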

Temporary elision

On platforms providing the backtrace function, NumPy will try to avoid creating temporaries in expressions involving basic numeric types. For example, d = a + b + c is transformed to d = a + b; d += c, which can improve performance for large arrays as less memory bandwidth is required to perform the operation.

axis argument for unique

In an N-dimensional array, the user can now choose the axis along which to look for duplicate N-1-dimensional elements using numpy.unique. The original behaviour is recovered if axis=None (default).
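For instance, duplicate rows of a 2-D array can be removed by passing axis=0, while the default still flattens the input:

```python
import numpy as np

rows = np.array([[1, 0], [1, 0], [2, 3]])

unique_rows = np.unique(rows, axis=0)  # compares whole rows
unique_values = np.unique(rows)        # default: flattened, unique scalars
print(unique_rows, unique_values)
```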

np.gradient now supports unevenly spaced data

Users can now specify a non-constant spacing for data. In particular np.gradient can now take:

  1. A single scalar to specify a sample distance for all dimensions.
  2. N scalars to specify a constant sample distance for each dimension. i.e. dx, dy, dz, ...
  3. N arrays to specify the coordinates of the values along each dimension of F. The length of the array must match the size of the corresponding dimension
  4. Any combination of N scalars/arrays with the meaning of 2. and 3.

This means that, e.g., it is now possible to do the following::

    >>> f = np.array([[1, 2, 6], [3, 4, 5]], dtype=np.float)
    >>> dx = 2.
    >>> y = [1., 1.5, 3.5]
    >>> np.gradient(f, dx, y)
    [array([[ 1. ,  1. , -0.5], [ 1. ,  1. , -0.5]]),
     array([[ 2. ,  2. ,  2. ], [ 2. ,  1.7,  0.5]])]

Support for returning arrays of arbitrary dimensions in apply_along_axis

Previously, only scalars or 1D arrays could be returned by the function passed to apply_along_axis. Now, it can return an array of any dimensionality (including 0D), and the shape of this array replaces the axis of the array being iterated over.
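As a sketch, np.diag (picked here as an example of a function that returns a 2-D array) can be applied to each row, so the length-3 axis is replaced by a 3 x 3 block:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# Each row of length 3 becomes a 3 x 3 diagonal matrix.
stacked = np.apply_along_axis(np.diag, 1, a)
print(stacked.shape)
```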

.ndim property added to dtype to complement .shape

For consistency with ndarray and broadcast, d.ndim is a shorthand for len(d.shape).
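A brief example with a subarray dtype and a plain scalar dtype:

```python
import numpy as np

sub = np.dtype((np.float64, (2, 3)))  # subarray dtype with shape (2, 3)
plain = np.dtype(np.float64)

ndims = (sub.ndim, plain.ndim)
print(sub.shape, ndims)
```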

Support for tracemalloc in Python 3.6

NumPy now supports memory tracing with the tracemalloc_ module of Python 3.6 or newer. Memory allocations from NumPy are placed into the domain defined by numpy.lib.tracemalloc_domain. Note that NumPy allocations will not show up in tracemalloc_ of earlier Python versions.

.. _tracemalloc: https://docs.python.org/3/library/tracemalloc.html

NumPy may be built with relaxed stride checking debugging

Setting NPY_RELAXED_STRIDES_DEBUG=1 in the environment when relaxed stride checking is enabled will cause NumPy to be compiled with the affected strides set to the maximum value of npy_intp in order to help detect invalid usage of the strides in downstream projects. When enabled, invalid usage often results in an error being raised, but the exact type of error depends on the details of the code. TypeError and OverflowError have been observed in the wild.

It was previously the case that this option was disabled for releases and enabled in master and changing between the two required editing the code. It is now disabled by default but can be enabled for test builds.

Improvements

Ufunc behavior for overlapping inputs

Operations where ufunc input and output operands have memory overlap produced undefined results in previous NumPy versions, due to data dependency issues. In NumPy 1.13.0, results from such operations are now defined to be the same as for equivalent operations where there is no memory overlap.

Operations affected now make temporary copies, as needed to eliminate data dependency. As detecting these cases is computationally expensive, a heuristic is used, which may in rare cases result in needless temporary copies. For operations where the data dependency is simple enough for the heuristic to analyze, temporary copies will not be made even if the arrays overlap, if it can be deduced copies are not necessary. As an example, np.add(a, b, out=a) will not involve copies.

To illustrate a previously undefined operation::

    >>> x = np.arange(16).astype(float)
    >>> np.add(x[1:], x[:-1], out=x[1:])

In NumPy 1.13.0 the last line is guaranteed to be equivalent to::

    >>> np.add(x[1:].copy(), x[:-1].copy(), out=x[1:])

A similar operation with simple non-problematic data dependence is::

    >>> x = np.arange(16).astype(float)
    >>> np.add(x[1:], x[:-1], out=x[:-1])

It will continue to produce the same results as in previous NumPy versions, and will not involve unnecessary temporary copies.

The change applies also to in-place binary operations, for example::

    >>> x = np.random.rand(500, 500)
    >>> x += x.T

This statement is now guaranteed to be equivalent to x[...] = x + x.T, whereas in previous NumPy versions the results were undefined.
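Both guarantees can be checked against out-of-place reference results. This is a sketch that assumes NumPy 1.13 or later; on older versions the comparisons are not guaranteed to hold.

```python
import numpy as np

# In-place addition where the input x.T overlaps the output x.
x = np.arange(9.0).reshape(3, 3)
expected = x + x.T  # computed out of place, so no aliasing is involved
x += x.T
in_place_matches = np.array_equal(x, expected)

# Shifted add where the output overlaps both inputs.
y = np.arange(16).astype(float)
reference = y[1:].copy() + y[:-1].copy()
np.add(y[1:], y[:-1], out=y[1:])
shifted_add_matches = np.array_equal(y[1:], reference)

print(in_place_matches, shifted_add_matches)
```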

Partial support for 64-bit f2py extensions with MinGW

Extensions that incorporate Fortran libraries can now be built using the free MinGW_ toolset, also under Python 3.5. This works best for extensions that only do calculations and use the runtime modestly (reading and writing from files, for instance). Note that this does not remove the need for Mingwpy; if you make extensive use of the runtime, you will most likely run into issues_. Instead, it should be regarded as a band-aid until Mingwpy is fully functional.

Extensions can also be compiled using the MinGW toolset using the runtime library from the (moveable) WinPython 3.4 distribution, which can be useful for programs with a PySide1/Qt4 front-end.

.. _MinGW: https://sf.net/projects/mingw-w64/files/Toolchains%20targetting%20Win64/Personal%20Builds/mingw-builds/6.2.0/threads-win32/seh/

.. _issues: https://mingwpy.github.io/issues.html

Performance improvements for packbits and unpackbits

The functions numpy.packbits with boolean input and numpy.unpackbits have been optimized to be significantly faster for contiguous data.

Fix for PPC long double floating point information

In previous versions of NumPy, the finfo function returned invalid information about the `double double`_ format of the longdouble float type on Power PC (PPC). The invalid values resulted from the failure of the NumPy algorithm to deal with the variable number of digits in the significand that are a feature of `PPC long doubles`_. This release by-passes the failing algorithm by using heuristics to detect the presence of the PPC double double format. A side-effect of using these heuristics is that the finfo function is faster than in previous releases.

.. _PPC long doubles: https://www.ibm.com/support/knowledgecenter/en/ssw_aix_71/com.ibm.aix.genprogc/128bit_long_double_floating-point_datatype.htm

.. _double double: https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format#Double-double_arithmetic

Better default repr for ndarray subclasses

Subclasses of ndarray with no repr specialization now correctly indent their data and type lines.

More reliable comparisons of masked arrays

Comparisons of masked arrays were buggy for masked scalars and failed for structured arrays with dimension higher than one. Both problems are now solved. In the process, it was ensured that in getting the result for a structured array, masked fields are properly ignored, i.e., the result is equal if all fields that are non-masked in both are equal, thus making the behaviour identical to what one gets by comparing an unstructured masked array and then doing .all() over some axis.

np.matrix with booleans elements can now be created using the string syntax

np.matrix failed whenever one attempts to use it with booleans, e.g., np.matrix('True'). Now, this works as expected.

More linalg operations now accept empty vectors and matrices

All of the following functions in np.linalg now work when given input arrays with a 0 in the last two dimensions: det, slogdet, pinv, eigvals, eigvalsh, eig, eigh.
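A quick sketch with a 0 x 0 matrix, exercising two of the listed functions:

```python
import numpy as np

empty = np.empty((0, 0))

determinant = np.linalg.det(empty)     # determinant of the empty matrix
eigenvalues = np.linalg.eigvals(empty) # no eigenvalues, so an empty result
print(determinant, eigenvalues.shape)
```

The determinant of an empty matrix is the empty product, which is why the first call yields 1 rather than raising.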

Bundled version of LAPACK is now 3.2.2

NumPy comes bundled with a minimal implementation of lapack for systems without a lapack library installed, under the name of lapack_lite. This has been upgraded from LAPACK 3.0.0 (June 30, 1999) to LAPACK 3.2.2 (June 30, 2010). See the `LAPACK changelogs`_ for details on all the changes this entails.

While no new features are exposed through numpy, this fixes some bugs regarding "workspace" sizes, and in some places may use faster algorithms.

.. _LAPACK changelogs: http://www.netlib.org/lapack/release_notes.html#_4_history_of_lapack_releases

reduce of np.hypot.reduce and np.logical_xor allowed in more cases

This now works on empty arrays, returning 0, and can reduce over multiple axes. Previously, a ValueError was thrown in these cases.
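A sketch of both cases, an empty input and a reduction over several axes (the 3-4-5 values are chosen so the expected result is easy to see):

```python
import numpy as np

# Empty inputs reduce to the identity instead of raising ValueError.
empty_xor = np.logical_xor.reduce(np.array([], dtype=bool))
empty_hypot = np.hypot.reduce(np.array([]))

# Reducing over both axes at once: sqrt(3**2 + 0 + 0 + 4**2).
legs = np.array([[3.0, 0.0], [0.0, 4.0]])
norm = np.hypot.reduce(legs, axis=(0, 1))

print(empty_xor, empty_hypot, norm)
```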

Better repr of object arrays

Object arrays that contain themselves no longer cause a recursion error.

Object arrays that contain list objects are now printed in a way that makes clear the difference between a 2d object array, and a 1d object array of lists.

Changes

argsort on masked arrays takes the same default arguments as sort

By default, argsort now places the masked values at the end of the sorted array, in the same way that sort already did. Additionally, the endwith argument is added to argsort, for consistency with sort. Note that this argument is not added at the end, so it breaks any code that passed fill_value as a positional argument.
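A small illustration of the default and of overriding it. The keyword is spelled endwith in the np.ma.argsort signature, and passing it by name avoids the positional-argument pitfall mentioned above.

```python
import numpy as np

values = np.ma.array([3, 1, 2], mask=[True, False, False])

# Default: the masked entry (index 0) sorts to the end.
order_default = np.ma.argsort(values)
# With endwith=False the masked entry sorts to the front instead.
order_front = np.ma.argsort(values, endwith=False)

print(order_default, order_front)
```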

average now preserves subclasses

For ndarray subclasses, numpy.average will now return an instance of the subclass, matching the behavior of most other NumPy functions such as mean. As a consequence, also calls that returned a scalar may now return a subclass array scalar.

array == None and array != None do element-wise comparison

Previously these operations returned scalars False and True respectively.
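For example, with an object array that contains a None entry:

```python
import numpy as np

arr = np.array([1, None, 3], dtype=object)

# Element-wise results, one boolean per entry.
is_none = arr == None      # noqa: E711
is_not_none = arr != None  # noqa: E711
print(is_none, is_not_none)
```

Code that relied on the old scalar result to test whether a variable holds None should use `arr is None` instead.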

np.equal, np.not_equal for object arrays ignores object identity

Previously, these functions always treated identical objects as equal. This had the effect of overriding comparison failures, comparison of objects that did not return booleans, such as np.arrays, and comparison of objects where the results differed from object identity, such as NaNs.

Boolean indexing changes

  • Boolean array-likes (such as lists of python bools) are always treated as boolean indexes.

  • Boolean scalars (including python True) are legal boolean indexes and never treated as integers.

  • Boolean indexes must match the dimension of the axis that they index.

  • Boolean indexes used on the lhs of an assignment must match the dimensions of the rhs.

  • Boolean indexing into scalar arrays returns a new 1-d array. This means that array(1)[array(True)] gives array([1]) and not the original array.
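Two of these rules can be sketched briefly: indexing a scalar array with a boolean scalar, and a boolean list whose length does not match the indexed axis.

```python
import numpy as np

# A boolean scalar index on a 0-d array yields a new 1-d array.
picked = np.array(1)[np.array(True)]

# A boolean list is treated as a boolean index, so its length must match.
try:
    np.arange(3)[[True, False]]
    mismatch_raises = False
except IndexError:
    mismatch_raises = True

print(picked, mismatch_raises)
```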

np.random.multivariate_normal behavior with bad covariance matrix

It is now possible to adjust the behavior the function will have when dealing with the covariance matrix by using two new keyword arguments:

  • tol can be used to specify a tolerance to use when checking that the covariance matrix is positive semidefinite.

  • check_valid can be used to configure what the function will do in the presence of a matrix that is not positive semidefinite. Valid options are ignore, warn and raise. The default value, warn, keeps the behavior used in previous releases.
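A sketch of the raise and ignore options, using a deliberately invalid covariance matrix (eigenvalues 3 and -1) made up for this example:

```python
import numpy as np

mean = [0.0, 0.0]
bad_cov = [[1.0, 2.0], [2.0, 1.0]]  # symmetric but not positive semidefinite

try:
    np.random.multivariate_normal(mean, bad_cov, check_valid="raise")
    raised = False
except ValueError:
    raised = True

# With "ignore" a sample is still drawn, without any warning.
sample = np.random.multivariate_normal(mean, bad_cov, check_valid="ignore")
print(raised, sample.shape)
```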

assert_array_less compares np.inf and -np.inf now

Previously, np.testing.assert_array_less ignored all infinite values. This is not the expected behavior both according to documentation and intuitively. Now, -inf < x < inf is considered True for any real number x and all other cases fail.
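A short check of both outcomes described above:

```python
import numpy as np

# -inf < 0 and 1 < inf both hold, so this assertion passes silently.
np.testing.assert_array_less([-np.inf, 1.0], [0.0, np.inf])

# inf < inf does not hold, so the comparison is reported as a failure.
try:
    np.testing.assert_array_less([np.inf], [np.inf])
    inf_vs_inf_fails = False
except AssertionError:
    inf_vs_inf_fails = True

print(inf_vs_inf_fails)
```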

assert_array_ and masked arrays assert_equal hide fewer warnings

Some warnings that were previously hidden by the assert_array_ functions are not hidden anymore. In most cases the warnings should be correct and, should they occur, will require changes to the tests using these functions. For the masked array assert_equal version, warnings may occur when comparing NaT. The function presently does not handle NaT or NaN specifically and it may be best to avoid it at this time should a warning show up due to this change.

offset attribute value in memmap objects

The offset attribute in a memmap object is now set to the offset into the file. This is a behaviour change only for offsets greater than mmap.ALLOCATIONGRANULARITY.

np.real and np.imag return scalars for scalar inputs

Previously, np.real and np.imag used to return array objects when provided a scalar input, which was inconsistent with other functions like np.angle and np.conj.
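For a plain Python complex number, for instance:

```python
import numpy as np

z = 3.0 + 4.0j
real_part = np.real(z)
imag_part = np.imag(z)

# Neither result is wrapped in an array.
returns_scalars = not isinstance(real_part, np.ndarray) and not isinstance(
    imag_part, np.ndarray
)
print(real_part, imag_part, returns_scalars)
```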

The polynomial convenience classes cannot be passed to ufuncs

The ABCPolyBase class, from which the convenience classes are derived, sets __array_ufunc__ = None in order to opt out of ufuncs. If a polynomial convenience class instance is passed as an argument to a ufunc, a TypeError will now be raised.
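A sketch of the resulting behavior with Polynomial, one of the convenience classes; ordinary Python operators on the class are a separate mechanism and continue to work.

```python
import numpy as np
from numpy.polynomial import Polynomial

p = Polynomial([1, 2, 3])

try:
    np.add(p, 1)  # ufunc call on a convenience class instance
    ufunc_raises = False
except TypeError:
    ufunc_raises = True

shifted = p + 1  # handled by Polynomial.__add__, not by a ufunc
print(ufunc_raises, shifted.coef)
```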

Output arguments to ufuncs can be tuples also for ufunc methods

For calls to ufuncs, it was already possible, and recommended, to use an out argument with a tuple for ufuncs with multiple outputs. This has now been extended to output arguments in the reduce, accumulate, and reduceat methods. This is mostly for compatibility with __array_ufunc__; there are no ufuncs yet that have more than one output.
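For example, a one-element tuple can be passed as out to reduce (the array contents are arbitrary):

```python
import numpy as np

a = np.arange(6.0).reshape(2, 3)
out = np.empty(3)

# The output array is given as a tuple, as for multi-output ufunc calls.
result = np.add.reduce(a, axis=0, out=(out,))
print(out)
```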

=========================

pandas 0.20.1 -> 0.21.1

0.21.1


This is a minor bug-fix release in the 0.21.x series and includes some small regression fixes, bug fixes and performance improvements. We recommend that all users upgrade to this version.

Highlights include:

  • Temporarily restore matplotlib datetime plotting functionality. This should resolve issues for users who implicitly relied on pandas to plot datetimes with matplotlib. See :ref:`here <whatsnew_0211.converters>`.
  • Improvements to the Parquet IO functions introduced in 0.21.0. See :ref:`here <whatsnew_0211.enhancements.parquet>`.

.. contents:: What's new in v0.21.1
    :local:
    :backlinks: none

.. _whatsnew_0211.converters:

Restore Matplotlib datetime Converter Registration


Pandas implements some matplotlib converters for nicely formatting the axis
labels on plots with ``datetime`` or ``Period`` values. Prior to pandas 0.21.0,
these were implicitly registered with matplotlib, as a side effect of ``import
pandas``.

In pandas 0.21.0, we required users to explicitly register the
converter. This caused problems for some users who relied on those converters
being present for regular ``matplotlib.pyplot`` plotting methods, so we're
temporarily reverting that change; pandas 0.21.1 again registers the converters on
import, just like before 0.21.0.

We've added a new option to control the converters:
``pd.options.plotting.matplotlib.register_converters``. By default, they are
registered. Toggling this to ``False`` removes pandas' formatters and restores
any converters we overwrote when registering them (:issue:`18301`).

We're working with the matplotlib developers to make this easier. We're trying
to balance user convenience (automatically registering the converters) with
import performance and best practices (importing pandas shouldn't have the side
effect of overwriting any custom converters you've already set). In the future
we hope to have most of the datetime formatting functionality in matplotlib,
with just the pandas-specific converters in pandas. We'll then gracefully
deprecate the automatic registration of converters in favor of users explicitly
registering them when they want them.

.. _whatsnew_0211.enhancements:

New features

.. _whatsnew_0211.enhancements.parquet:

Improvements to the Parquet IO functionality ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- :func:`DataFrame.to_parquet` will now write non-default indexes when the underlying engine supports it. The indexes will be preserved when reading back in with :func:`read_parquet` (:issue:`18581`).
- :func:`read_parquet` now allows specifying the columns to read from a parquet file (:issue:`18154`)
- :func:`read_parquet` now allows specifying kwargs which are passed to the respective engine (:issue:`18216`)

.. _whatsnew_0211.enhancements.other:

Other Enhancements
^^^^^^^^^^^^^^^^^^

- :meth:`Timestamp.timestamp` is now available in Python 2.7. (:issue:`17329`)
- :class:`Grouper` and :class:`TimeGrouper` now have a friendly repr output (:issue:`18203`).

.. _whatsnew_0211.deprecations:

Deprecations


- ``pandas.tseries.register`` has been renamed to
 :func:`pandas.plotting.register_matplotlib_converters` (:issue:`18301`)

.. _whatsnew_0211.performance:

Performance Improvements

- Improved performance of plotting large series/dataframes (:issue:`18236`).

.. _whatsnew_0211.bug_fixes:

Bug Fixes


Conversion
^^^^^^^^^^

- Bug in :class:`TimedeltaIndex` subtraction could incorrectly overflow when ``NaT`` is present (:issue:`17791`)
- Bug in :class:`DatetimeIndex` subtracting datetimelike from DatetimeIndex could fail to overflow (:issue:`18020`)
- Bug in :meth:`IntervalIndex.copy` when copying an ``IntervalIndex`` with non-default ``closed`` (:issue:`18339`)
- Bug in :func:`DataFrame.to_dict` where columns of datetime that are tz-aware were not converted to required arrays when used with ``orient='records'``, raising ``TypeError`` (:issue:`18372`)
- Bug in :class:`DatetimeIndex` and :meth:`date_range` where mismatching tz-aware ``start`` and ``end`` timezones would not raise an error if ``end.tzinfo`` is ``None`` (:issue:`18431`)
- Bug in :meth:`Series.fillna` which raised when passed a long integer on Python 2 (:issue:`18159`).

Indexing
^^^^^^^^

- Bug in a boolean comparison of a ``datetime.datetime`` and a ``datetime64[ns]`` dtype Series (:issue:`17965`)
- Bug where a ``MultiIndex`` with more than a million records was not raising ``AttributeError`` when trying to access a missing attribute (:issue:`18165`)
- Bug in :class:`IntervalIndex` constructor when a list of intervals is passed with non-default ``closed`` (:issue:`18334`)
- Bug in ``Index.putmask`` when an invalid mask is passed (:issue:`18368`)
- Bug in masked assignment of a ``timedelta64[ns]`` dtype ``Series``, incorrectly coerced to float (:issue:`18493`)

I/O
^^^

- Bug in :class:`~pandas.io.stata.StataReader` not converting date/time columns with display formatting (:issue:`17990`). Previously, columns with display formatting were left as ordinal numbers and not converted to datetime objects.
- Bug in :func:`read_csv` when reading a compressed UTF-16 encoded file (:issue:`18071`)
- Bug in :func:`read_csv` for handling null values in index columns when specifying ``na_filter=False`` (:issue:`5239`)
- Bug in :func:`read_csv` when reading numeric category fields with high cardinality (:issue:`18186`)
- Bug in :meth:`DataFrame.to_csv` when the table had ``MultiIndex`` columns, and a list of strings was passed in for ``header`` (:issue:`5539`)
- Bug in parsing integer datetime-like columns with specified format in ``read_sql`` (:issue:`17855`).
- Bug in :meth:`DataFrame.to_msgpack` when serializing data of the ``numpy.bool_`` datatype (:issue:`18390`)
- Bug in :func:`read_json` not decoding when reading line-delimited JSON from S3 (:issue:`17200`)
- Bug in :func:`pandas.io.json.json_normalize` to avoid modification of ``meta`` (:issue:`18610`)
- Bug in :func:`to_latex` where repeated multi-index values were not printed even though a higher level index differed from the previous row (:issue:`14484`)
- Bug when reading NaN-only categorical columns in :class:`HDFStore` (:issue:`18413`)
- Bug in :meth:`DataFrame.to_latex` with ``longtable=True`` where a latex multicolumn always spanned over three columns (:issue:`17959`)

Plotting
^^^^^^^^

- Bug in ``DataFrame.plot()`` and ``Series.plot()`` with :class:`DatetimeIndex` where a figure generated by them is not pickleable in Python 3 (:issue:`18439`)

Groupby/Resample/Rolling
^^^^^^^^^^^^^^^^^^^^^^^^

- Bug in ``DataFrame.resample(...).apply(...)`` when there is a callable that returns different columns (:issue:`15169`)
- Bug in ``DataFrame.resample(...)`` when there is a time change (DST) and resampling frequency is 12h or higher (:issue:`15549`)
- Bug in ``pd.DataFrameGroupBy.count()`` when counting over a datetimelike column (:issue:`13393`)
- Bug in ``rolling.var`` where calculation is inaccurate with a zero-valued array (:issue:`18430`)

Reshaping
^^^^^^^^^

- Error message in ``pd.merge_asof()`` for key datatype mismatch now includes datatype of left and right key (:issue:`18068`)
- Bug in ``pd.concat`` when empty and non-empty DataFrames or Series are concatenated (:issue:`18178` :issue:`18187`)
- Bug in ``DataFrame.filter(...)`` when :class:`unicode` is passed as a condition in Python 2 (:issue:`13101`)
- Bug when merging empty DataFrames when ``np.seterr(divide='raise')`` is set (:issue:`17776`)

Numeric
^^^^^^^

- Bug in ``pd.Series.rolling.skew()`` and ``rolling.kurt()`` where all-equal values caused a floating-point issue (:issue:`18044`)

Categorical
^^^^^^^^^^^

- Bug in :meth:`DataFrame.astype` where casting to ``'category'`` on an empty ``DataFrame`` causes a segmentation fault (:issue:`18004`)
- Error messages in the testing module have been improved when items have different ``CategoricalDtype`` (:issue:`18069`)
- ``CategoricalIndex`` can now correctly take a ``pd.api.types.CategoricalDtype`` as its dtype (:issue:`18116`)
- Bug in ``Categorical.unique()`` returning read-only ``codes``  array when all categories were ``NaN`` (:issue:`18051`)
- Bug in ``DataFrame.groupby(axis=1)`` with a ``CategoricalIndex`` (:issue:`18432`)

String
^^^^^^

- :meth:`Series.str.split()` will now propagate ``NaN`` values across all expanded columns instead of ``None`` (:issue:`18450`)
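
As a minimal sketch of the ``Series.str.split`` change above (the sample strings are made up for illustration), a missing value should show up as missing in every expanded column:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one missing entry among splittable strings.
s = pd.Series(["a b", np.nan, "c d"])

# expand=True spreads the split pieces over separate columns.
expanded = s.str.split(" ", expand=True)

# The row that came from the missing entry is missing in every column.
missing_row_is_all_nan = bool(expanded.iloc[1].isna().all())
print(expanded)
print(missing_row_is_all_nan)
```

Checking with ``isna()`` rather than comparing against ``None`` or ``NaN`` directly keeps the check independent of which missing-value marker a given version uses.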

.. _whatsnew_0130:

### 0.21.0

--------------------------

This is a major release from 0.20.3 and includes a number of API changes, deprecations, new features,
enhancements, and performance improvements along with a large number of bug fixes. We recommend that all
users upgrade to this version.

Highlights include:

- Integration with `Apache Parquet <https://parquet.apache.org/>`__, including a new top-level :func:`read_parquet` function and :meth:`DataFrame.to_parquet` method, see :ref:`here <whatsnew_0210.enhancements.parquet>`.
- New user-facing :class:`pandas.api.types.CategoricalDtype` for specifying
  categoricals independent of the data, see :ref:`here <whatsnew_0210.enhancements.categorical_dtype>`.
- The behavior of ``sum`` and ``prod`` on all-NaN Series/DataFrames is now consistent and no longer depends on whether `bottleneck <http://berkeleyanalytics.com/bottleneck>`__ is installed, and ``sum`` and ``prod`` on empty Series now return NaN instead of 0, see :ref:`here <whatsnew_0210.api_breaking.bottleneck>`.
- Compatibility fixes for pypy, see :ref:`here <whatsnew_0210.pypy>`.
- Additions to the ``drop``, ``reindex`` and ``rename`` API to make them more consistent, see :ref:`here <whatsnew_0210.enhancements.drop_api>`.
- Addition of the new methods ``DataFrame.infer_objects`` (see :ref:`here <whatsnew_0210.enhancements.infer_objects>`) and ``GroupBy.pipe`` (see :ref:`here <whatsnew_0210.enhancements.GroupBy_pipe>`).
- Indexing with a list of labels, where one or more of the labels is missing, is deprecated and will raise a KeyError in a future version, see :ref:`here <whatsnew_0210.api_breaking.loc>`.

Check the :ref:`API Changes <whatsnew_0210.api_breaking>` and :ref:`deprecations <whatsnew_0210.deprecations>` before updating.

.. contents:: What's new in v0.21.0
   :local:
   :backlinks: none
   :depth: 2

.. _whatsnew_0210.enhancements:

New features

.. _whatsnew_0210.enhancements.parquet:

Integration with Apache Parquet file format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Integration with `Apache Parquet <https://parquet.apache.org/>`__, including a new top-level :func:`read_parquet` function and :func:`DataFrame.to_parquet` method, see :ref:`here <io.parquet>` (:issue:`15838`, :issue:`17438`).

`Apache Parquet <https://parquet.apache.org/>`__ provides a cross-language, binary file format for reading and writing data frames efficiently. Parquet is designed to faithfully serialize and de-serialize ``DataFrame`` s, supporting all of the pandas dtypes, including extension dtypes such as datetime with timezones.

This functionality depends on either the `pyarrow <http://arrow.apache.org/docs/python/>`__ or `fastparquet <https://fastparquet.readthedocs.io/en/latest/>`__ library. For more details, see :ref:`the IO docs on Parquet <io.parquet>`.

.. _whatsnew_0210.enhancements.infer_objects:

``infer_objects`` type conversion
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The :meth:`DataFrame.infer_objects` and :meth:`Series.infer_objects` methods have been added to perform dtype inference on object columns, replacing some of the functionality of the deprecated ``convert_objects`` method. See the documentation :ref:`here <basics.object_conversion>` for more details. (:issue:`11221`)

This method only performs soft conversions on object columns, converting Python objects to native types, but not any coercive conversions. For example:

.. ipython:: python

   df = pd.DataFrame({'A': [1, 2, 3],
                      'B': np.array([1, 2, 3], dtype='object'),
                      'C': ['1', '2', '3']})
   df.dtypes
   df.infer_objects().dtypes

Note that column ``'C'`` was not converted - only scalar numeric types will be converted to a new type. Other types of conversion should be accomplished using the :func:`to_numeric` function (or :func:`to_datetime`, :func:`to_timedelta`).

.. ipython:: python

   df = df.infer_objects()
   df['C'] = pd.to_numeric(df['C'], errors='coerce')
   df.dtypes

.. _whatsnew_0210.enhancements.attribute_access:

Improved warnings when attempting to create columns
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

New users are often puzzled by the relationship between column operations and attribute access on ``DataFrame`` instances (:issue:`7175`). One specific instance of this confusion is attempting to create a new column by setting an attribute on the ``DataFrame``:

.. code-block:: ipython

   In [1]: df = pd.DataFrame({'one': [1., 2., 3.]})
   In [2]: df.two = [4, 5, 6]

This does not raise any obvious exceptions, but also does not create a new column:

.. code-block:: ipython

   In [3]: df
   Out[3]:
      one
   0  1.0
   1  2.0
   2  3.0

Setting a list-like data structure into a new attribute now raises a ``UserWarning`` about the potential for unexpected behavior. See :ref:`Attribute Access <indexing.attribute_access>`.
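
The following is a minimal sketch of that warning in practice. It records warnings with the standard-library ``warnings`` module so the ``UserWarning`` can be inspected, and then creates the column with item assignment; the variable names are illustrative only.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0]})

# Record warnings so the UserWarning can be inspected instead of printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df.two = [4, 5, 6]  # sets a plain attribute, not a column

warned = any(issubclass(w.category, UserWarning) for w in caught)
columns_before = list(df.columns)
print(warned, columns_before)

# Item assignment is what actually creates the column.
df["two"] = [4, 5, 6]
columns_after = list(df.columns)
print(columns_after)
```

Using ``df["two"] = ...`` (or ``DataFrame.assign``) avoids the ambiguity entirely, since item assignment always targets columns.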

.. _whatsnew_0210.enhancements.drop_api:

``drop`` now also accepts index/columns keywords
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The :meth:`~DataFrame.drop` method has gained ``index``/``columns`` keywords as an alternative to specifying the ``axis``. This is similar to the behavior of ``reindex`` (:issue:`12392`).

For example:

.. ipython:: python

   df = pd.DataFrame(np.arange(8).reshape(2, 4),
                     columns=['A', 'B', 'C', 'D'])
   df
   df.drop(['B', 'C'], axis=1)
   # the following is now equivalent
   df.drop(columns=['B', 'C'])

.. _whatsnew_0210.enhancements.rename_reindex_axis:

``rename``, ``reindex`` now also accept axis keyword
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The :meth:`DataFrame.rename` and :meth:`DataFrame.reindex` methods have gained the ``axis`` keyword to specify the axis to target with the operation (:issue:`12392`).

Here's ``rename``:

.. ipython:: python

   df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
   df.rename(str.lower, axis='columns')
   df.rename(id, axis='index')

And ``reindex``:

.. ipython:: python

   df.reindex(['A', 'B', 'C'], axis='columns')
   df.reindex([0, 1, 3], axis='index')

The "index, columns" style continues to work as before.

.. ipython:: python

   df.rename(index=id, columns=str.lower)
   df.reindex(index=[0, 1, 3], columns=['A', 'B', 'C'])

We highly encourage using named arguments to avoid confusion when using either style.

.. _whatsnew_0210.enhancements.categorical_dtype:

``CategoricalDtype`` for specifying categoricals
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`pandas.api.types.CategoricalDtype` has been added to the public API and expanded to include the ``categories`` and ``ordered`` attributes. A ``CategoricalDtype`` can be used to specify the set of categories and orderedness of an array, independent of the data. This can be useful, for example, when converting string data to a ``Categorical`` (:issue:`14711`, :issue:`15078`, :issue:`16015`, :issue:`17643`):

.. ipython:: python

   from pandas.api.types import CategoricalDtype

   s = pd.Series(['a', 'b', 'c', 'a'])  # strings
   dtype = CategoricalDtype(categories=['a', 'b', 'c', 'd'], ordered=True)
   s.astype(dtype)

One place that deserves special mention is in :meth:`read_csv`. Previously, with ``dtype={'col': 'category'}``, the returned values and categories would always be strings.

.. ipython:: python
   :suppress:

   from pandas.compat import StringIO

.. ipython:: python

   data = 'A,B\na,1\nb,2\nc,3'
   pd.read_csv(StringIO(data), dtype={'B': 'category'}).B.cat.categories

Notice the "object" dtype.

With a ``CategoricalDtype`` of all numerics, datetimes, or timedeltas, we can automatically convert to the correct type:

.. ipython:: python

   dtype = {'B': CategoricalDtype([1, 2, 3])}
   pd.read_csv(StringIO(data), dtype=dtype).B.cat.categories

The values have been correctly interpreted as integers.

The ``.dtype`` property of a ``Categorical``, ``CategoricalIndex`` or a ``Series`` with categorical type will now return an instance of ``CategoricalDtype``. While the repr has changed, ``str(CategoricalDtype())`` is still the string ``'category'``. We'll take this moment to remind users that the *preferred* way to detect categorical data is to use :func:`pandas.api.types.is_categorical_dtype`, and not ``str(dtype) == 'category'``.

See the :ref:`CategoricalDtype docs <categorical.categoricaldtype>` for more.

.. _whatsnew_0210.enhancements.GroupBy_pipe:

``GroupBy`` objects now have a ``pipe`` method
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``GroupBy`` objects now have a ``pipe`` method, similar to the one on ``DataFrame`` and ``Series``, that allows functions that take a ``GroupBy`` to be composed in a clean, readable syntax. (:issue:`17871`)

For a concrete example of combining ``.groupby`` and ``.pipe``, imagine having a DataFrame with columns for stores, products, revenue and sold quantity. We'd like to do a groupwise calculation of *prices* (i.e. revenue/quantity) per store and per product. We could do this in a multi-step operation, but expressing it in terms of piping can make the code more readable.

First we set the data:

.. ipython:: python

   import numpy as np
   n = 1000
   df = pd.DataFrame({'Store': np.random.choice(['Store_1', 'Store_2'], n),
                      'Product': np.random.choice(['Product_1', 'Product_2', 'Product_3'], n),
                      'Revenue': (np.random.random(n)*50+10).round(2),
                      'Quantity': np.random.randint(1, 10, size=n)})
   df.head(2)

Now, to find prices per store/product, we can simply do:

.. ipython:: python

   (df.groupby(['Store', 'Product'])
      .pipe(lambda grp: grp.Revenue.sum()/grp.Quantity.sum())
      .unstack().round(2))

See the :ref:`documentation <groupby.pipe>` for more.

.. _whatsnew_0210.enhancements.reanme_categories:

``Categorical.rename_categories`` accepts a dict-like
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:meth:`~Series.cat.rename_categories` now accepts a dict-like argument for ``new_categories``. The previous categories are looked up in the dictionary's keys and replaced if found. The behavior of missing and extra keys is the same as in :meth:`DataFrame.rename`.

.. ipython:: python

   c = pd.Categorical(['a', 'a', 'b'])
   c.rename_categories({"a": "eh", "b": "bee"})

.. warning::

    To assist with upgrading pandas, ``rename_categories`` treats ``Series`` as list-like. Typically, Series are considered to be dict-like (e.g. in ``.rename``, ``.map``). In a future version of pandas ``rename_categories`` will change to treat them as dict-like. Follow the warning message's recommendations for writing future-proof code.

    .. code-block:: ipython

       In [33]: c.rename_categories(pd.Series([0, 1], index=['a', 'c']))
       FutureWarning: Treating Series 'new_categories' as a list-like and using the values.
       In a future version, 'rename_categories' will treat Series like a dictionary.
       For dict-like, use 'new_categories.to_dict()'
       For list-like, use 'new_categories.values'.
       Out[33]:
       [0, 0, 1]
       Categories (2, int64): [0, 1]

.. _whatsnew_0210.enhancements.other:

Other Enhancements
^^^^^^^^^^^^^^^^^^

New functions or methods
""""""""""""""""""""""""

- :meth:`~pandas.core.resample.Resampler.nearest` is added to support nearest-neighbor upsampling (:issue:`17496`).
- :class:`~pandas.Index` has added support for a ``to_frame`` method (:issue:`15230`).

New keywords
""""""""""""

- Added a ``skipna`` parameter to :func:`~pandas.api.types.infer_dtype` to support type inference in the presence of missing values (:issue:`17059`).
- :func:`Series.to_dict` and :func:`DataFrame.to_dict` now support an ``into`` keyword which allows you to specify the ``collections.Mapping`` subclass that you would like returned. The default is ``dict``, which is backwards compatible. (:issue:`16122`)
- :func:`Series.set_axis` and :func:`DataFrame.set_axis` now support the ``inplace`` parameter. (:issue:`14636`)
- :func:`Series.to_pickle` and :func:`DataFrame.to_pickle` have gained a ``protocol`` parameter (:issue:`16252`). By default, this parameter is set to `HIGHEST_PROTOCOL <https://docs.python.org/3/library/pickle.html#data-stream-format>`__
- :func:`read_feather` has gained the ``nthreads`` parameter for multi-threaded operations (:issue:`16359`)
- :func:`DataFrame.clip()` and :func:`Series.clip()` have gained an ``inplace`` argument. (:issue:`15388`)
- :func:`crosstab` has gained a ``margins_name`` parameter to define the name of the row / column that will contain the totals when ``margins=True``. (:issue:`15972`)
- :func:`read_json` now accepts a ``chunksize`` parameter that can be used when ``lines=True``. If ``chunksize`` is passed, read_json now returns an iterator which reads in ``chunksize`` lines with each iteration. (:issue:`17048`)
- :func:`read_json` and :func:`~DataFrame.to_json` now accept a ``compression`` argument which allows them to transparently handle compressed files. (:issue:`17798`)
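
A short sketch of three of the keywords listed above (``into``, ``margins_name`` and ``chunksize``) follows. The sample frame, the label ``"Total"`` and the JSON lines are made up for illustration, and the JSON is wrapped in ``io.StringIO`` so it is read as a file-like object.

```python
from collections import OrderedDict
from io import StringIO

import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"kind": ["x", "y", "x"], "flag": ["a", "a", "b"]})

# into= selects the mapping type returned by to_dict.
as_ordered = df["kind"].to_dict(into=OrderedDict)

# margins_name= names the totals row/column added by margins=True.
table = pd.crosstab(df["kind"], df["flag"], margins=True, margins_name="Total")

# chunksize= together with lines=True yields DataFrames of at most that many lines.
json_lines = '{"a": 1}\n{"a": 2}\n{"a": 3}'
reader = pd.read_json(StringIO(json_lines), lines=True, chunksize=2)
chunk_lengths = [len(chunk) for chunk in reader]

print(type(as_ordered).__name__)
print(table)
print(chunk_lengths)
```

Reading in chunks keeps memory use bounded for large line-delimited files, at the cost of handling each piece separately.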

Various enhancements
""""""""""""""""""""

- Improved the import time of pandas by about 2.25x. (:issue:`16764`)
- Support for `PEP 519 -- Adding a file system path protocol <https://www.python.org/dev/peps/pep-0519/>`_ on most readers (e.g. :func:`read_csv`) and writers (e.g. :meth:`DataFrame.to_csv`) (:issue:`13823`).
- Added a ``__fspath__`` method to ``pd.HDFStore``, ``pd.ExcelFile``, and ``pd.ExcelWriter`` to work properly with the file system path protocol (:issue:`13823`).
- The ``validate`` argument for :func:`merge` now checks whether a merge is one-to-one, one-to-many, many-to-one, or many-to-many. If a merge is found to not be an example of the specified merge type, an exception of type ``MergeError`` will be raised. For more, see :ref:`here <merging.validation>` (:issue:`16270`)
- Added support for `PEP 518 <https://www.python.org/dev/peps/pep-0518/>`_ (``pyproject.toml``) to the build system (:issue:`16745`)
- :func:`RangeIndex.append` now returns a ``RangeIndex`` object when possible (:issue:`16212`)
- :func:`Series.rename_axis` and :func:`DataFrame.rename_axis` with ``inplace=True`` now return ``None`` while renaming the axis inplace. (:issue:`15704`)
- :func:`api.types.infer_dtype` now infers decimals. (:issue:`15690`)
- :func:`DataFrame.select_dtypes` now accepts scalar values for include/exclude as well as list-like. (:issue:`16855`)
- :func:`date_range` now accepts 'YS' in addition to 'AS' as an alias for start of year. (:issue:`9313`)
- :func:`date_range` now accepts 'Y' in addition to 'A' as an alias for end of year. (:issue:`9313`)
- :func:`DataFrame.add_prefix` and :func:`DataFrame.add_suffix` now accept strings containing the '%' character. (:issue:`17151`)
- Read/write methods that infer compression (:func:`read_csv`, :func:`read_table`, :func:`read_pickle`, and :meth:`~DataFrame.to_pickle`) can now infer from path-like objects, such as ``pathlib.Path``. (:issue:`17206`)
- :func:`read_sas` now recognizes many more of the most frequently used date (datetime) formats in SAS7BDAT files. (:issue:`15871`)
- :func:`DataFrame.items` and :func:`Series.items` are now present in both Python 2 and 3 and are lazy in all cases. (:issue:`13918`, :issue:`17213`)
- :meth:`pandas.io.formats.style.Styler.where` has been implemented as a convenience for :meth:`pandas.io.formats.style.Styler.applymap`. (:issue:`17474`)
- :func:`MultiIndex.is_monotonic_decreasing` has been implemented. Previously returned ``False`` in all cases. (:issue:`16554`)
- :func:`read_excel` raises ``ImportError`` with a better message if ``xlrd`` is not installed. (:issue:`17613`)
- :meth:`DataFrame.assign` will preserve the original order of ``**kwargs`` for Python 3.6+ users instead of sorting the column names. (:issue:`14207`)
- :func:`Series.reindex`, :func:`DataFrame.reindex`, :func:`Index.get_indexer` now support list-like argument for ``tolerance``. (:issue:`17367`)
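
To illustrate a few of the items above, here is a hedged sketch of ``merge(validate=...)``, scalar arguments to ``select_dtypes``, and the ``'YS'`` alias. The frames are invented for the example, and catching the exception as ``pd.errors.MergeError`` is an assumption about where the class is exposed in the installed version.

```python
import pandas as pd

# Hypothetical sample data: "key" repeats on the left, is unique on the right.
left = pd.DataFrame({"key": [1, 2, 2], "l": ["a", "b", "c"]})
right = pd.DataFrame({"key": [1, 2], "r": ["x", "y"]})

# A many-to-one merge is accepted because the right keys are unique.
merged = pd.merge(left, right, on="key", validate="many_to_one")

# A one-to-one merge is rejected because the left keys repeat.
try:
    pd.merge(left, right, on="key", validate="one_to_one")
    one_to_one_ok = True
except pd.errors.MergeError:  # assumed import location of MergeError
    one_to_one_ok = False

# select_dtypes accepts a scalar as well as a list.
numeric_cols = list(left.select_dtypes(include="number").columns)
other_cols = list(left.select_dtypes(exclude="number").columns)

# 'YS' is an alias for year-start frequency.
year_starts = pd.date_range("2017-01-01", periods=3, freq="YS")

print(len(merged), one_to_one_ok, numeric_cols, other_cols)
print(list(year_starts.year))
```

Passing ``validate`` is a cheap guard against silently duplicated rows when a key that is assumed unique is not.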

.. _whatsnew_0210.api_breaking:

Backwards incompatible API changes



.. _whatsnew_0210.api_breaking.deps:

Dependencies have increased minimum versions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We have updated our minimum supported versions of dependencies (:issue:`15206`, :issue:`15543`, :issue:`15214`).
If installed, we now require:

  +--------------+-----------------+----------+
  | Package      | Minimum Version | Required |
  +==============+=================+==========+
  | Numpy        | 1.9.0           |    X     |
  +--------------+-----------------+----------+
  | Matplotlib   | 1.4.3           |          |
  +--------------+-----------------+----------+
  | Scipy        | 0.14.0          |          |
  +--------------+-----------------+----------+
  | Bottleneck   | 1.0.0           |          |
  +--------------+-----------------+----------+

Additionally, support has been dropped for Python 3.4 (:issue:`15251`).

.. _whatsnew_0210.api_breaking.bottleneck:

Sum/Prod of all-NaN or empty Series/DataFrames is now consistently NaN
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The behavior of ``sum`` and ``prod`` on all-NaN Series/DataFrames no longer depends on
whether `bottleneck <http://berkeleyanalytics.com/bottleneck>`__ is installed, and the return value of ``sum`` and ``prod`` on an empty Series has changed (:issue:`9422`, :issue:`15507`).

Calling ``sum`` or ``prod`` on an empty or all-``NaN`` ``Series``, or columns of a ``DataFrame``, will result in ``NaN``. See the :ref:`docs <missing_data.numeric_sum>`.

.. ipython:: python

  s = pd.Series([np.nan])

Previously WITHOUT ``bottleneck`` installed:

.. code-block:: ipython

  In [2]: s.sum()
  Out[2]: np.nan

Previously WITH ``bottleneck``:

.. code-block:: ipython

  In [2]: s.sum()
  Out[2]: 0.0

New Behavior, without regard to the bottleneck installation:

.. ipython:: python

  s.sum()

Note that this also changes the sum of an empty ``Series``. Previously this always returned 0 regardless of a ``bottleneck`` installation:

.. code-block:: ipython

  In [1]: pd.Series([]).sum()
  Out[1]: 0

but for consistency with the all-NaN case, this was changed to return NaN as well:

.. ipython:: python

  pd.Series([]).sum()
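
Code that must not depend on which convention the installed pandas follows can test for the all-missing case explicitly. The helper below is a minimal sketch under that assumption; ``nan_preserving_sum`` is a hypothetical name and not part of pandas.

```python
import math

import numpy as np
import pandas as pd


def nan_preserving_sum(s):
    """Return NaN for an empty or all-NaN Series, otherwise the usual sum."""
    # isna().all() is True for an empty Series as well as for an all-NaN one.
    if s.isna().all():
        return np.nan
    return s.sum()


all_nan = nan_preserving_sum(pd.Series([np.nan]))
empty = nan_preserving_sum(pd.Series([], dtype="float64"))
mixed = nan_preserving_sum(pd.Series([1.0, np.nan, 2.0]))

print(math.isnan(all_nan), math.isnan(empty), mixed)
```

The explicit check makes the intent visible at the call site instead of relying on the library default.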

.. _whatsnew_0210.api_breaking.loc:

Indexing with a list with missing labels is Deprecated
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Previously, selecting with a list of labels where one or more labels were missing would always succeed, returning ``NaN`` for the missing labels.
This will now show a ``FutureWarning``. In the future this will raise a ``KeyError`` (:issue:`15747`).
The warning is triggered on a ``DataFrame`` or a ``Series`` when ``.loc[]`` or ``[[]]`` is passed a list of labels with at least one missing label.
See the :ref:`deprecation docs <indexing.deprecate_loc_reindex_listlike>`.

.. ipython:: python

  s = pd.Series([1, 2, 3])
  s

Previous Behavior

.. code-block:: ipython

  In [4]: s.loc[[1, 2, 3]]
  Out[4]:
  1    2.0
  2    3.0
  3    NaN
  dtype: float64

Current Behavior

.. code-block:: ipython

  In [4]: s.loc[[1, 2, 3]]
  Passing list-likes to .loc or [] with any missing label will raise
  KeyError in the future, you can use .reindex() as an alternative.

  See the documentation here:
  http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike

  Out[4]:
  1    2.0
  2    3.0
  3    NaN
  dtype: float64

The idiomatic way to select labels that may be missing is via ``.reindex()``:

.. ipython:: python

 s.reindex([1, 2, 3])
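
As a minimal sketch of the two options, ``reindex`` keeps every requested label and fills the gaps, while intersecting the requested labels with the index first keeps only labels that exist. The variable names are illustrative only.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
wanted = [1, 2, 3]  # label 3 is not in the index 0..2

# reindex keeps every requested label and fills the missing one with NaN.
with_gaps = s.reindex(wanted)

# Intersecting first means .loc only ever sees labels that are present.
present_only = s.loc[s.index.intersection(wanted)]

print(with_gaps)
print(present_only)
```

Which one fits depends on whether the missing labels should appear in the result as ``NaN`` rows or be dropped.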

codecov-io commented 6 years ago

Codecov Report

Merging #82 into master will not change coverage. The diff coverage is n/a.

@@           Coverage Diff           @@
##           master      #82   +/-   ##
=======================================
  Coverage   93.23%   93.23%           
=======================================
  Files           3        3           
  Lines         281      281           
=======================================
  Hits          262      262           
  Misses         19       19

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 69cc4a9...0358924.

pyup-bot commented 6 years ago

Closing this in favor of #83