astrojuanlu / fastunits

BSD 2-Clause "Simplified" License
11 stars 1 forks source link

fastunits

Documentation Status Code style: black PyPI

A fast physical units library compatible with NumPy

Installation

To install, run

(.venv) $ pip install fastunits

Overview

Add a longer description here.

Notes

Mathematical theory of dimensions

Angles as separate dimension

Units with offsets

Interoperability with NumPy

tl;dr: fastunits should provide its own compatibility layer on top of the Array API, to maintain a simple and hackable implementation that can work with many array containers.

The big question is: should np.add(q1, q2) return a fastunits Quantity object? And it turns out that this has lots of ramifications.

Thread in scientific-python about these topics

There are several methods currently available to extend or interface with NumPy, and proposals to add a few more. However, reading the original NumPy Enhancement Proposals (NEPs for short) is not always helpful to understand the historical evolution of a particular method: this information is instead captured in newer documents or GitHub threads that look back on past proposals and evaluate their adoption.

For example, the introduction of NEP 37 "A dispatch protocol for NumPy-like modules" (draft, 2019) mentions several drawbacks to NEP 18 __array_function__ protocol (final, 2018) that were only discovered after libraries tried to adopt it. One of them is backwards compatibility: NEP 18 says

There are no general requirements on the return value from __array_function__, although most sensible implementations should probably return array(s) with the same type as one of the function's arguments.

However, NEP 37 recollects

__array_function__ has significant implications for libraries that use it: [...] users expect NumPy functions like np.concatenate to return NumPy arrays. [...] Libraries like Dask and CuPy have looked at and accepted the backwards incompatibility impact of __array_function__; it would still have been better for them if that impact didn't exist".

This suggests that np.sum(q1, q2) returning an object that is not a NumPy array proved to be contentious or problematic for some downstream users. astropy.units should in principle not be affected by this problem since their Quantity objects inherit numpy.ndarray, however it would be useful to determine whether the incompatibilities of astropy.units with Dask and xarray summarized in https://github.com/astropy/astropy/issues/12600#issuecomment-1003044555 come from this design decision.

On the other hand, the unyt project described some challenges of implementing __array_function__ in https://github.com/yt-project/unyt/issues/139. In particular, unyt seems to be affected by what NEP 37 called "an all or nothing approach" or "no good pathway for incremental adoption", and in addition there is not an official listing of NumPy functions supporting __array_function__ as requested in https://github.com/numpy/numpy/issues/15544. All these challenges are not necessarily technical barriers, but increase the amount of work needed to properly adopt __array_function__.

The difficulties stated above seem a bit discouraging, and so maybe fastunits could experiment with an alternative approach: rather than treating Quantity objects like NumPy arrays (by establishing a direct inheritance relationship) or even like a provider of an Array API https://data-apis.org/array-api/latest/, fastunits could implement Quantity objects as containers of array types, hence becoming an "array consumer library" as depicted in https://github.com/data-apis/array-api/issues/1. If done properly, this should enable fastunits to work not only with NumPy arrays as data containers, but also with Dask arrays, CuPy arrays, or virtually any other library implementing the Python array API standard.

Going back to the original question (what should something like np.add(q1, q2) return?) we could imagine something like this:

def quantity_add(q1, q2):
    # Check that q1 and q2 are commensurable, and if so,
    unit = q1._unit

    # These values can be any array type!
    v1 = q1._value
    v2 = q2._value

    # Retrieve Array API namespace, see NEP 47
    xp = get_namespace(v1, v2)

    # Apply target function from loaded namespace
    v_add = xp.add(v1, v2)

    # Create new Quantity from that
    return Quantity(v_add, unit)

The disadvantage is that users would not be able to just call np.add(q1, q2) and receive a fastunits Quantity, forcing them to do something like

import fastunits.numpy_compat as funp

q = funp.sum(q1, q2)

In return, we would get these advantages:

Would users of Dask, CuPy and the like be satisfied by this solution? It can be argued that they have no other option, since the existing physical unit libraries are not extensible enough. On the other hand, maybe there exists some better solution that could be achieved by a better design, but at the moment that lives on the realm of "unknown unknowns".

As it happened with NumPy __array_function__ and others, there is only one way to know: putting the code on the users hands.