data-apis / array-api

RFC document, tooling and other content related to the array API standard
https://data-apis.github.io/array-api/latest/
MIT License

RFC: tracking array API compliance #402

Open kgryte opened 2 years ago

kgryte commented 2 years ago

This RFC proposes a means for tracking array API compliance.

Overview

Currently, consumers of array libraries lack a centralized mechanism for determining whether any given array API is compliant with the array API specification.

Array libraries have implemented various means for tracking implementation progress.

However, surfacing this information to understand how broadly an API is supported and in what version any given API was implemented requires knowing where to look, a significant investment of time and energy, and dogged investigation.

A significant barrier to specification adoption among downstream libraries is not knowing (a) which libraries currently implement any given API and (b) which array library versions are needed in order to access specification-compliant APIs.

This RFC seeks to address this barrier by providing a process for tracking array API specification compliance and making this information publicly available.

Proposal

This RFC proposes an approach similar to that of Web APIs whereby compatibility information is stored in JSON files and made publicly available on the web.

An example JSON file for the `asarray` API.

```json
{
    "asarray": {
        "__compat__": {
            "spec_url": "https://data-apis.org/array-api/latest/API_specification/generated/signatures.creation_functions.asarray.html#signatures.creation_functions.asarray",
            "support": {
                "cupy": [
                    {
                        "version_added": "10.0.0",
                        "status": { "experimental": false, "deprecated": false, "partial_implementation": true },
                        "notes": "`copy` kwarg is only partially implemented."
                    }
                ],
                "dask": null,
                "jax": null,
                "mxnet": null,
                "numpy": [
                    {
                        "version_added": "1.22.0",
                        "status": { "experimental": true, "deprecated": false, "partial_implementation": true },
                        "notes": "Provisionally available via `numpy.array_api`."
                    }
                ],
                "pytorch": [
                    {
                        "version_added": "1.11.0",
                        "status": { "experimental": false, "deprecated": false, "partial_implementation": false },
                        "notes": ""
                    }
                ],
                "tensorflow": null
            },
            "status": { "standard_track": true, "experimental": false, "deprecated": false }
        },
        "dtype": {
            "__compat__": {
                "support": {
                    "cupy": [
                        {
                            "version_added": "10.0.0",
                            "status": { "experimental": false, "deprecated": false, "partial_implementation": false },
                            "notes": ""
                        }
                    ],
                    "dask": null,
                    "jax": null,
                    "mxnet": null,
                    "numpy": [
                        {
                            "version_added": "1.22.0",
                            "status": { "experimental": false, "deprecated": false, "partial_implementation": false },
                            "notes": ""
                        }
                    ],
                    "pytorch": [
                        {
                            "version_added": "1.11.0",
                            "status": { "experimental": false, "deprecated": false, "partial_implementation": false },
                            "notes": ""
                        }
                    ],
                    "tensorflow": null
                }
            },
            "status": { "standard_track": true, "experimental": false, "deprecated": false }
        },
        "device": {
            "__compat__": {
                "support": {
                    "cupy": [
                        {
                            "version_added": "10.0.0",
                            "status": { "experimental": false, "deprecated": false, "partial_implementation": false },
                            "notes": ""
                        }
                    ],
                    "dask": null,
                    "jax": null,
                    "mxnet": null,
                    "numpy": [
                        {
                            "version_added": "1.22.0",
                            "status": { "experimental": false, "deprecated": false, "partial_implementation": false },
                            "notes": ""
                        }
                    ],
                    "pytorch": [
                        {
                            "version_added": "1.11.0",
                            "status": { "experimental": false, "deprecated": false, "partial_implementation": false },
                            "notes": ""
                        }
                    ],
                    "tensorflow": null
                }
            },
            "status": { "standard_track": true, "experimental": false, "deprecated": false }
        },
        "copy": {
            "__compat__": {
                "support": {
                    "cupy": [
                        {
                            "version_added": "10.0.0",
                            "status": { "experimental": false, "deprecated": false, "partial_implementation": true },
                            "notes": "`copy=False` is not implemented"
                        }
                    ],
                    "dask": null,
                    "jax": null,
                    "mxnet": null,
                    "numpy": [
                        {
                            "version_added": "1.22.0",
                            "status": { "experimental": false, "deprecated": false, "partial_implementation": true },
                            "notes": "`copy=False` is not implemented"
                        }
                    ],
                    "pytorch": [
                        {
                            "version_added": "1.11.0",
                            "status": { "experimental": false, "deprecated": false, "partial_implementation": false },
                            "notes": ""
                        }
                    ],
                    "tensorflow": null
                }
            },
            "status": { "standard_track": true, "experimental": false, "deprecated": false }
        }
    }
}
```

At a high level, for each API in the array API specification, there would be a corresponding JSON file containing compatibility data for each array library of interest.

{
    "<api>": {
        "__compat__": {
            ...,
            "support": {
                "cupy": [...],
                "dask": null,
                "jax": null,
                "mxnet": null,
                "numpy": [...],
                "pytorch": [...],
                "tensorflow": null,
                ...
            },
            "status": {
                "standard_track": true,
                "experimental": false,
                "deprecated": false
            }
        },
        ...
    }
}

The status field indicates whether an API is on a standards track, is experimental and thus subject to change, or is deprecated.

The status field is an object as the contained fields are not mutually exclusive (e.g., an experimental API could be deprecated after failing to gain sufficient traction during the specification process, or a standards track API could be deprecated due to obsolescence and replacement by a new API).

The support field maps array libraries to an implementation status. If an array library lacks even partial support, its corresponding field value is null.

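For array libraries with partial or full support, the corresponding field value would be an array of objects. As the examples in this RFC show, each object has a version_added (or version_removed) field, a status object with the boolean flags experimental, deprecated, and partial_implementation, and a free-text notes field.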

The version_added and version_removed fields are mutually exclusive.
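
The support field described above can be queried with a few lines of Python. The following is a minimal sketch, assuming the proposed layout; the `supported_libraries` helper and the trimmed inline sample are hypothetical and not part of the RFC.

```python
import json

# Hypothetical sample following the proposed layout, trimmed to three libraries.
COMPAT_JSON = """
{
  "asarray": {
    "__compat__": {
      "support": {
        "cupy": [
          {
            "version_added": "10.0.0",
            "status": {"experimental": false, "deprecated": false, "partial_implementation": true},
            "notes": "`copy` kwarg is only partially implemented."
          }
        ],
        "dask": null,
        "pytorch": [
          {
            "version_added": "1.11.0",
            "status": {"experimental": false, "deprecated": false, "partial_implementation": false},
            "notes": ""
          }
        ]
      }
    }
  }
}
"""


def supported_libraries(compat, api):
    """Return the names of libraries with at least partial support for `api`."""
    support = compat[api]["__compat__"]["support"]
    # A null (None) value means the library lacks even partial support.
    return sorted(name for name, entries in support.items() if entries)


compat = json.loads(COMPAT_JSON)
print(supported_libraries(compat, "asarray"))
```

Because null marks the absence of support, a plain truthiness check is enough to separate supporting libraries from the rest.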

As an example, consider the following compliance data for CuPy and asarray.

    ...,
    "cupy": [
        {
            "version_added": "10.0.0",
            "status": {
                "experimental": false,
                "deprecated": false,
                "partial_implementation": true
            },
            "notes": "`copy` kwarg is only partially implemented."
        }
    ],
    ...

The above indicates that the asarray API was implemented in CuPy starting in version 10.0.0, is not exposed under an experimental status, and is only partially implemented. The notes clarify that the partial implementation status is due to the copy kwarg having incomplete support.

Suppose CuPy adds complete support for the copy kwarg in a subsequent version. In that case, the compliance data would be updated as follows:

    ...,
    "cupy": [
        {
            "version_added": "10.1.0",
            "status": {
                "experimental": false,
                "deprecated": false,
                "partial_implementation": false
            },
            "notes": ""
        },
        {
            "version_added": "10.0.0",
            "status": {
                "experimental": false,
                "deprecated": false,
                "partial_implementation": true
            },
            "notes": "`copy` kwarg is only partially implemented."
        }
    ],
    ...

Notice that, in the new entry, the partial_implementation flag is now false and the clarifying notes are empty, while the earlier entry is retained. By storing the data in an array, we are able to track implementation progress over time.
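
Storing a history per library makes it possible to ask when an API first became fully compliant. The following is a minimal sketch, assuming purely numeric dotted version strings; `parse_version` and `first_full_support` are hypothetical helpers, and the sample history mirrors the hypothetical CuPy update above.

```python
def parse_version(version):
    """Turn a dotted version string such as "10.1.0" into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def first_full_support(entries):
    """Return the earliest version with a complete implementation, or None."""
    if not entries:  # null in the JSON data: no support at all
        return None
    complete = [
        entry["version_added"]
        for entry in entries
        if "version_added" in entry
        and not entry["status"]["partial_implementation"]
    ]
    return min(complete, key=parse_version) if complete else None


# History copied from the hypothetical CuPy example (newest entry first).
cupy_history = [
    {
        "version_added": "10.1.0",
        "status": {"experimental": False, "deprecated": False, "partial_implementation": False},
        "notes": "",
    },
    {
        "version_added": "10.0.0",
        "status": {"experimental": False, "deprecated": False, "partial_implementation": True},
        "notes": "`copy` kwarg is only partially implemented.",
    },
]

print(first_full_support(cupy_history))
```

Real version strings may carry pre-release or local suffixes that this simple parser does not handle, so a production tool would likely need a dedicated version parser.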

In addition to total API compliance, this RFC proposes to break out support for each optional argument. Using asarray as an example,

{
    "asarray": {
        "__compat__": {...},
        ...,
        "copy": {
            "__compat__": {
                "support": {
                    "cupy": [
                        {
                            "version_added": "10.0.0",
                            "status": {
                                "experimental": false,
                                "deprecated": false,
                                "partial_implementation": true
                            },
                            "notes": "`copy=False` is not implemented"
                        }
                    ],
                    ...
                }
            },
            "status": {
                "standard_track": true,
                "experimental": false,
                "deprecated": false
            }
        }
    }
}

Compliance for optional arguments follows a structure similar to that of total API compliance: a special __compat__ field contains compliance data, and a status field indicates the status of the argument at the standards level.
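
Per-argument data can be walked in the same way as API-level data. The following is a minimal sketch, assuming that every key other than `__compat__` and `status` names an optional argument and that entries are ordered newest first; `argument_support` is a hypothetical helper, and the sample is trimmed to two arguments and two libraries.

```python
FULL = {"experimental": False, "deprecated": False, "partial_implementation": False}
PARTIAL = {"experimental": False, "deprecated": False, "partial_implementation": True}
STANDARD = {"standard_track": True, "experimental": False, "deprecated": False}

# Trimmed sample for asarray, following the proposed layout.
asarray = {
    "__compat__": {"support": {"cupy": [], "dask": None}},
    "dtype": {
        "__compat__": {
            "support": {
                "cupy": [{"version_added": "10.0.0", "status": FULL, "notes": ""}],
                "dask": None,
            }
        },
        "status": STANDARD,
    },
    "copy": {
        "__compat__": {
            "support": {
                "cupy": [
                    {
                        "version_added": "10.0.0",
                        "status": PARTIAL,
                        "notes": "`copy=False` is not implemented",
                    }
                ],
                "dask": None,
            }
        },
        "status": STANDARD,
    },
}


def argument_support(api_entry, library):
    """Map each optional argument to the library's most recent entry, or None."""
    result = {}
    for name, value in api_entry.items():
        if name in ("__compat__", "status"):
            continue  # API-level data, not an argument
        entries = value["__compat__"]["support"].get(library)
        result[name] = entries[0] if entries else None
    return result


partial = [
    name
    for name, entry in argument_support(asarray, "cupy").items()
    if entry and entry["status"]["partial_implementation"]
]
print(partial)
```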

Updating Compliance Data

Array library maintainers are best positioned to know both (a) when an API is implemented and (b) to what extent an API is compliant. Accordingly, array libraries should plan to dedicate a small amount of time to updating compliance status for each release.

In the future, we can investigate automating this process. For example, array libraries could include compliance data in their release notes in a machine readable format which we can then use to generate automatic updates.
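
As a rough illustration of what such automation might look like, the sketch below prepends a new entry to a library's history. It assumes a release publishes an entry already shaped like the objects in the support arrays; the `record_release` helper is hypothetical, and no such machine-readable release format exists yet.

```python
import copy


def record_release(compat, api, library, entry):
    """Return a copy of `compat` with `entry` prepended to the library's history."""
    updated = copy.deepcopy(compat)
    support = updated[api]["__compat__"]["support"]
    history = support.get(library) or []  # null becomes an empty history
    support[library] = [entry] + history  # newest entry first
    return updated


compat = {
    "asarray": {
        "__compat__": {
            "support": {
                "cupy": [
                    {
                        "version_added": "10.0.0",
                        "status": {"experimental": False, "deprecated": False, "partial_implementation": True},
                        "notes": "`copy` kwarg is only partially implemented.",
                    }
                ],
                "dask": None,
            }
        }
    }
}

# Hypothetical entry, matching the CuPy update example in this RFC.
new_entry = {
    "version_added": "10.1.0",
    "status": {"experimental": False, "deprecated": False, "partial_implementation": False},
    "notes": "",
}

updated = record_release(compat, "asarray", "cupy", new_entry)
versions = [e["version_added"] for e in updated["asarray"]["__compat__"]["support"]["cupy"]]
print(versions)
```

Returning a modified copy rather than mutating in place keeps the original data available for review, which may suit a pull-request-based workflow.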

However, in the absence of such automation, this RFC proposes to rely on maintainers and crowdsourcing for ensuring that compliance data is up-to-date.

This RFC proposes that compliance data be stored in a standalone public Git repository against which contributors (including those outside of the Consortium) may open pull requests fixing or updating compliance entries.

Public Consumption

This RFC proposes to surface compliance data in a human-friendly manner by publishing this data directly in the publicly hosted specification.

The specification for each API should contain a table similar to the following:

[Screenshot: example compatibility table for the asarray API]

In this example table, an individual is able to immediately infer how widely an API is implemented and to what extent implementations are specification-compliant.

For example, we can see that the asarray API is available in NumPy under an experimental status and has only partial support for the copy kwarg starting in version 1.22.0. CuPy has similar compliance; however, the API is not exposed experimentally. PyTorch has full compliance starting in version 1.11.0. All other libraries currently do not have stable releases exposing a specification-compliant asarray.
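
A plain-text version of such a table could be generated directly from the support data. The following is a minimal sketch, assuming entries are ordered newest first; `summarize` and `render_table` are hypothetical helpers, and the sample repeats the asarray data from this RFC for four libraries.

```python
def summarize(entries):
    """Describe a library's most recent entry in a few words."""
    if not entries:
        return "no support"
    latest = entries[0]  # assumes newest-first ordering
    if "version_added" not in latest:
        return f"removed in {latest['version_removed']}"
    flags = [name.replace("_", " ") for name, on in latest["status"].items() if on]
    detail = ", ".join(flags) if flags else "full"
    return f"{latest['version_added']} ({detail})"


def render_table(support):
    """Render one row per library, sorted by name."""
    width = max(len(name) for name in support)
    return "\n".join(
        f"{name.ljust(width)}  {summarize(entries)}"
        for name, entries in sorted(support.items())
    )


support = {
    "cupy": [
        {
            "version_added": "10.0.0",
            "status": {"experimental": False, "deprecated": False, "partial_implementation": True},
            "notes": "`copy` kwarg is only partially implemented.",
        }
    ],
    "dask": None,
    "numpy": [
        {
            "version_added": "1.22.0",
            "status": {"experimental": True, "deprecated": False, "partial_implementation": True},
            "notes": "Provisionally available via `numpy.array_api`.",
        }
    ],
    "pytorch": [
        {
            "version_added": "1.11.0",
            "status": {"experimental": False, "deprecated": False, "partial_implementation": False},
            "notes": "",
        }
    ],
}

print(render_table(support))
```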

Questions

  1. Should we be collecting any additional data (e.g., device support)?
  2. Are we okay with the proposed data format and process?
  3. Will array libraries commit to helping keep compliance data up-to-date?

leofang commented 2 years ago

For all array library implementors: We (NVIDIA) are interested in collecting the pain points in using CUDA math libraries to support the array API functions. We have collected responses from CuPy and PyTorch in this spreadsheet: https://docs.google.com/spreadsheets/d/15GqZ_QpVGSxZ8UqDkERWHb23MzfrTafeuf_oDAUUH08/edit?usp=sharing

Each library can clone the existing tabs to start.