Typing support for numpy
arrays, compatible with pydantic
validation out-of-the-box.
CI/CD | |
PyPI | |
Dev |
numpy
is a widely used Python library for data science and numerical computation. With the Python typing system becoming increasingly popular, the need for proper ways to type numpy
arrays is rising. While the developers of numpy
are dedicated to making their own typing system for their library, development is slow and the currently available solutions do not utilize the possibilities of the Python typing system to their full extent.
This project aims to do two things:
numpy
arrays that works with most static type checkers, including shape typing.numpy
arrays with pydantic
.mypy
and pyright
.pydantic
models out-of-the-box! Validation includes:
numdantic
work and that you need to learn about - nothing more!numdantic
requires Python 3.11 or higher to work. To install numdantic
, simply run:
pip install numdantic
To get started, import the NDArray
type from the package. NDArray
takes two type parameters: a shape, and a numpy
scalar type. You can use it to annotate a variable as an array like this:
import numpy as np
from numdantic import NDArray
# annotating a 2D array
matrix: NDArray[tuple[int, int], np.int32] = np.random.rand(2, 2)
This variable is now typed as a 2D array, with its axes having unspecified length, and of dtype np.int32
. Static type checker such as mypy
will now be able to check if you are using the variable correctly. Both shape and dtype are a lot more flexible than shown here though; you can learn about specifying axis lengths and same-length-axes in the Concepts section below.
If you wish to use a numpy
array inside of a pydantic
base model, you can use NDArray
as an annotation for the corresponding field, and it will work out-of-the-box:
import numpy as np
from numdantic import NDArray
from pydantic import BaseModel
class MyModel(BaseModel):
matrix: NDArray[tuple[int, int], np.int32]
# this will pass validation
MyModel(matrix=np.array([[1, 2], [3, 4]], dtype=np.int32))
# this will raise a ValidationError due to wrong dimensions
MyModel(matrix=np.array([1, 2, 3, 4], dtype=np.int32))
# this will raise a ValidationError due to wrong dtype
MyModel(matrix=np.array([[1, 2], [3, 4]], dtype=np.float64))
Learn more about what else numdantic
can do below!
If you wish to annotate an array that has a specific dimensionality, but whose axes can have arbitrary length, you can type its shape using a tuple
of int
. The rationale behind this is that array shapes are tuples of integers, and therefore, they are typed as such. For example, you can specify 1D, 2D, and 3D arrays like this:
import numpy as np
from numdantic import NDArray
array1d: NDArray[tuple[int], np.int32] = ...
array2d: NDArray[tuple[int, int], np.int32] = ...
array3d: NDArray[tuple[int, int, int], np.int32] = ...
Type checkers and pydantic
will not verify the length of the axes. This means that both an array of shape (2, 2)
and (100, 100)
would pass validation as array2d
, but an array of shape (1, )
or (2, 2, 2)
would cause an error due to a mismatch in dimensions.
Often you will know what lengths your axes will have and want to make sure that these axes lengths are respected throughout your program. To type an axis with a specific length, you can use literal integers:
from typing import Literal as L
import numpy as np
from numdantic import NDArray
widescreen_image: NDArray[tuple[L[1080], L[720], L[4]], np.int64]
Type checkers will accept this notation. Moreover, when used inside of a pydantic
base model, this will ensure that the axes of the array are checked for their length; if any axis has a length that does not match its annotation, a ValidationError
will be raised accordingly.
[!NOTE]
There is one significant caveat to this approach: you cannot mix unspecified axes and axes of specific length, i.e. attempting to assign
widescreen_image
to a variable typed to have shapetuple[int, int, int]
does not work and will cause type checkers to report an error. Once you opt for literals in your shapes, you will have to commit to them. Learn more about this limitation and why it occurs in the section about Limitations.
When you have axes that carry specific meaning, you often want to give them a specific name or handle. This is often useful for two reasons:
Borrowing the classic example for such a scenario, let us assume you wish to annotate a frame from a video. You can do so as shown above if you know the exact screen resolution, but often you wish to keep the exact axis length unspecified to support multiple resolutions. You do, however, want to make sure that the data is ordered correctly, to avoid one developer ordering it like (width, height, RGBA)
and another as (height, width, RGBA)
, possibly leading to hard-to-track bugs.
You can achieve this in numdantic
by defining a NewType
based on int
and using it as an axis length:
from typing import NewType
import numpy as np
from numdantic import NDArray
# named axes
Width = NewType("Width", int)
Height = NewType("Height", int)
RGBAColor = NewType("RGBAColor", int)
# annotate frame
video_frame: NDArray[tuple[Width, Height, RGBAColor], np.int64]
This annotation will ensure that the axes are always ordered the correct way. For example, attempting to use video_frame
in a function that accepts an array typed as NDArray[tuple[Height, Width, RGBAColor], np.int64]
will cause type checkers to raise an error.
[!NOTE]
There is one significant caveat to this approach: you cannot mix unspecified axes and named axes, i.e. attempting to assign
video_frame
to a variable typed to have shapetuple[int, int, int]
does not work and will cause type checkers to report an error. Once you opt for named axes, you will have to commit to them. Learn more about this limitation and why it occurs in the section about Limitations.
Named axes have a secondary benefit that only comes into play when validating them with pydantic
: If you use two or more axes of the same name in a base model field annotation, pydantic
will check that they all have the same length:
from typing import NewType
import numpy as np
from numdantic import NDArray
# named axes
Side = NewType("Side", int)
class Square(BaseModel):
# both axes must have same length
vertices: NDArray[tuple[Side, Side], np.int32]
# this will work (shape is (2, 2))
Square(vertices=np.array([[1, 2], [1, 2]], dtype=np.int32))
# this will raise a ValidationError (shape is (2, 3))
Square(vertices=np.array([[1, 2, 3], [1, 2, 3]], dtype=np.int32))
It is possible and intended that you mix axes of unspecified length, axes of specific length, and named axes within the same array. This is what makes numdantic
flexible and useful. For example, here is how you could mix the different shape types in a single annotation:
from typing import NewType, Literal as L
import numpy as np
from numdantic import NDArray
SomeAxis = NewType("SomeAxis", int)
x: NDArray[tuple[int, SomeAxis, L[2], SomeAxis], np.int32]
For each of these axes, type checkers and pydantic
will apply the rules described above. In a pydantic
model this would mean that the second and fourth axes of x
must have the same length and the third axis must be of length 2.
If you do not know the dimensionality of your array or wish to type an array that can have arbitrary shape and dimensionality, you can use a tuple of indeterminate size, using the Python built-in ellipsis literal ...
:
import numpy as np
from numdantic import NDArray
any_shape: NDArray[tuple[int, ...], np.generic] # most generic array!
When used with pydantic
, the shape and dimensionality of such an array are never checked; shape validation is skipped entirely. The dtype of the array however is validated regularly.
If you have an array of which you do not know the dimensionality, but you do know the length every axis must have, you can similarly type an array that has a shape that consists only of an arbitrary number of literal integers:
from typing import Literal as L
import numpy as np
from numdantic import NDArray
n_gon: NDArray[tuple[L[2], ...], np.generic] # all axes must have length 2
When such an array is validated with pydantic
, all its axes must have the specified length, but the number of axes (i.e. its dimensionality) is not validated.
You can also use named axes this way. In this case, pydantic
will check that all axes have the same length as the first axis of the array.
[!NOTE]
Unfortunately, this shape type has the same limitation as the other shape types: it cannot be mixed with other shape types. For example, an array typed as having shape
tuple[int, int]
can not be assigned to a variable typed asNDArray[tuple[int, ...], np.generic]
. This is a serious limitation that is being worked on. See the section on Limitations to learn more.
Often, it is not crucial what precision your arrays dtype has. You might not care if your array has dtype int32
or int64
, just that it is an integer. Or perhaps you do not care about the dtype at all. For this purpose, numdantic
allows you to specify broader dtypes in your annotations, using the generic scalar types provided by numpy
:
import numpy as np
from numdantic import NDArray
from typing import Any
# an array that will accept any number
any_number: NDArray[tuple[int, int], np.number[Any]]
# an array that will accept any positive integer
pos_ints: NDArray[tuple[int, int], np.unsignedinteger[Any]]
See the numpy
documentation for scalar types for an overview over what scalar types numpy
offers.
[!IMPORTANT]
As generics, these scalar types must be supplied with a type parameter. This type parameter specifies the size of the type (for example 32 bit vs 64 bit). Type checkers actually convert all scalar types to one of these generics. For example,
numpy.int64
is converted intonumpy.signedinteger[numpy._typing._64Bit]
during type checking. In order to actually receive a type that is agnostic to the size of the dtype, it is required to useAny
as type parameter - hence the use ofAny
in the example above.[!NOTE]
Depending on your version of
numpy
, using these generic scalar types to create arrays may be deprecated. You will get a corresponding runtime warning if you use them to instantiate an array. As a result, you might also get such a warning when you use a scalar generic inside apydantic
validator and the validator attempts to create an array using the generic as dtype. In such a case, you can either suppress this warning, or choose another appropriate scalar dtype. The latter is recommended. In type annotations, the generic scalars are all fine and should cause no problems.
pydantic
can run its validation in two modes: strict and lax mode. Depending on the mode chosen, inputs of a wrong type may be cast to the expected type, if possible (lax mode), or always raise an exception (strict mode). numdantic
mirrors this behavior:
ValidationError
is raised. If you have specified only a generic dtype, the input must have a dtype that is a valid subtype of this generic type.ValidationError
will be raised. If you have specified a generic dtype, casting will be performed only if the input has a dtype that is not a subtype of that generic dtype. Casting is performed directly to the generic dtype, meaning the resulting dtype is system-dependent!Shapes are never cast. This is to avoid hard to track bugs caused by arrays being reshaped into shapes that do not cause runtime issues, but produce wrong results. If an array input has the wrong shape, it will always raise a ValidationError
.
You can specify in which mode to run validation the usual way, i.e. by setting the mode to strict
in a model, field, or globally using the pydantic
API.
When using NDArray
as a field type in a pydantic
base model, you can also pass it (nested) sequences, which are then turned into an array of the specified dtype and validated according to shape and dtype:
import numpy as np
from pydantic import BaseModel
from numdantic import NDArray
class MyModel(BaseModel):
array: NDArray[tuple[int, int], np.int64]
# create a model using a sequence
my_model = MyModel(array=[[1, 2], [3, 4]])
serialization = my_model.model_dump()
# check the output
assert isinstance(serialization["array"], np.ndarray) # passes
assert serialization["array"].dtype is np.dtype(np.int64) # passes
Due to how numpy
handles the shape and dtype of its ndarray
type, static type checkers will unfortunately not be able to detect a wrong shape or a wrong dtype in assignments. For example, the following code will not raise an error when running it through a type checker:
import numpy as np
from numdantic import NDArray
# This does not cause a type checking error!
x: NDArray[tuple[int, int], np.int32] = np.array([1, 2, 3], dtype=np.int32)
# This does not cause a type checking error either!
x: NDArray[tuple[int, int], np.int32] = np.array([[1, 2], [3, 4]], dtype=np.float32)
This is due to the fact that numpy
functions usually have return types annotations where the array shape is always Any
, and the dtype typically is np.generic
. Some numpy
functions might have type annotations that accurately represent at least the dtype of their return value, but this is by no means guaranteed. There is nothing much that can be done about this, until numpy
changes its own typing system to more accurately reflect the return values dtype and shape.
Until then, you will either have to pay close attention to assignments, implement your own typing stubs for numpy
, or write custom wrappers around numpy
functions which you can then annotate appropriately, using TypeVar
and TypeVarTuple
.
Fortunately, there is a saving grace: type checkers will detect mismatches in types further down the line, for example if you try to use an array typed with numdantic
as a 2D array in a function that is typed with numdantic
to only accept 3D arrays, a type checker will catch this mistake.
It is not possible to use named axes created with NewType
in places that are typed using int
:
from typing import NewType
import numpy as np
from numdantic import NDArray
# named axes
Width = NewType("Width", int)
Height = NewType("Height", int)
# annotate image
image: NDArray[tuple[Width, Height], np.int64] = np.random.rand(40, 20)
def transpose_image(
img: NDArray[tuple[int, int], np.int32]
) -> NDArray[tuple[int, int], np.int32]:
return img.transpose()
# this will raise an error when checked by a type checker!
transpose_image(image)
Similarly, you cannot mix shapes with literal integers such as tuple[Literal[2], Literal[2]]
with named axes or generic axes typed with int
. All these combinations will cause your type checker to issue an error.
This happens because the type parameter for shape in the numpy.ndarray
type is currently typed as invariant. As a result, type checkers are not able to accept subtypes of int
inside of a tuple
to be valid in place of actual int
types. There are efforts on the side of the numpy
developers to make this parameter covariant however, and simultaneously, numdantic
might provide its own solution in the future.
If you wish to enjoy the documentation benefit of named axes and can forgo the benefit of base models checking for axes of the same name having the same length, it is recommended to use type aliases in the meantime:
from typing import NewType, TypeAlias
import numpy as np
from numdantic import NDArray
# named axes using type aliases
width: TypeAlias = int
height: TypeAlias = int
# annotate image
image: NDArray[tuple[width, height], np.int64]
For Python 3.12+ it is of course recommended to instead use the new type alias syntax instead.
For axes of specific length, you can of course create similar type aliases of specific literals:
Width720: TypeAlias = Literal[720]
Currently, numdantic
does not allow using Python built-in types as dtype for array annotations. This is due to how numpy
types their ndarray
type: as dtype, they only allow subtypes of np.generic
. This is despite the fact that numpy
has, for quite some time now, also accepted built-in types such as int
or float
as dtypes. The reasoning probably is that these built-in types are converted into proper numpy
dtypes before an ndarray
is constructed.
In principle, this can be remedied by simply adding built-in types to the allowed dtypes in numdantic
and then ignoring the complaints that type checkers will have, but this could lead to unforeseen consequences and sort of defeats the whole point of type checking. Therefore, numdantic
accepts this limitation for now.
Here are some miscellaneous tips and tricks for using numdantic
:
scripts/compatible_dtypes.py
to get a handy table of dtype compatibility! You can even pipe the output into a valid .rst
file. The table shows what array dtypes (rows) can be assigned to what target dtypes (columns), and it also includes useful information on your platform which can give an insight into the implementation of numpy
dtypes on your machine.numpy
plug-in for mypy
. This plug-in will supply mypy
with implementation details of the different dtypes on your machine and makes type checking of dtypes behave more predictably.numdantic
is very rudimentary. For some projects, that might be just what you need, but if you find that numdantic
does not fulfill your requirements, you might wish to check out these alternatives. They are much more advanced, support other data structures like pandas data frames as well, and are well maintained. Note however that their typing system differ from that of numdantic
and therefore are not easily compatible.
numpydantic
together with nptyping
- Support for numpy
arrays, pandas
data frames, dask
arrays, hdf5
and zarr
. Typing is simple and includes shape typing using string literals. Includes JSON schema generation and proper JSON serialization for all data structures.pydantic-numpy
- Support for numpy
arrays, including loading from .npy and .npz files. Provides type factory for different shapes and dtypes.optype
- While this library does not provide support for pydantic
validation of numpy
arrays, it provides useful typing support for numpy
arrays by means of versatile protocols.numdantic
is looking for your help! If you find a bug, have a request for a new feature, or wish to contribute, follow these guidelines.
This should go without saying, but any interactions regarding numdantic
, public or private, with the developers, contributors, or users, should be respectful and polite. Any use of aggressive, hateful, sexist, racist, or otherwise derogatory or discriminating language will not be tolerated. Depending on the severity of the infraction, a warning may be issued first, but I reserve the right to block people from participating in the numdantic
community without warning for severe infractions or if warnings are not leading to a correction of behavior. Actions such as trolling, doxxing, threatening or insulting members of the community will result in an immediate ban from participating in these communities.
Be nice, please. It isn't that hard.
If you find a bug or something is not working as you would expect, open a bug report on the GitHub issues. Please make sure that your bug has not been reported before. If it has, join the conversation on the existing issue instead. When you open a new issue, make sure to provide a minimal example that is able to reproduce the bug on your machine. This makes fixing the bug much easier.
When you open a new bug report, use the bug report
issue template and fill out the form as best as you can.
If you have an idea for a new feature for numdantic
, you can submit a feature request on the GitHub issues using the feature request
template. Fill out the form with your idea and give it an expressive title.
If you wish to supply an implementation to a feature request or a bug report directly, you can do so by opening a pull request. You can additionally also provide code contributions for open issues that have the help wanted
label. Go to the issue and comment that you would like to provide an implementation. Write your code on a new branch on a fork of the numdantic
main repository, and when you are finished, create a pull request to the main repository. Your pull request will then be reviewed and you will receive feedback as soon as possible.
Please note that there are some requirements for your code contributions:
numdantic
follows a few code conventions. They are automatically enforced by pre-commit
. In order to use pre-commit
, clone the repository, and then run the following commands from the project root to get started:
pip install -e .[dev]
pre-commit install
All functions, methods, classes and exceptions must have a docstring, describing their use and all parameters. Single-line docstrings are sufficient for tests, but not source code. See the existing docstrings for examples of what a docstring for numdantic
should look like.
Your code must be covered by tests. numdantic
differentiates between unit tests that cover only internal code, and integration tests that also cover the integration with mympy
and pydantic
. Depending on your code, you may not be able to provide unit tests, since numdantic
is strongly integrated with pydantic
by design. Note that arrays are not patched in unit tests; you may use actual numpy
arrays for testing.
numdantic
uses conventional commits v1.0.0 with the angular commit types. The possible scopes depend on the commit type:
feat
, fix
, refactor
, and perf
commit types, scopes can be either validation
, typing
or scripts
, depending on what changed. For refactor
and perf
type commits, the scope can additionally also be tests
if the refactoring touches only test code. Note that fixes for tests belong to the test
commit type, not the fix
type!docs
commit types, the scope should be the name of the file changed. If you worked on multiple files, use one commit per file.ci
commit types either drop the scope or use actions
if you changed something about the GitHub actions.test
commit types, use either unit
or integration
, depending on which tests you changed. If you change both, split the changes into two commits. Note that fixes for tests also belong to the test
commit type.build
commit types, no scope is required. If your commit touches only the pyproject.toml
, you can optionally use pyproject
as the scope.Thank you for helping numdantic
improve! :heart: