Use of NumPy 1.21 style `npt.NDArray` typing hints

hmacdope commented 2 years ago

Problem

With the merge of #3737 NumPy 1.20 is now our minimum supported NumPy version as per NEP29. However some open PRs (eg @hmacdope #3730 and @umak1106 #3746) are using the NumPy 1.21 style typing:

import numpy.typing as npt
def blah(arr: npt.NDArray)

This is not going to play nice with our CI or advertised supported versions so we will have to decide how to proceed.

See the release notes for NumPy 1.21 to see what npt.NDArray actually is, but long and the short of it is that it is an alias for

np.ndarray[Any, np.dtype[~Scalar]]

from looking at the NEP29 table the important dates are:

Jun 21, 2022	3.8+	1.20+
Jan 31, 2023	3.8+	1.21+
--	--	--

Some possible ideas to consider:

Roll back some of the typing to the plain old arr: np.ndarray
Embed the alias ourselves until we have NumPy 1.21 (bit hacky)
Make typing a future branch and postpone merging it until we have NumPy 1.21 underneath
postpone a release until we have NumPy 1.21 underneath (6 Months)
Break NEP29
Anything else?

Tagging @MDAnalysis/coredevs for input? This is probably also relevant to all GSOC and Outreachy people @MDAnalysis/gsoc-mentors (particularly for @umak1106's project).

tylerjereddy commented 2 years ago

For one data point, I grepped around the SciPy codebase a bit and found that we indeed don't use NDArray in any .py files, opting for np.ndarray as you note above, but we do use the fancy new stuff in the static type stubs like .pyi files. The latter must be safe then, since it would only be used by mypy and you can just require a newer NumPy for the static checks that leverage the stub files in CI.

I guess it depends what mentors think, but maybe we could adopt a similar compromise for the next 6 months or so.

tylerjereddy commented 2 years ago

I think it is common to use the stub files for things like type hinting Cython modules. That said, I believe it is also possible to use the static stub files alongside regular Python modules, though I've personally almost never seen the latter. A big exception is the Python standard library I believe--it is entirely typed by a separate stubbing project if I'm reading correctly. I suppose if we really want we could do that for regular Python modules as well, though that seems debatably quite complex vs. using inline typing for conventional Python modules.

So if it were a vote I'd probably favor using the slightly older syntax inline for now rather than producing a bunch of stub files for plain Python modules, but perhaps good to be clear that both may be options IIUC.

tylerjereddy commented 2 years ago

Ok, I stand corrected--pandas just merged the various third-party type stubbing projects and created their own official separate type stubbing library: https://github.com/pandas-dev/pandas-stubs

It cites this part of PEP561: https://peps.python.org/pep-0561/#stub-only-packages

In short:

For package maintainers wishing to ship stub files containing all of their type information, it is preferred that the .pyi stubs are alongside the corresponding .py files. However, the stubs can also be put in a separate package and distributed separately.

Anyway, lots of interesting options/flexibility there.

jbarnoud commented 2 years ago

As long as npt.NDArray does not provide ways of specifying shapes, I would just roll back to np.ndarray.

I did consider stub files but I am concerned new contributors will not realise they're there and older ones will occasionally forget about them. Already the documentation is not always up to date.

IAlibay commented 2 years ago

So my 2 cent here is that we'll be dropping 1.20 for 1.21 in exactly 6 months. As much as I want to do 3 monthly releases, we could wait the 6 month cycle.

Otherwise sticking to np.ndarray is fine, but we should accompany those entries with some kind of comment tag. That way we can quickly find & replace them all once we move to 1.21?

It's a bit cumbersome but maybe something like # typing: numpy comment before the method/class def?

hmacdope commented 2 years ago

I'm happy with using arr: np.ndarray for the meantime with the tag @IAlibay proposes. I will follow this approach in #3730

Shankhadeep2000 commented 2 years ago

Hi I am a beginner in open source can I contribute in this project. I got the skills of python, data science, machine learning. please help me with this..

orbeckst commented 2 years ago

@Shankhadeep2000 welcome to MDAnalysis. This particular issue is effectively part of an ongoing Outreachy project so I don’t think we’re looking for any additional help with type hinting. But there’s are many other issues that need help, as you can see when looking through the issue tracker. I suggest you join the developer list and our discord (see https://www.mdanalysis.org/#participating ) and start a discussion there.

Please also see the contributor guide https://userguide.mdanalysis.org/stable/contributing.html

MDAnalysis / mdanalysis

Use of NumPy 1.21 style `npt.NDArray` typing hints #3748

Problem