Quansight-Labs / numpy_pytorch_interop


NumPy API surface: plan/prioritize the coverage #87

Open ev-br opened 1 year ago

ev-br commented 1 year ago

EDIT: the relevant list for an MVP is https://github.com/Quansight-Labs/numpy_pytorch_interop/issues/87#issuecomment-1478036400. The rest is maybe-some-day-if-need-arises.

Splitting this off from gh-86: here's the difference between the API surfaces of NumPy and this wrapper. We can edit the order to reflect priorities:

>>> import inspect
>>> import numpy as np
>>> import torch_np as tnp
>>> # keep public, lowercase, non-module names (ismodule needs the object, not its name)
>>> npset = set(x for x in dir(np) if not x.startswith('_') and not inspect.ismodule(getattr(np, x)) and not x[0].isupper())
>>> tnpset = set(dir(tnp))
>>> for name in sorted(npset - tnpset):
...     print("- [ ]", name)

EDIT: now lightly edited:

Lower prio:

memmap ndenumerate ndindex nditer nested_iters setxor1d setdiff1d vectorize trapz trim_zeros version sort_complex flatiter union1d unpackbits packbits

Low prio if at all:

array2string array_repr array_str busday_count busday_offset busdaycalendar bytebounds bytes cast ctypeslib deprecate deprecate_with_doc format_float_positional format_float_scientific format_parser get_array_wrap get_include get_printoptions getbufsize geterr geterrcall geterrobj obj2sctype poly poly1d polyadd polyder polydiv polyfit polyint polymul polynomial polysub polyval sctype2char sctypeDict seterr seterrcall seterrobj set_numeric_ops set_printoptions set_string_function setbufsize shares_memory source tracemalloc_domain (?) test testing use_hugepage who save savetxt savez savez_compressed show_config show_runtime frombuffer fromfile fromfunction fromiter frompyfunc fromregex fromstring genfromtxt base_repr binary_repr may_share_memory broadcast printoptions issctype issubsctype require lookfor load loadtxt mask_indices kernel_version lexsort little_endian maximum_sctype intersect1d

Definitely not (no pytorch equivalents):

add_docstring add_newdoc add_newdoc_ufunc asmatrix char character chararray clongdouble clongfloat complex256 compare_chararrays datetime64 datetime_as_string datetime_data flexible float128 isnat isbusday longcomplex longdouble longfloat matrix rec recarray recfromcsv recfromtxt record ushort uint uint16 uint32 uint64 uintc uintp ulonglong unicode void spacing str string timedelta64 safeeval numarray oldnumeric object fastCopyAndTranspose (deprecated in numpy) msort (deprecated in numpy) disp info iterable
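
For anyone wanting to regenerate lists like these, the comparison can be wrapped in a small helper. This is a minimal sketch, not code from this repo: `api_gap` and `as_checklist` are hypothetical names, and the stdlib modules `math` and `cmath` stand in for `numpy` and `torch_np` so that it runs without either installed.

```python
import inspect


def api_gap(reference, candidate):
    """Return the sorted public names that `reference` has and `candidate` lacks.

    Hypothetical helper: skips private names, capitalized names and submodules,
    mirroring the filters used in the snippet at the top of this issue.
    """
    names = set()
    for name in dir(reference):
        if name.startswith("_") or name[0].isupper():
            continue
        # ismodule must be given the attribute itself, not its name
        if inspect.ismodule(getattr(reference, name, None)):
            continue
        names.add(name)
    return sorted(names - set(dir(candidate)))


def as_checklist(names):
    """Format names as GitHub task-list items, one per line."""
    return "\n".join(f"- [ ] {name}" for name in names)


if __name__ == "__main__":
    import cmath
    import math

    # stand-ins for `np` and `tnp`
    print(as_checklist(api_gap(math, cmath)))
```

With the real modules, the call would presumably be `api_gap(np, tnp)`; the same helper should also work for the `ndarray`, `random` and `linalg` comparisons further down.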

ev-br commented 1 year ago
>>> for name in sorted(set(dir(np.ndarray)) - set(dir(tnp.ndarray))):
...     print("- [ ]", name)

Later if at all:

base byteswap newbyteorder flat getfield setfield setflags itemset tostring [deprecated since numpy 1.19] tobytes tofile ctypes __array__ __array_finalize__ __array_function__ __array_interface__ __array_prepare__ __array_priority__ __array_struct__ __array_ufunc__ __array_wrap__ __class_getitem__

ev-br commented 1 year ago

For tnp.random:

>>> for name in sorted(dir(tnp.random)):
...     print("- [x]", name)
>>> for name in sorted(set(dir(np.random)) - set(dir(tnp.random))):
...     print(name)

BitGenerator Generator MT19937 PCG64 PCG64DXSM Philox RandomState SFC64 SeedSequence __RandomState_ctor __path__ _bounded_integers _common _generator _mt19937 _pcg64 _philox _pickle _sfc64 beta binomial bit_generator bytes chisquare default_rng dirichlet exponential f gamma geometric get_bit_generator get_state gumbel hypergeometric laplace logistic lognormal logseries mtrand multinomial multivariate_normal negative_binomial noncentral_chisquare noncentral_f pareto permutation poisson power random_integers ranf rayleigh set_bit_generator set_state standard_cauchy standard_exponential standard_gamma standard_normal standard_t test triangular vonmises wald weibull zipf

ev-br commented 1 year ago

And linalg:

>>> for name in [x for x in dir(np.linalg) if not x.startswith("_")]:
...     print("- [ ]", name)

rgommers commented 1 year ago

This looks quite good, thanks @ev-br. Here's where I would start:

ev-br commented 1 year ago

For random, we can fairly easily mock up RandomState and default_rng. That might make things easier for scikit-learn and others who frown on np.random.random usage.
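
To make the idea concrete, here is a rough sketch of what such a mock could look like. It is an assumption about one possible design, not the implementation in this repo: the class below wraps a `torch.Generator`, covers only three methods, returns plain tensors rather than wrapper arrays, and makes no attempt to reproduce NumPy's random streams for a given seed.

```python
import torch


class RandomState:
    """Hypothetical NumPy-style RandomState backed by a torch.Generator."""

    def __init__(self, seed=None):
        self._gen = torch.Generator()
        if seed is None:
            self._gen.seed()  # non-deterministic seed
        else:
            self._gen.manual_seed(seed)

    def seed(self, seed):
        self._gen.manual_seed(seed)

    @staticmethod
    def _shape(size):
        # NumPy accepts None, an int, or a tuple for `size`
        if size is None:
            return ()
        if isinstance(size, int):
            return (size,)
        return tuple(size)

    def random_sample(self, size=None):
        # uniform on [0, 1)
        return torch.rand(self._shape(size), generator=self._gen, dtype=torch.float64)

    def normal(self, loc=0.0, scale=1.0, size=None):
        z = torch.randn(self._shape(size), generator=self._gen, dtype=torch.float64)
        return loc + scale * z

    def randint(self, low, high=None, size=None):
        # like NumPy: a single bound means the range [0, low)
        if high is None:
            low, high = 0, low
        return torch.randint(low, high, self._shape(size), generator=self._gen)


if __name__ == "__main__":
    rs = RandomState(1234)
    print(rs.random_sample(3))
    print(rs.randint(10, size=(2, 2)))
```

Keeping the generator as per-instance state is what lets callers avoid the global seed; a `default_rng`-style factory could return a similar object with the `Generator` method names.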

lezcano commented 1 year ago

I agree with the list from Ralf. Here's a slightly more comprehensive list of things that either have PyTorch equivalents or are close to trivial to implement.

These could go into a PR, as most of them should have a very simple implementation. After these, I think we can declare victory on the coverage front. We should then spend some time finishing the refactoring and doing general cleanups across the codebase (without spending too much time on this), and then move on to the testing part of the project, where we show that what we built does in fact work.

rgommers commented 1 year ago

That list seems reasonable, minus infty: I'm going to deprecate that one in NumPy soon, so I would prefer to leave it out here.