data-apis / python-record-api

Inferring Python API signatures from tracing usage.

ufunc data seems to be missing #144

Open rgommers opened 3 years ago

rgommers commented 3 years ago

I was looking for def sin and other such functions in typing/numpy.py, and they're missing completely. It's unclear why.

The actual question I was trying to answer is: how often is the dtype keyword used for unary ufuncs? I thought the data I needed would be here, but it looks like it's not.

kgryte commented 3 years ago

Does this have to do with the Python-C bridge? Meaning, I am not sure that the tooling currently picks up C-level argument handling, which could be applicable for ufuncs.

rgommers commented 3 years ago

Ah yes, that's it (unfortunately). Pretty much all functions that are not ufuncs have a thin Python shim and will be picked up, but ufuncs have no such shim and won't be.
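To illustrate the concern in this comment, here is a minimal sketch using only the standard library. It assumes a tracer built on sys.settrace "call" events, which is a simplification and not necessarily how python-record-api records calls. The names tracer, seen and py_sin are hypothetical, and math.sin stands in for a C-implemented callable such as a ufunc.

```python
import math
import sys

# One (function name, arguments) entry per Python-level call that gets traced.
seen = []


def tracer(frame, event, arg):
    # "call" events fire only when a new Python frame is created.
    if event == "call":
        seen.append((frame.f_code.co_name, dict(frame.f_locals)))
    return None  # no per-line tracing needed


def py_sin(x):
    # Thin Python shim around a C-implemented function (hypothetical).
    return math.sin(x)


sys.settrace(tracer)
try:
    py_sin(0.5)    # creates a Python frame, so its arguments are visible
    math.sin(0.5)  # implemented in C, no Python frame, so nothing is recorded
finally:
    sys.settrace(None)

print(seen)
```

Only the py_sin call ends up in seen, together with its argument x; the direct math.sin call leaves no trace at this level.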

saulshanabrook commented 3 years ago

The ufunc data is available; see this excerpt from the file you referenced:

# usage.dask: 58
# usage.hvplot: 1
# usage.koalas: 5
# usage.matplotlib: 127
# usage.networkx: 5
# usage.orange3: 6
# usage.pandas: 34
# usage.prophet: 2
# usage.scipy: 296
# usage.seaborn: 1
# usage.skimage: 51
# usage.sklearn: 32
# usage.statsmodels: 47
# usage.xarray: 30
sin: numpy.ufunc

Since ufuncs are custom objects, not plain functions, we record them as such. If you look further down in the same file, you will see a class ufunc, which has the overloads for all the calls:

class ufunc:

    # usage.dask: 1
    __module__: ClassVar[object]

    @overload
    def __call__(self, _0: pandas.core.frame.DataFrame, _1: int, /):
        """
        usage.dask: 85
        usage.koalas: 24
        """
        ...

    @overload
    def __call__(self, _0: int, _1: int, /):
        """
        usage.dask: 1
        usage.koalas: 1
        usage.matplotlib: 2
        usage.scipy: 135
        usage.skimage: 1
        usage.sklearn: 3
        usage.statsmodels: 10
        usage.xarray: 4
        """
        ...

So the file currently shows two things: how many times each ufunc is retrieved from the numpy module (the first set of stats), and how ufuncs are called in general, pooled across all ufuncs (the second set of stats).

We could also show the product of the two, i.e. how each individual ufunc instance is called.
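A rough sketch of what such a per-ufunc breakdown might look like, using only the standard library. Everything here is hypothetical and not part of python-record-api: record and calls are made-up names, and _sin and _cos are stand-ins that merely accept a dtype keyword the way unary ufuncs do. The key point is that calls are tallied per callable name rather than pooled under one ufunc class.

```python
import math
from collections import Counter
from functools import wraps

# (callable name, positional arg type names, keyword names) -> call count
calls = Counter()


def record(name, func):
    """Wrap func so that every call is tallied under its own name."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        key = (
            name,
            tuple(type(a).__name__ for a in args),
            tuple(sorted(kwargs)),
        )
        calls[key] += 1
        return func(*args, **kwargs)
    return wrapper


# Hypothetical stand-ins for unary ufuncs; dtype is accepted but ignored.
def _sin(x, dtype=None):
    return math.sin(x)


def _cos(x, dtype=None):
    return math.cos(x)


sin = record("sin", _sin)
cos = record("cos", _cos)

# A few illustrative calls (made up for this sketch, not measured usage).
sin(0.0)
sin(1, dtype=float)
cos(0.5)
cos(0.5)

# How often is the dtype keyword used, per callable?
dtype_uses = Counter()
for (name, _arg_types, kwarg_names), count in calls.items():
    if "dtype" in kwarg_names:
        dtype_uses[name] += count
```

With data keyed this way, a question like "how often is dtype passed to unary ufuncs" reduces to a filter over the recorded keys.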

Currently, all ufuncs show up as defined in the numpy module, because it's hard to find where they were actually defined (see https://github.com/data-apis/python-record-api/issues/70).