benbovy / spherely

Manipulation and analysis of geometric objects on the sphere.
https://spherely.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
119 stars 8 forks source link

Add script to generate the stub file #44

Open gahjelle opened 2 months ago

gahjelle commented 2 months ago

This is a draft for a script that can generate the spherely.pyi stub file.

During the EuroSciPy 2024 sprint, I worked with @jorisvandenbossche on #43 . We realized that we need several different _VFunc_* types to cover the different overloaded arguments for functions going forward. We tried a few different approaches to avoid too much boilerplate/copy-paste code, including ways of creating dynamic types but didn't find anything that was well supported by the type checkers.

Instead, the following script can automatically generate code for such classes. It works by looking a # /// Begin types marker and lines defining which types should be added. For example, to create a type that takes 2 geography inputs, as well as an optional radius parameter of type float, we can add the specification line:

#     - n_in=2, radius=float

Running the script will then update the code in-place between the # /// Begin types and the corresponding # /// End types marker. Any existing edits between those markers will be overwritten, but the rest of the file is left alone.

gahjelle commented 2 months ago

I'm not sure what's the best location for the script. I ended up pulling it outside the src/ directory. But we could still move it in there? Or, we could run the script as part of CI, and maybe move it to a workflow directory instead? Currently, the output of the script isn't perfectly "blackened" code. We could probably add trailing commas to make black always keep the arguments on separate lines? Or run Black as part of the script/workflow to format def on one line where there's enough room.

benbovy commented 2 months ago

Thanks @gahjelle that is a good idea!

We tried a few different approaches to avoid too much boilerplate/copy-paste code, including ways of creating dynamic types but didn't find anything that was well supported by the type checkers.

Out of curiosity, did you try or consider using TypedDict and Unpack (PEP 692)? Mypy seems to support it but I'm not sure about the other type checkers (I'm not an expert in Python static typing). I think one limitation is relying on **kwargs instead of a direct / explicit signature for the optional parameters, but probably not a big deal since the signatures in the docstrings are generated by pybind11 and not from the type stubs.

I'm fine with either solution. If using a script, I'd slightly prefer using Python code for the specs instead of parsing comments (e.g., in Xarray generate_ops.py. The script can be part of the code ; I think that for now it is good enough to run it manually when needed (we don't need to re-generate the type stubs unless we change the specs) and then let black format the generated file via pre-commit.

gahjelle commented 2 months ago

Thanks @benbovy

No, we didn't think about or explore using a TypedDict for doing the typing. I guess that could work. The types would probably be slightly less specific as, for example, radius would be an allowed optional parameter on all (Nin2) types. But maybe that's acceptable to avoid having to generate the stub file?

I've done an update to the generator, so that we're listing the specs explicitly in Python code instead of parsing the comments in the .pyi file.

But I'm fine with abandoning this script and working with the TypedDict and **kwargs instead.

When you say "The script can be part of the code", do you mean that we should move it into the src/ folder? Or that it should be integrated into some other code file?

Thanks again for your feedback.

benbovy commented 2 months ago

The types would probably be slightly less specific as, for example, radius would be an allowed optional parameter on all (Nin2) types.

Maybe we could define optional parameter types as generic, e.g., something like this:

_NameType = TypeVar("_NameType", bound=str)
_ScalarReturnType = TypeVar("_ScalarReturnType", bound=Any)
_ArrayReturnDType = TypeVar("_ArrayReturnDType", bound=Any)
_KwargsType = TypeVar("_KwargsType", bound=TypedDict)

class _VFunc_Nin2_Nout1(Generic[_NameType, _ScalarReturnType, _ArrayReturnDType, _KwargsType]):
    ...
    @overload
    def __call__(
        self,
        a: Geography,
        b: Geography,
        **kwargs: Unpack[_KwargsType],
    ) -> _ScalarReturnType: ...
    ...

class DistanceKwargs(TypedDict):
    radius: float

distance: _VFunc_Nin2_Nout1[Literal["distance"], float, float, DistanceKwargs]

When you say "The script can be part of the code", do you mean that we should move it into the src/ folder?

Yes I think it is fine moving it there.

benbovy commented 2 weeks ago

I gave a try at implementing a solution based on TypedDict + Unpack (see #65) but didn't really succeed unfortunately. It looks like both Python typing and static type checkers are not ready for supporting generic keyword arguments this way (see https://github.com/python/typing/issues/1399).

I also haven't found any alternative solution based on typing that looks reasonably clean.

This PR is likely be the best way to handle keyword arguments of "universal functions" until there is better support for generic TypedDict + Unpack.