casact / chainladder-python

Actuarial reserving in Python
https://chainladder-python.readthedocs.io/en/latest/
Mozilla Public License 2.0
192 stars 71 forks source link

Adding annotations - triangle.py #522

Open genedan opened 6 months ago

genedan commented 6 months ago

I'm mostly adding things like comments and type hints. Also fixes that would make the code compliant with PEP guidelines.

MatthewCaseres commented 6 months ago
  1. Is there a way to avoid code like columns: str | list = None? If columns is str or list, how can it be None by default?
  2. What do the lists contain, strings?
kennethshsu commented 6 months ago

@genedan how about (origin: str or array-like = None, let's try to match pandas.DataFrame?

@MatthewCaseres is that actually an issue? The argument is not always used. Do you think that will cause confusion? I guess we could do (origin: str, array-like, or None, default None?

MatthewCaseres commented 6 months ago

@kennethshsu Some arguments don't describe themselves as optional, but since they have a default None they must be optional. I was having some trouble understanding what I'm supposed to pass into the constructor, like can I omit the origin? Can I pass a list of dictionaries into the origin?

genedan commented 6 months ago

According to PEP 484 we need some explicit annotation to indicate a parameter is optional:

https://stackoverflow.com/questions/72039810/python-3-10-optional-parameter-type-union-vs-none-default

I'm thinking we could do something like:

columns: Optional[str | list] = None

or

columns: str | list | None = None

I would prefer the former since it's more explicit that the parameter is optional.

kennethshsu commented 6 months ago

This looks really good, I like it a lot!

To @MatthewCaseres's point, I think we can do a better job with input validation. I'll add this to my backlog.

kennethshsu commented 6 months ago

On second look, I thought this is for the docstrings, not the actual declaration in the code. But good still!

genedan commented 6 months ago

What other types can be accepted for the data argument? The documentation has DataFrame, but we have a method _interchange_dataframe that allows for other types to be accepted that support the __dataframe__ protocol. So this implies that the triangle constructor is somewhat flexible.

So I was thinking the accepted type for data should be something like DataFrame, DataFrame-like but I'm not sure what Python types would be available for "DataFrame-like". Maybe we could use array-like but specify in the docstring that it needs to support the __dataframe__ protocol?

genedan commented 6 months ago

In the pandas source code, they annotate the input as DataFrameXchg which comes from a stripped down version of a DataFrame used for the conversion, so I'll go with that.


import pandas as pd
from pandas.core.interchange.dataframe_protocol import (
    Buffer,
    Column,
    ColumnNullType,
    DataFrame as DataFrameXchg,
    DtypeKind,
)