equinor / segyio

Fast Python library for SEGY files.
Other
469 stars 213 forks source link

Improved typing support #574

Open sebastianvitterso opened 1 month ago

sebastianvitterso commented 1 month ago

I'm using segyio for the first time, and Pyright isn't loving all the implicit None-typing in the package.

E.g. looking at the spec-class, it is a simple object class, initialized as so:

class spec(object):
    def __init__(self):
        self.iline = 189
        self.ilines = None
        self.xline = 193
        self.xlines = None
        self.offsets = [1]
        self.samples = None
        self.ext_headers = 0
        self.format = None
        self.sorting = None
        self.endian = 'big'

In general pythonic terms, this is fine. However, since many of the fields are initialized to None, and have no other typing, Pyright (and other static type checkers) warn that I'm not allowed to assign spec.sorting = segyio.TraceSortingFormat.CROSSLINE_SORTING because "Literal[1]" is incompatible with "None" (basically int is incompatible with "None").

In the case of spec, I believe a dataclass might be a good fit, but in general, I think the classes should be typed.

from dataclasses import dataclass
@dataclass
class spec:
    iline: int = 189
    ilines: int | None = None
    xline: int = 193
    xlines: int | None = None
    offsets: list = [1]
    samples: int | None = None
    ext_headers: int = 0
    format: str | None = None
    sorting: int | None = None
    endian: str = 'big'

test_spec1 = spec()
test_spec2 = spec2(
    offsets=[2],
    ext_headers=1,
    endian='lsb'
)

Another example is that where segyio.tools.cube calculates the shape of the cube grid, it depends on e.g. the line smps = len(f.samples). Looking at the typing provided (implicitly) by the package: image

So f.samples is typed as None, because it's initialized as that. These properties (ilines, xlines, offsets, samples, sorting) that depend on private fields are all implicitly annotated to the initialization value of their private counterparts, which is None. By rather explicitly typing the property functions, these could be typed correctly, like so:

import numpy.typing as npt

@property
def samples(self) -> npt.NDArray[np.int_]:
    """[...]"""
    return self._samples

This change would improve the usability of the package for new users, as autocomplete (from correct typing) is one of the most intuitive types of docs a package can provide.

Thanks!