bis-med-it / pysdmx

Your opinionated Python SDMX library
https://www.sdmx.io/tools/pysdmx/

Implement the Dataset class #72

Status: Open. javihern98 opened this issue 3 months ago.

javihern98 commented 3 months ago

This summarizes the discussion on which attributes the Dataset class must have.

@sosna and @stratosn proposed the following code, which everyone agreed on. It will be implemented after 1.0, as it requires some changes to the parsers and writers.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Generator, Iterator, Optional, Sequence, Union

from pysdmx.model import DataProvider, MetadataReport, Schema


@dataclass
class _Component:
    id: str
    value: Any


@dataclass
class Dimension(_Component):
    pass


@dataclass
class DataAttribute(_Component):
    pass


@dataclass
class Measure(_Component):
    pass


@dataclass
class _Package:
    key: str  # Full key (cf. MEDAL) A.F.G.M.*
    dimensions: Sequence[Dimension]
    attributes: Optional[Sequence[DataAttribute]]
    name: Optional[str]
    # Sequence takes a single type argument, hence the Union
    metadata: Optional[Sequence[Union[MetadataReport, str]]]


@dataclass
class Observation(_Package):
    measures: Sequence[Measure]


@dataclass
class _ObsPackage(_Package):
    # Generator takes three type arguments: yield, send and return types
    observations: Generator[Observation, None, None]
    obs_count: Optional[int]
    start_period: Optional[str]
    end_period: Optional[str]
    last_updated: Optional[datetime]


@dataclass
class Series(_ObsPackage):
    pass


@dataclass
class Group(_Package):
    pass


@dataclass
class Dataset(_ObsPackage):
    packages: Generator[Union[Group, Series, Observation], None, None]
    provider: Optional[DataProvider]
    structure: Union[Schema, str]  # Schema or the SDMX URN of the structure

    # Both views below iterate the same one-shot generator: iterating
    # `groups` to the end exhausts `packages`, after which `series`
    # yields nothing (and vice versa).
    @property
    def groups(self) -> Iterator[Group]:  # A view on the packages of type Group
        return (p for p in self.packages if isinstance(p, Group))

    @property
    def series(self) -> Iterator[Series]:  # A view on the packages of type Series
        return (p for p in self.packages if isinstance(p, Series))


@dataclass
class PandasDataset(Dataset):
    def to_pandas(self):
        # Conversion to a pandas DataFrame, not implemented yet
        raise NotImplementedError
```

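
Because `packages` is typed as a one-shot generator, the `groups` and `series` views share a single pass over it. The sketch below illustrates this under stated assumptions: it does not import pysdmx, `Group`, `Series` and `Dataset` are trimmed stand-ins keeping only what is needed here, `make_packages` is a hypothetical helper, and the keys are made-up values.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Union


# Trimmed stand-ins for the proposed classes (hypothetical, not pysdmx)
@dataclass
class Group:
    key: str


@dataclass
class Series:
    key: str


@dataclass
class Dataset:
    # Accepts any iterable, so both a generator and a list can be tried
    packages: Iterable[Union[Group, Series]]

    @property
    def groups(self) -> Iterator[Group]:
        return (p for p in self.packages if isinstance(p, Group))

    @property
    def series(self) -> Iterator[Series]:
        return (p for p in self.packages if isinstance(p, Series))


def make_packages() -> Iterator[Union[Group, Series]]:
    # A one-shot generator, as in the proposal (keys are made up)
    yield Group(key="A.F.G")
    yield Series(key="A.F.G.M")
    yield Series(key="A.F.G.Q")


one_shot = Dataset(packages=make_packages())
print(len(list(one_shot.groups)))  # 1
print(len(list(one_shot.series)))  # 0: the generator was already drained

reusable = Dataset(packages=list(make_packages()))
print(len(list(reusable.groups)))  # 1
print(len(list(reusable.series)))  # 2
```

Storing `packages` in a re-iterable collection (or re-creating the generator for each view) avoids the problem; whether that fits the streaming goal behind a generator-based design is a trade-off for the maintainers.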
gabrielgellner commented 2 days ago

Do you think it would be possible to do this with Narwhals to make this dataframe-agnostic? (I'm a huge fan of pysdmx and am moving all my SDMX code over to it, but I'm also a heavy Polars user :))
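
One dependency-free way to stay dataframe-agnostic is to flatten the observations into a column-oriented dict (column name to list of values), which both pandas and Polars can consume. This is a minimal sketch, not a pysdmx API: `Component`, `Observation` and `to_columns` are hypothetical, simplified stand-ins (attributes and metadata are omitted), and component ids are assumed to be unique within an observation.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Sequence


# Simplified stand-ins for the proposed Dimension/Measure/Observation classes
@dataclass
class Component:
    id: str
    value: Any


@dataclass
class Observation:
    dimensions: Sequence[Component]
    measures: Sequence[Component]


def to_columns(observations: Iterable[Observation]) -> Dict[str, List[Any]]:
    """Flatten observations into a dict of equal-length columns.

    Columns appear in order of first occurrence; a component missing
    from an observation is filled with None.
    """
    columns: Dict[str, List[Any]] = {}
    n_rows = 0
    for obs in observations:
        for comp in [*obs.dimensions, *obs.measures]:
            # A column first seen now is back-filled with None for earlier rows
            col = columns.setdefault(comp.id, [None] * n_rows)
            col.append(comp.value)
        n_rows += 1
        # Pad columns that this observation did not provide
        for col in columns.values():
            if len(col) < n_rows:
                col.append(None)
    return columns


obs = [
    Observation(
        dimensions=[Component("FREQ", "M"), Component("TIME_PERIOD", "2024-01")],
        measures=[Component("OBS_VALUE", 1.5)],
    ),
    Observation(
        dimensions=[Component("FREQ", "M"), Component("TIME_PERIOD", "2024-02")],
        measures=[Component("OBS_VALUE", 1.7)],
    ),
]
cols = to_columns(obs)
print(cols)
# {'FREQ': ['M', 'M'], 'TIME_PERIOD': ['2024-01', '2024-02'], 'OBS_VALUE': [1.5, 1.7]}
```

The resulting dict can be passed to `pandas.DataFrame(cols)` or `polars.DataFrame(cols)`. Narwhals could then provide a single code path for working with either frame, though how pysdmx would integrate it is still an open question in this thread.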