JakobGM / patito

A data modelling layer built on top of polars and pydantic
MIT License
270 stars, 23 forks

feature: recursive DataFrame.derive() #25

Closed: brendancooley closed this 5 months ago

brendancooley commented 11 months ago
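```python
import patito as pt
import polars as pl

class Foo(pt.Model):
    bar: int = pt.Field(derived_from="foo")
    # quad_bar depends on double_bar, which is itself a derived column
    quad_bar: int = pt.Field(derived_from=2 * pl.col("double_bar"))
    double_bar: int = pt.Field(derived_from=2 * pl.col("bar"))
```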

With this change, DataFrame.derive() on this model derives double_bar before quad_bar, automatically detecting upstream dependencies and ensuring that the execution order is correct. The data frame is returned with its columns in the order specified by the model (quad_bar comes before double_bar).
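The dependency resolution described above amounts to a topological sort of the derived columns. The following is a minimal, hypothetical sketch, not patito's actual implementation: it assumes each derived column's upstream columns are already known as a plain dict (for a polars expression, these names should be obtainable via Expr.meta.root_names()), and it orders the columns with a depth-first search that also rejects circular derivations.

```python
from typing import Dict, List, Set

# Hypothetical dependency map for the Foo model: each derived column maps to
# the columns its derived_from expression reads.
DEPENDENCIES: Dict[str, List[str]] = {
    "bar": ["foo"],
    "quad_bar": ["double_bar"],
    "double_bar": ["bar"],
}


def derivation_order(dependencies: Dict[str, List[str]]) -> List[str]:
    """Order derived columns so each one comes after the derived columns it reads."""
    order: List[str] = []
    done: Set[str] = set()
    in_progress: Set[str] = set()

    def visit(column: str) -> None:
        if column in done:
            return
        if column in in_progress:
            raise ValueError(f"Circular derivation involving {column!r}")
        in_progress.add(column)
        for upstream in dependencies.get(column, []):
            # Plain input columns such as "foo" are not derived, so they need no ordering.
            if upstream in dependencies:
                visit(upstream)
        in_progress.remove(column)
        done.add(column)
        order.append(column)

    # Dicts preserve insertion order, so ties are broken by model field order.
    for column in dependencies:
        visit(column)
    return order


print(derivation_order(DEPENDENCIES))  # → ['bar', 'double_bar', 'quad_bar']
```

Deriving the columns in this order and then selecting the model's fields would give back the model's own column order, with quad_bar before double_bar.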

Added a new test, test_recursive_dependencies. Tests pass under both polars==0.18 and polars==0.19 once #24 is merged.

thomasaarholt commented 5 months ago

I really like the idea and understand the need for this. Any chance you'd like to rebase this on the latest main branch?

brendancooley commented 5 months ago

Got it in as part of the pydantic v2 refactor! See test_recursive_derive: https://github.com/JakobGM/patito/blob/89e59f313ec2452fe76a1e72aff2e157f6fac298/tests/test_polars.py#L284

Will close this.