functime-org / functime

Time-series machine learning at scale. Built with Polars for embarrassingly parallel feature extraction and forecasts on panel data.
https://docs.functime.ai
Apache License 2.0
971 stars 52 forks source link

refactor: fix type hints of train_test_split #161

Closed baggiponte closed 3 months ago

baggiponte commented 6 months ago

What does this implement/fix? Explain your changes.

Fix the type annotations of train_test_split.

Note. Technically is a breaking change since I made eager a keyword only argument. If we plan to release in 0.10 we should mark it as such. Though I would assume that no one with benign intentions calls train_test_split(4, true).

vercel[bot] commented 6 months ago

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
functime-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jan 10, 2024 8:38am
FBruzzesi commented 6 months ago

Following @baggiponte comment, I took a quick look at this and left a comment for the return type in function docstring.

Actually let me add some additional comments since I am here 😂

baggiponte commented 5 months ago

Following @baggiponte comment, I took a quick look at this and left a comment for the return type in function docstring.

Actually let me add some additional comments since I am here 😂

  • The docstring for train_test_split says: """Return a time-ordered train set and test set given test_size.""" which is not accurate as it returns the callable.
  • As the callable is returned, I would add proper docstring to that as well so that one can inspect it if needed

Thanks.

I was wondering whether we should remove the eager arg anyway and have a simpler signature def train_test_split[DF: (pl.DataFrame, pl.LazyFrame)](data: DF, ...) -> Callable[[DF], tuple[DF, DF]]: .... No overloads to write.

baggiponte commented 5 months ago

@FBruzzesi I should merge this before you go on with #168. I wanted to discuss with other maintainers @ngriffiths13 whether we can get rid of the eager argument. The reasoning is: if we pass a DataFrame, we return a DataFrame. Same with LazyFrame. In this way, we remove code we have to annotate and test. The user can simply call lazy() or collect() before splitting, if they want.