Closed baggiponte closed 3 months ago
The latest updates on your projects. Learn more about Vercel for Git ↗︎
Name | Status | Preview | Comments | Updated (UTC) |
---|---|---|---|---|
functime-docs | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Jan 10, 2024 8:38am |
Following @baggiponte comment, I took a quick look at this and left a comment for the return type in function docstring.
Actually let me add some additional comments since I am here 😂
train_test_split
says: """Return a time-ordered train set and test set given test_size
.""" which is not accurate as it returns the callable.Following @baggiponte comment, I took a quick look at this and left a comment for the return type in function docstring.
Actually let me add some additional comments since I am here 😂
- The docstring for
train_test_split
says: """Return a time-ordered train set and test set giventest_size
.""" which is not accurate as it returns the callable.- As the callable is returned, I would add proper docstring to that as well so that one can inspect it if needed
Thanks.
I was wondering whether we should remove the eager
arg anyway and have a simpler signature def train_test_split[DF: (pl.DataFrame, pl.LazyFrame)](data: DF, ...) -> Callable[[DF], tuple[DF, DF]]: ...
. No overloads to write.
@FBruzzesi I should merge this before you go on with #168. I wanted to discuss with other maintainers @ngriffiths13 whether we can get rid of the eager
argument. The reasoning is: if we pass a DataFrame, we return a DataFrame. Same with LazyFrame. In this way, we remove code we have to annotate and test. The user can simply call lazy()
or collect()
before splitting, if they want.
What does this implement/fix? Explain your changes.
Fix the type annotations of train_test_split.
Note. Technically is a breaking change since I made
eager
a keyword only argument. If we plan to release in 0.10 we should mark it as such. Though I would assume that no one with benign intentions callstrain_test_split(4, true)
.