Open etiennebacher opened 2 months ago
Related issue: apache/arrow#38456
Personally, I think it would be fine to have more R-style functions like read_parquet_polars(path, ..., as_data_frame = FALSE) in the polars package.
This would be similar to, for example, Python Polars providing the polars.DataFrame.pipe method to make method chaining work in Python.
> Personally, I think it would be fine to have more R-like style functions like read_parquet_polars(path, ..., as_data_frame = FALSE) in the polars package.
Why should it be in polars? There are already functions to import and export data there, so I don't see why we should duplicate those.
> Why should it be in polars? There are already functions to import and export data there so I don't see why we should duplicate those
Of course, it doesn't have to be there, but Python Polars does provide this kind of syntactic sugar.
Also, as for write_*, I think the incompatibility between the pipe |> and the $ operator reinforces the need for it to exist as a function.
For example, we currently have to write pl$DataFrame(...)$some_methods(...) |> some_function(...) |> (\(x) x$write_parquet(...))()
> the incompatibility of the pipe |> and the $ operator reinforces the need for it to exist as a function.
Of note, in R 4.3 (and probably 4.2, I am not sure) the native placeholder _ works with $:
> women |> _$weight
[1] 115 117 120 123 126 129 132 135 139 142 146 150 154 159 164
If we introduce this kind of function in polars itself, then we'd have two kinds of syntax for the same thing, e.g. pl$read_parquet() and read_parquet_polars(). Wouldn't that lead to confusion, similar to the arrow issue you linked above?
> Of note, in R4.3 (and probably 4.2, I am not sure) the native placeholder _ works with the $
I don't think that helps here: a method call such as x |> _$foo() is not allowed.
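To make the difference concrete, here is a minimal base R sketch. It assumes R 4.3 or later and uses a plain list with a function element as a stand-in for an object with $ methods; obj, total and total_of are hypothetical names for illustration only, not part of polars.

```r
# Stand-in object: a plain list with a function element, used only to mimic
# the obj$method() calling style. It is not a polars object.
obj <- list(
  values = c(1, 2, 3),
  total = function() sum(c(1, 2, 3))
)

# Extraction: the placeholder may head an extraction chain (R >= 4.3).
extracted <- obj |> _$values

# Method call: `obj |> _$total()` is the form reported above as not allowed,
# so the call is wrapped in an anonymous function instead (R >= 4.1).
via_lambda <- obj |> (\(x) x$total())()

# Alternative: a small wrapper that takes the object as its first argument,
# which is the shape a function-style export would have.
total_of <- function(x, ...) x$total(...)
via_wrapper <- obj |> total_of()
```

The wrapper in the last step is the pattern that exported read/write functions would follow: the object becomes the first argument, so the native pipe works without the anonymous-function workaround.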
> Wouldn't that lead to confusion, similarly as in the arrow issue you linked above?
The problem with the arrow package is that the function names are inconsistent.
In other words, there are only read_parquet and read_csv_arrow instead of read_csv and read_parquet_arrow.
So far I only exported sink_* functions because they don't risk namespace collision with other packages, while exporting write_parquet() or read_parquet() would conflict with arrow, for example.
However, some users are not aware of pl$read_parquet() and pl$scan_parquet(), and therefore use arrow::read_parquet() and as_polars_df(), which is not efficient at all. The goal of tidypolars is to replace the somewhat confusing (to R users) syntax of polars so that they don't have to deal with pl$, for instance. Therefore, I shouldn't expect them to use pl$scan_parquet().
The easy solution would be to add the "_polars" suffix for read/write functions (and potentially sink and scan for consistency?), so I would export read_parquet_polars(), for instance.
duckplyr has duckplyr_df_from_parquet(), so one option would be to export polars_df_from_parquet() and polars_lf_from_parquet() instead of read and scan.
Edit: not a big fan of polars_lf_from_parquet() because I like seeing all the options in the autocompletion when I type "write"