Closed eitsupi closed 2 years ago
Do you think the behavior should be:
if (inherits(obj, "ArrowTabular")) {
  obj <- as.data.frame(obj)
}
.vsc.view(obj)
Thanks for the quick reply. Tables handled by Apache Arrow can have a very large number of rows, so converting all of their rows to a data.frame before displaying them, as in the current implementation, may not be a good idea.
As an example, I tried reading the Parquet file (6,001,215 rows x 16 columns) used in the following post and displaying it after converting it to a data.frame. It took so long that I gave up waiting.
https://arrow.apache.org/blog/2021/12/03/arrow-duckdb/ https://github.com/cwida/duckdb-data/releases/download/v1.0/lineitemsf1.snappy.parquet
Ideally, the display would show a few rows and indicate that more rows exist, as dbplyr does.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
arrow::arrow_table(mtcars) |> arrow::to_duckdb()
#> # Source: table<arrow_001> [?? x 11]
#> # Database: duckdb_connection
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
#> # … with more rows
Created on 2022-01-16 by the reprex package (v2.0.1)
The following is faster because it reads only enough data for one page of the display, but the drawback is that the viewer can no longer tell that the table may have more rows than the 100 shown.
if (inherits(obj, "ArrowTabular")) {
  obj <- as.data.frame(utils::head(obj, 100))
}
.vsc.view(obj)
Compared with these approaches, I thought it would be better to display the plain print() output, since it is fast and does not mislead the reader.
Since displaying a huge table takes time even for a data.frame, it might be useful to add an option that limits the number of rows shown in the viewer (passed to the n argument of utils::head()).
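A minimal sketch of how such a row limit could look, assuming a hypothetical helper preview_for_viewer() and a hypothetical option name "vsc.view_max_rows" (neither is an existing vscode-R setting). It also records the row count before truncation, so the viewer could still indicate that more rows exist. It assumes nrow() is cheap for the object being viewed, which should be checked for Arrow objects before relying on it.

```r
# Hypothetical helper: returns a bounded data.frame plus the original row count.
# The option name "vsc.view_max_rows" is an assumption made for illustration.
preview_for_viewer <- function(obj,
                               max_rows = getOption("vsc.view_max_rows", 100L)) {
  total_rows <- nrow(obj)  # row count before truncation (NULL if not tabular)
  if (inherits(obj, "ArrowTabular") || is.data.frame(obj)) {
    # Take only the first max_rows rows, then materialize them as a data.frame.
    obj <- as.data.frame(utils::head(obj, max_rows))
  }
  list(
    data = obj,
    total_rows = total_rows,
    # TRUE only when the original object had more rows than are kept.
    truncated = isTRUE(total_rows > nrow(obj))
  )
}

# Usage with a plain data.frame; an Arrow Table would take the same branch.
preview <- preview_for_viewer(mtcars, max_rows = 10L)
full <- preview_for_viewer(mtcars, max_rows = 100L)
```

The data element could then be passed to the viewer, with total_rows and truncated used to show a note such as "with more rows".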
If we print an Arrow Table object, we can see the columns and types as follows.
However, if we run View(arrow::arrow_table(mtcars)) in VSCode, the following view is opened. It is not possible to decipher the structure of the Table from this view. I think this is common behavior for R6 class objects, and I wonder whether it would be better to show the output of print() when an R6 object is opened in the viewer. Or do you think it would be worth implementing a feature specifically for Arrow Tables?
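One way the print() idea could be sketched is to capture the printed lines as text and hand them to the viewer. The helper name viewer_text() is hypothetical, and the DemoTable object below is only a stand-in for an R6 instance (an environment whose class vector ends in "R6"), so the example runs without the R6 or arrow packages.

```r
# Hypothetical helper: returns the lines that print() would emit for obj,
# so that a viewer could display them as plain text.
viewer_text <- function(obj) {
  utils::capture.output(print(obj))
}

# Stand-in for an R6 instance, with its own print method.
demo <- structure(new.env(), class = c("DemoTable", "R6"))
print.DemoTable <- function(x, ...) {
  cat("DemoTable\n2 columns\n")
  invisible(x)
}

# Only fall back to the printed text for R6 objects.
if (inherits(demo, "R6")) {
  shown <- viewer_text(demo)
}
```

Because this relies only on the object's own print method, it would apply to any R6 object, not just Arrow Tables.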