-
As of now, the documentation for the function http://spark.rstudio.com/reference/spark_write_parquet.html does not make clear what the acceptable values for the `mode` parameter are, or how to use the…
-
**Describe the bug**
The `dataframe.repartition()` function doesn't work as expected.
**To Reproduce**
Using the `tpch` binary from the benchmarks, convert the `.tbl` (CSV) files to Parquet format using t…
-
Hi,
When computing `connectedComponents` using the GraphFrames algorithm, I get the following error:
```
File "/root/.ivy2/jars/graphframes_graphframes-0.5.0-spark2.1-s_2.11.jar/graphframes/graphfr…
```
-
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
Currently, we can only read Parquet files from the local file system. It would be nice to add s…
-
full_sorting_merge is a very effective way to merge two large tables when it is known that the tables are already sorted.
But…
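For intuition, the core idea of a full sorting merge can be sketched in plain Python: when both inputs are already sorted on the join key, they can be joined in a single linear pass with no hash table on either side. This is a simplified inner-join sketch under the assumption of unique keys per side, not the actual engine implementation:

```python
def sorted_merge_join(left, right):
    """Inner-join two lists of (key, value) pairs, both pre-sorted by key.

    Runs in O(len(left) + len(right)) time and builds no hash table,
    which is why a sort-merge strategy suits large pre-sorted tables.
    Simplification: assumes keys are unique on each side.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, lv = left[i]
        rk, rv = right[j]
        if lk == rk:
            # Matching keys: emit a joined row and advance both cursors.
            out.append((lk, lv, rv))
            i += 1
            j += 1
        elif lk < rk:
            # Left key is smaller: it can never match, skip it.
            i += 1
        else:
            # Right key is smaller: skip it.
            j += 1
    return out
```

The linear pass also means neither table needs to fit in memory at once; each side can be streamed in sorted order.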
-
### Description
As per the streaming API documentation (https://pola-rs.github.io/polars/user-guide/concepts/streaming/#when-is-streaming-available), streaming now supports: scan_csv, scan_parquet, sc…
-
Hi,
I have the following issue.
I have a record where one of the fields has the `time-millis` logicalType.
When saving to Avro format, the column in the resulting table has TIMESTAMP type.
Unfortunately, when saving to Pa…
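For context, Avro's `time-millis` logical type encodes a time of day as the number of milliseconds after midnight, which is why mapping it to a TIMESTAMP (a point on the timeline) is surprising. A small decoding sketch in plain Python, independent of any Avro library:

```python
from datetime import time

def time_millis_to_time(millis_of_day: int) -> time:
    """Decode an Avro `time-millis` value (milliseconds since midnight)
    into a wall-clock time.

    Note the result is a time-of-day, not a timestamp, so storing it
    in a TIMESTAMP column loses that distinction.
    """
    seconds, millis = divmod(millis_of_day, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    # datetime.time takes microseconds, so scale the millisecond part.
    return time(hours, minutes, seconds, millis * 1000)
```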
-
Hi, I had a problem loading a Parquet file from S3 when there's a space in the path. I tried `%20`, but that doesn't work either.
Example path:
```
s3://my-bucket/trusted/receita_socios/version…
```
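One thing worth checking when debugging this is how the object key is actually percent-encoded, since different layers (URL parsing, the S3 client) sometimes encode or decode the space at different points. A quick sketch with the Python standard library; the key below is made up for illustration:

```python
from urllib.parse import quote, unquote

# Hypothetical S3 key containing a space (not the actual path from the issue).
key = "trusted/receita socios/part-0.parquet"

# Percent-encode everything except the path separators:
# a space becomes %20, and decoding restores the original key.
encoded = quote(key, safe="/")
assert unquote(encoded) == key
print(encoded)
```

If the reader decodes `%20` back to a space before handing the key to the S3 client, the encoded form will fail in exactly the way described above.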
-
### Problem description
`scan_parquet` today doesn't support adding the partition columns for directory-partitioned Parquet (and CSV). Without this, the user has to work out the logic of adding columns…
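The logic the user currently has to reimplement is roughly the hive-style `key=value` path parsing sketched below. This is a plain-Python illustration with a hypothetical helper name, not Polars code:

```python
from pathlib import PurePosixPath

def partition_columns(path: str) -> dict:
    """Extract hive-style partition columns from a file path.

    Each directory segment of the form key=value contributes one
    column, e.g. "data/year=2023/month=07/part-0.parquet"
    yields {"year": "2023", "month": "07"}.
    """
    cols = {}
    for segment in PurePosixPath(path).parts[:-1]:  # skip the file name
        if "=" in segment:
            key, _, value = segment.partition("=")
            cols[key] = value
    return cols
```

A reader with built-in support would additionally cast the string values to proper dtypes and attach them as constant columns to each file's rows.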
-
### Description
Currently, the Parquet reader does not seem to support the TIMESTAMP data type. We ran into an exception [here](https://github.com/facebookincubator/velox/blob/main/velox/dwio/parquet/read…