-
I want to know why some parquet files whose names start with a dot (.) appear when I write data to Hudi, and how to filter out these files when I read Hudi with Spark. Thank you very much!
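For the filtering half of the question, here is a minimal sketch, assuming the file paths have already been listed as plain strings. The helper name `drop_dot_files` and the example paths are hypothetical and are not part of Hudi or Spark; the sketch does not address why the dot-prefixed files are written.

```python
import posixpath


def drop_dot_files(paths):
    """Keep only paths whose file name does not begin with a dot."""
    return [p for p in paths if not posixpath.basename(p).startswith(".")]


# Hypothetical listing of a table directory, for illustration only.
listing = [
    "/tmp/hudi_table/part-0001.parquet",
    "/tmp/hudi_table/.part-0002.parquet",
    "/tmp/hudi_table/.hoodie",
]

# Only the entries whose names do not start with "." remain.
visible = drop_dot_files(listing)
print(visible)
```

The filtered list could then be handed to whatever reader is in use.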
-
### Describe the enhancement requested
I'm seeing some PRs start to prefer the nested namespace notation. I know clang-tidy prefers it. It is more compact, and now that we are on C++17 I don't se…
-
The following minimal example results in an error:
```
from pyspark.sql.functions import col
from datetime import date
import random
source_data = []
for i in range(100):
source_data.ap…
-
### Search before asking
- [X] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.
### Version
Doris Version: doris-2.0.2-rc05-ae923f7
…
-
### Description
Previously, field separators were added as a parameter of Subfield in https://github.com/facebookincubator/velox/pull/6014. The only usage of Subfield in Gluten is to create SubfieldF…
-
We use the following batch size settings:
```
spark.sql.inMemoryColumnarStorage.batchSize 32768
spark.sql.execution.arrow.maxRecordsPerBatch 32768
spark.sql.parquet.columna…
-
Is it not possible to use standard dplyr operations within a spark_apply function?
Please let me know if the example below works for anyone else or if there's an obvious mistake in my code.
Thanks!
…
-
## Your Environment
Circle CI job failure: https://github.com/prestodb/presto/actions/runs/6200486316/job/16835270734?pr=20381#logs
## Expected Behavior
## Current Behavior
```
ORDE…
-
First, I want to thank you for this great library!
I need to merge hundreds of small parquet files into bigger ones. Sadly, they do not all have the same schema (e.g. missing columns), nor is the schema …
-
My cluster has three hosts: a server, a locator, and a lead. The locator is configured on port 9998. When I run my code, it sometimes connects and sometimes hangs, with no error message.
My code:
```
object Smart…