-
I found this project while trying to compare DataFrames using PySpark, and it appears to work great. I am seeing an issue when running this as part of an AWS Glue job with this jar - spark-exten…
-
**Motivation: describe the problem to be solved**
Real-world use cases have large datasets that cannot fit in memory. Performance estimation on such datasets is not possible with current i…
-
I am trying to use `MultiClassifierDLApproach` to train a multi-label, multi-class model, but it always seems to end in an error when I try to use the GPU in the public Google Colab environment…
-
Hi, I am on the SparkDP nightly build (as I wanted to query Hive).
I am not able to convert SparkDP DataFrames to Ray datasets. I get this error even for simple ones.
For example:
```
df1 = spark.ran…
-
I'd like to be able to convert data representing time since the UNIX epoch to an explicit timestamp format with `to_timestamp`, like I can in Spark SQL and PostgreSQL.
```python
from pyspark.sql import S…
-
Data skew is very large for the spatial join: partition sizes range from a couple of KB to MBs. Is there something I can do to get more even partitions? An R-tree is used for indexing and a KDB-tree for partitioning.
![image](ht…
-
**Describe the problem you faced**
With Hudi 0.9, if I load a number of DataFrames, loop over them, and write them using Hudi's Spark datasource writer, I can see the embedded timeline ser…
-
# Main error
Classpath problems?
`Error : java.lang.ClassNotFoundException: ai.h2o.sparkling.H2OConf`
### Documentation error (I guess)
I think this documentation shows an old way of doing…
-
### Search before asking
- [X] I had searched in the [issues](https://github.com/ray-project/ray/issues) and found no similar feature requirement.
### Description
Ray dataset uses Arrow as data fo…
-
A clear and concise description of the problem.
Using the configs below, as mentioned in the documentation, we are writing multiple DataFrames to Hudi tables concurrently using the `concurrent.futures.ProcessPoolEx…