-
UPDATE: closed in favor of https://github.com/dbt-labs/dbt-redshift/issues/204
### Is this your first time submitting a feature request?
- [X] I have read the [expectations for open source contr…
-
Issue writing to AWS S3 via the aws-java-sdk in a Spark context
## Describe the bug
For a given DataFrame `df` in a PySpark environment, the operation `df.write.parquet("s3a://some-bucket/test.parquet")` star…
-
Hi
Thanks for this awesome lib!
Hey, I'm looking for some guidance on an issue I'm having.
I'm trying to compare two dataframes for equality. It's not a requirement to know what's different jus…
-
I found this project when trying to compare DataFrames using PySpark, and it appears to work great. I am seeing an issue when running this as part of an AWS Glue job with this jar - spark-exten…
-
Data skew is very large for the spatial join: partition sizes range from a couple of KB to several MB. Is there something I can do to get more even partitions? I am using an R-tree for indexing and a KDB-tree for partitioning.
-
# Main error
Classpath problems?
`Error : java.lang.ClassNotFoundException: ai.h2o.sparkling.H2OConf`
### Documentation error (I guess)
I think this documentation shows an old way of doing…
-
Hi, I am on the SparkDP nightly build (as I wanted to query Hive).
I am not able to convert SparkDP DataFrames to Ray Datasets; I get this error even for simple ones.
For example:
```
df1 = spark.ran…
```
-
**Motivation: describe the problem to be solved**
Real-world use cases have large datasets that cannot fit in memory. Doing performance estimation on such datasets is not possible with current i…
-
I am trying to use `MultiClassifierDLApproach` to train a multi-label, multi-class model, but it always seems to end in an error when I try to use the GPU in the public Google Colab environment…
-
### Search before asking
- [X] I had searched in the [issues](https://github.com/ray-project/ray/issues) and found no similar feature requirement.
### Description
Ray dataset uses Arrow as data fo…