-
Hi,
this is the error I get when I run `clusters = linker.cluster_pairwise_predictions_at_threshold(df_predict, threshold_match_probability=0.95)`:
```
----------------------------------------…
```
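For context, a hedged sketch of the call sequence around this method in the Splink v3 API; the linker construction and settings are not shown in the excerpt and are assumed here:

```python
# `linker` is assumed to be an already-configured Splink linker; its setup
# (input data, settings dict) is not visible in the excerpt.
df_predict = linker.predict()

# The failing call from the report, for reference.
clusters = linker.cluster_pairwise_predictions_at_threshold(
    df_predict, threshold_match_probability=0.95
)
```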
-
I wish I could join a large cuDF DataFrame with a small series/list/sequence as a SQL-style full join, or, even better, have the small series/list broadcast for the full join as in Spark SQL, while th…
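As a stopgap, a minimal sketch of how one might express this in cuDF today, assuming the small sequence shares a join key with the large frame; all names and data here are hypothetical, and `how="outer"` is cuDF's equivalent of SQL's FULL JOIN:

```python
import cudf

# Hypothetical large frame with a join key.
big = cudf.DataFrame({"key": [1, 2, 3, 5], "value": [10.0, 20.0, 30.0, 50.0]})

# Lift the small Python list into a one-column frame so it can take part in
# the join (cuDF joins work on DataFrames/Series, not raw lists).
small = cudf.DataFrame({"key": [2, 3, 4]})

# how="outer" performs a full (outer) join, keeping unmatched rows from both sides.
full = big.merge(small, on="key", how="outer")
print(full.sort_values("key"))
```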
-
When working on aggregation filters, I had an example,
```
testAggregateFilterOneCount : List Antique -> List { product : Product, vintage : Float, all : Float }
testAggregateFilterOneCount antique…
```
-
### Discussed in https://github.com/delta-io/delta-rs/discussions/599
Originally posted by **ganesh-gawande** May 9, 2022
Hi,
I am using the documentation - https://github.com/delta-io/de…
-
**Describe the problem you faced**
**Scenario #1:**
1) Created a DataFrame (**targetDf**) and used the below statement to write it to a GCS bucket location (for example, **locA**):
targetDF.write.forma…
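For reference, a hedged sketch of what the truncated write statement might look like; the `hudi` format is only an assumption based on the issue template, and the table name, record key, and bucket path are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("gcs-write-sketch").getOrCreate()
targetDF = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# Hypothetical completion of `targetDF.write.forma…`; format and options assumed.
(targetDF.write.format("hudi")
    .option("hoodie.table.name", "target_table")
    .option("hoodie.datasource.write.recordkey.field", "id")
    .mode("overwrite")
    .save("gs://my-bucket/locA"))
```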
-
UPDATE: closed in favor of https://github.com/dbt-labs/dbt-redshift/issues/204
### Is this your first time submitting a feature request?
- [X] I have read the [expectations for open source contr…
-
### Proposed change
I came across this problem with PySpark.
When I call `foo.show()`, if the `foo` DataFrame contains too many columns, the result won't be printed in a single row in a Jupyter noteb…
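For what it's worth, two common workarounds, sketched under the assumption that `foo` is a wide PySpark DataFrame (neither is the proposed change itself):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# A deliberately wide frame standing in for `foo` from the report.
foo = spark.range(3).selectExpr(*[f"id as col{i}" for i in range(30)])

# 1) Print rows vertically, one column per line, so wide rows don't wrap.
foo.show(n=5, truncate=False, vertical=True)

# 2) Go through pandas; Jupyter renders pandas DataFrames as scrollable HTML.
foo.limit(20).toPandas()
```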
-
It would be nice if there were a command to connect to an existing Livy session.
For example, connecting to the Livy session with `id=4` and `kind=pyspark` and naming it `pyspark-test`:
`%spark connect …
-
Issue writing to AWS S3 via the aws-java-sdk in a Spark context
## Describe the bug
For a given DataFrame df in a PySpark env, the operation `df.write.parquet("s3a://some-bucket/test.parquet")` star…
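A minimal repro sketch of the operation above; the `fs.s3a.*` keys are standard hadoop-aws settings, while the credentials and bucket are placeholders:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("s3a-write-repro")
    # Standard hadoop-aws settings; values here are placeholders.
    .config("spark.hadoop.fs.s3a.access.key", "<ACCESS_KEY>")
    .config("spark.hadoop.fs.s3a.secret.key", "<SECRET_KEY>")
    .getOrCreate()
)

df = spark.range(10)
df.write.parquet("s3a://some-bucket/test.parquet")
```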
-
Hi,
Thanks for this awesome lib! I'm looking for some guidance on an issue I'm having.
I'm trying to compare two dataframes for equality. It's not a requirement to know what's different, jus…
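Since the excerpt doesn't show which library this is about, here is a hedged baseline in plain pandas for an order-insensitive, boolean-only equality check; all names are illustrative:

```python
import pandas as pd

def frames_equal(a: pd.DataFrame, b: pd.DataFrame) -> bool:
    """True if both frames hold the same rows, ignoring row and column order."""
    if sorted(a.columns) != sorted(b.columns):
        return False
    cols = sorted(a.columns)
    a_norm = a[cols].sort_values(cols).reset_index(drop=True)
    b_norm = b[cols].sort_values(cols).reset_index(drop=True)
    return a_norm.equals(b_norm)

left = pd.DataFrame({"x": [1, 2], "y": ["a", "b"]})
right = pd.DataFrame({"y": ["b", "a"], "x": [2, 1]})
print(frames_equal(left, right))  # True
```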