-
Collecting data from Spark into R triggers two separate computations of the query: one executes the query itself, and the other runs a count. Consider the following simple example:
```r
libra…
-
Currently, the API for connecting to a backend with a backend-specific option is as follows:
```python
import ibis
ibis.options.impala.temp_db = 'foo'
conn = ibis.impala.connect(host='impala',…
-
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
I've noticed from various discussions that Ballista is adding some (considerable?) overhead…
-
Hello, I'm using EMR (Hudi 0.5, Spark 2.4.4), and during an upsert I'm running into the error below:
There were similar issues posted before, but not specific to ParquetDecodingException. I'm able to read…
-
- Should be iterable
- Number of columns and types should match what's expected
- Allow pre_aggregated on execute_df; error or warning if pandas passed in (or maybe just work?)
Spark DataFrames and R…
-
First, thank you for this great package. It has already helped me communicate my results using Julia. I have no doubt this package is improving scientific communication around the world.
In my view…
-
I've been using spark-notebook (https://github.com/spark-notebook/spark-notebook) with Spark/Scala for 4 years, but I'm now looking for alternatives. Polynote looks great and could possibly outperform spark-…
-
**Describe the bug**
When a Checkpoint makes use of a RuntimeBatchRequest that references an un-pickleable object, such as a Spark DataFrame, the run will fail with an error such as "TypeError: can't pi…
-
**Is your feature request related to a problem? Please describe.**
I very often need to look at dataframes that are wider than my monitor. Spark dataframes wrap row by row, but Pandas render out pret…
-
Hi, this is more a question than an issue: how can two Pool objects be compared? I need this for a unit test.
I've seen [here](https://catboost.ai/docs/concepts/python-reference_pool.html#python-r…