-
When using glow.py in a Conda environment with the following setup:
- Python version: 3.10
- PySpark version: 3.5.1
- Glow.py version: 2.0.0 (installed via pip install glow.py)
Attempting to loa…
-
### What happened?
Discovered in https://github.com/NickCrews/mismo/issues/64. CC @jstammers. Here is a more minimal reproducer.
Run with `uv run script.py` to get uv to install the deps automa…
-
Wondering if we could make use of the [persist](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.persist.html) or cache methods in pyspark to load the da…
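For reference, a minimal sketch of what that could look like (the read path and variable names below are placeholders, not taken from the project):
```python
from pyspark.sql import SparkSession
from pyspark.storagelevel import StorageLevel

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("data/")  # placeholder source

# cache() keeps the data around after the first action;
# persist() lets you choose the storage level explicitly.
df = df.persist(StorageLevel.MEMORY_AND_DISK)
df.count()  # first action materialises the cached data

# ... subsequent queries reuse the cached data ...
df.unpersist()  # release it when done
```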
-
The current implementation divides the dataset into partitions by assigning each row a partition ID computed with the PySpark function `spark_partition_id`, and then querying each partition. I thin…
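As I read it, the approach is roughly the following (a sketch with placeholder names, not the actual code):
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, spark_partition_id

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("data/")  # placeholder source

# Tag every row with the ID of the partition it lives in.
tagged = df.withColumn("_pid", spark_partition_id())

# Then query the data one partition at a time by filtering on the tag.
for pid in range(tagged.rdd.getNumPartitions()):
    chunk = tagged.filter(col("_pid") == pid)
    # ... process `chunk` ...
```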
-
When I try to use the `pyspark` package, I get this error:
> Backend 'setuptools.build_meta:__legacy__' is not available.
Here is the `pyproject.toml`:
```toml
[tool.poetry]
name = "cowapp"…
-
This is my docker-compose file:
```yaml
version: '3.8'
services:
  spark-master:
    image: bitnami/spark
    container_name: spark-master
    environment:
      - SPARK_MODE=master
      - S…
-
Apache Spark is widely used in the Python ecosystem for distributed computing. As a user of Spark, I would like ruff to lint problematic behaviours. The automation that ruff offers is especially usef…
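For illustration, here is a hypothetical example of the kind of problematic behaviour such a rule could flag (the rule does not exist yet; the names and path below are placeholders):
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sum as spark_sum

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("events/")  # placeholder source

# Problematic: collect() pulls the whole DataFrame onto the driver
# just to do an aggregation in plain Python.
total = sum(row["amount"] for row in df.collect())

# Preferred: keep the aggregation inside Spark.
total = df.agg(spark_sum(col("amount"))).first()[0]
```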
-
### We would like to learn about your use case. For example, if this feature is needed to adopt Narwhals in an open source project, could you please enter the link to it below?
I'm building a pl…
-
### Missing functionality
After Databricks Runtime 14, the DataFrame type changed in notebooks. It was `pyspark.sql.dataframe.DataFrame`, but now it is `pyspark.sql.connect.dataframe.DataFrame`
…
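For context, code that needs to accept both types can check against the two classes; a minimal sketch (the helper name is mine, not an existing API):
```python
from pyspark.sql import DataFrame as ClassicDataFrame

try:
    # Available when running against Spark Connect (e.g. newer Databricks runtimes).
    from pyspark.sql.connect.dataframe import DataFrame as ConnectDataFrame
    _DATAFRAME_TYPES = (ClassicDataFrame, ConnectDataFrame)
except ImportError:
    _DATAFRAME_TYPES = (ClassicDataFrame,)

def is_spark_dataframe(obj) -> bool:
    """Return True for both classic and Spark Connect DataFrames."""
    return isinstance(obj, _DATAFRAME_TYPES)
```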
-
Since this project uses pyspark to read parquet files, pyspark and its dependencies Spark and Hadoop are required, but the documentation currently lacks a guideline on how to run the script on Wi…