-
- Declaring the udf inside a specific object struct caused a serialization ("Task not serializable") error (see the workaround sketch below)
- https://stackoverflow.com/questions/36794688/spark-task-not-serializable-for-udf-on-dataframe
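
A minimal PySpark sketch of the usual workaround (the linked question covers the Scala side, where the fix is analogous): keep the UDF as a plain module-level function so Spark never has to serialize the enclosing object. The function, app, and column names below are illustrative.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Module-level function: only this function gets serialized to the executors,
# not an enclosing object that might hold a SparkSession/SparkContext.
def normalize(value):
    return value.strip().lower() if value is not None else None

normalize_udf = udf(normalize, StringType())

if __name__ == "__main__":
    spark = SparkSession.builder.appName("udf-serialization-sketch").getOrCreate()
    df = spark.createDataFrame([("  FooBar ",)], ["raw"])
    # Had normalize been a method on an object holding the SparkSession,
    # Spark would try to serialize that whole object and fail.
    df.withColumn("clean", normalize_udf("raw")).show()
```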
-
![spark_sql_19 1_sportsrdd](https://user-images.githubusercontent.com/29932053/32846714-fb319eae-c9f5-11e7-827d-e4247cfdbbee.png)
-
Previous results on a single server:

| trees | depth | time [s] (100M) | AUC (100M) | RAM [GB] (100M) | time [s] (10M) | AUC (10M) | RAM [GB] (10M) |
| -- | -- | -- | -- | -- | -- | -- | -- |
| 1 | 1 | 1150 | 0.63… |
-
In Apache Spark, pipeline tests are not present for all of the data streams. Pipeline tests need to be added for every data stream.
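
The excerpt does not say which pipelines are meant. Assuming it refers to running a fitted spark.ml Pipeline over a Structured Streaming input, such a test could look roughly like this sketch (the source, sink, and all names are illustrative assumptions, not the missing tests themselves):

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("streaming-pipeline-test-sketch").getOrCreate()

# Fit the pipeline on a tiny static DataFrame.
train = spark.createDataFrame([(0.0, 1.0, 0.0), (1.0, 0.0, 1.0)], ["f1", "f2", "label"])
pipeline = Pipeline(stages=[
    VectorAssembler(inputCols=["f1", "f2"], outputCol="features"),
    LogisticRegression(featuresCol="features", labelCol="label"),
])
model = pipeline.fit(train)

# Apply the fitted model to a streaming source and assert on the output
# through a memory sink, which is the usual way to check streaming behaviour.
stream = (
    spark.readStream.format("rate").option("rowsPerSecond", 1).load()
    .selectExpr("cast(value % 2 as double) as f1", "cast(value % 3 as double) as f2")
)
query = (
    model.transform(stream)
    .writeStream.format("memory").queryName("pipeline_predictions")
    .outputMode("append").start()
)
query.processAllAvailable()
assert "prediction" in spark.table("pipeline_predictions").columns
query.stop()
```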
-
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
### Search before asking
- [X] I have searched in the [issue…
-
Hi!
Why can't specifying the PySpark version in an environment variable be optional?
It could default to the installed version, like this:
```python
import os
import pyspark

# Default SPARK_VERSION to the version of the installed PySpark package.
os.environ['SPARK_VERSION'] = pyspark.__version__
```
or…
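
A minimal sketch of what the optional behaviour might look like on the library side (this is an assumption about how the version could be resolved, not pydeequ's current API):

```python
import os

def resolve_spark_version():
    # Prefer an explicit override, otherwise fall back to the installed PySpark.
    explicit = os.environ.get("SPARK_VERSION")
    if explicit:
        return explicit
    try:
        import pyspark
        return pyspark.__version__
    except ImportError:
        raise RuntimeError("PySpark is not installed; set SPARK_VERSION explicitly.")

os.environ["SPARK_VERSION"] = resolve_spark_version()
```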
-
I'm not reporting a bug, just looking for a workaround and I'm hoping someone can help!
I'm trying to call deequ's rowLevelResultsAsDataFrame function from pydeequ. Things work fine but as soon as …
-
**Describe the problem you faced**
Trying to follow the official Docker demo tutorial [here](https://hudi.apache.org/docs/docker_demo/), at step 2, I get an error executing a command inside one of …
-
We could consider writing a spark_fdw (foreign data wrapper) to enable querying data in Spark.
Or we could build a tight integration between Spark and PostgreSQL / Citus. In this scenario, Spark mana…
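
For context, the loose coupling that exists today runs in the other direction: Spark can already read from and write to PostgreSQL through its built-in JDBC data source, as in the sketch below (connection details and table names are placeholders). A spark_fdw would invert this, letting PostgreSQL / Citus query data held in Spark.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-postgres-jdbc-sketch").getOrCreate()

# Read a PostgreSQL table into Spark via the built-in JDBC source.
events = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://localhost:5432/analytics")
    .option("dbtable", "public.events")
    .option("user", "analytics")
    .option("password", "secret")
    .option("driver", "org.postgresql.Driver")
    .load()
)

# Push aggregated results back into PostgreSQL / Citus.
(
    events.groupBy("event_type").count()
    .write.format("jdbc")
    .option("url", "jdbc:postgresql://localhost:5432/analytics")
    .option("dbtable", "public.event_counts")
    .option("user", "analytics")
    .option("password", "secret")
    .option("driver", "org.postgresql.Driver")
    .mode("overwrite")
    .save()
)
```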
-