-
Hello,
Thanks for developing the `pbspark` library. It seems quite useful for converting on-the-wire protobuf messages into dataframes.
I have some recursive proto definitions of the form (a simplified exa…
-
Datasets and DataFrames are more efficient and should be preferred over direct RDD programming as of Spark 2.0. Build out the sparkplug-sql project to support them.
-
### Bug/Feature Request Description
In notebooks such as this one: https://github.com/Featuretools/predict-next-purchase/blob/master/Tutorial.ipynb and in the documentation: https://docs.featuretools.com/usag…
-
With the [new approach](https://github.com/GoogleCloudPlatform/openmrs-fhir-analytics/pull/178) to our query library API, the underlying runner for distributed data processing is completely separated …
-
## Description
Suggested by a first-time user:
> I imagine people using this would come from pandas or R: what would have been useful is to see a side by side "this is how you do it in pandas" vs…
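A sketch of what one such side-by-side docs entry could look like. The pandas half below actually runs; the other half is left as pseudocode comments, since the quote is truncated before naming this library's API:

```python
# Illustrative side-by-side docs snippet (the library's own syntax below
# is hypothetical pseudocode, not a real API).
import pandas as pd

pdf = pd.DataFrame({"key": ["a", "b", "a"], "value": [1, 2, 3]})

# pandas:
pandas_result = pdf.groupby("key", as_index=False)["value"].sum()

# equivalent here (pseudocode placeholder for this library):
#   df.group_by("key").agg(value=sum("value"))

assert dict(zip(pandas_result["key"], pandas_result["value"])) == {"a": 4, "b": 2}
```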
-
It would be interesting to be able to read SPSS files with Polars. Pyreadstat supports converting them to pandas. Could the same be done with Polars? Polars is a great package and would help a lot spee…
-
The Spark session created by pytest-spark is not optimized for small unit tests that only work with small dataframes.
pytest-spark seems to rely on Spark's default settings:
http…
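A minimal sketch of a workaround using pytest-spark's documented `spark_options` ini setting, dialing down a few defaults that are commonly reduced for small local test suites (the exact values are illustrative, not recommendations from the pytest-spark project):

```ini
[pytest]
spark_options =
    spark.sql.shuffle.partitions: 1
    spark.default.parallelism: 1
    spark.ui.enabled: false
```

Shipping test-friendly values like these out of the box (or documenting them) is presumably what the issue is asking for.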
-
Please find below all the output of the exercise. Step 10 is successful; step 11 fails.
The "Loading and Inspecting Parquet Files" step also fails; find the output below.
=================
Exerci…
-
- There is a bug with PySpark and pandas 1.4.x
- We need to investigate what the bug is, as it prevents us from upgrading to the pyarrow dtypes
- We should check if this bug occurs with the latest pand…
-
Spark version _:
![image](https://user-images.githubusercontent.com/22921775/171391938-cc85b808-7bdd-4bf6-b858-cab5873bd130.png)
```scala
case class RawData(
  productName: String,
  totalNumber: String…
```