-
Spark DataFrames should default to writing to the `file://` filesystem rather than the `hdfs://` filesystem.
We also need a PASS/FAIL test notebook that checks this is working correctly.
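A scheme-less output path is normally qualified against the Hadoop `fs.defaultFS` setting. That setting can be supplied through Spark as `spark.hadoop.fs.defaultFS`, for example `file:///`. The core of a PASS/FAIL check can therefore be sketched without a cluster.

This is a simplified model of that path qualification, not Spark's own code. It only handles absolute paths. The helper names `resolve_write_uri` and `check_default_fs` and the sample path `/tmp/df_output` are hypothetical.

```python
from urllib.parse import urlparse


def resolve_write_uri(path: str, default_fs: str) -> str:
    """Simplified model of Hadoop path qualification (sketch, not Spark's code).

    A path that already carries a scheme is kept as-is; a scheme-less absolute
    path inherits the scheme and authority of fs.defaultFS.
    """
    if urlparse(path).scheme:
        return path
    if not path.startswith("/"):
        # Relative paths depend on the working directory; out of scope here.
        raise ValueError("this sketch only handles absolute paths")
    fs = urlparse(default_fs)
    return f"{fs.scheme}://{fs.netloc}{path}"


def check_default_fs(default_fs: str, sample_path: str = "/tmp/df_output") -> str:
    """Return "PASS" if a scheme-less write would land on file://, else "FAIL"."""
    scheme = urlparse(resolve_write_uri(sample_path, default_fs)).scheme
    return "PASS" if scheme == "file" else "FAIL"


print(check_default_fs("file:///"))              # PASS
print(check_default_fs("hdfs://namenode:8020"))  # FAIL
```

In a real notebook, `default_fs` would come from the cluster's Hadoop configuration rather than a literal string. A stronger end-to-end check would also write a small DataFrame and confirm the files appear on local disk.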
-
**Is your feature request related to a problem? Please describe.**
I'd like to deploy on GCP Dataflow, Apa…
-
Hi there,
While there is a nice way to save an Avro schema in a Parquet file when working with RDDs, I've been unable to find something similar for DataFrames. Are there any plans to add this feature…
-
Many ML workloads such as LogisticRegression generate, and require as input, datasets of the form RDD[LabeledPoint]. Converting back and forth between a weakly typed DataFrame and an RDD of LabeledPoint is …
-
I see dozens of issues and enhancement suggestions for `DataFrame` in the `Microsoft.Data.Analysis` namespace that have gone untouched for almost a year.
Are there any resources allocated to address those?
Is the project …
-
I recently discovered Modin and loved its clean approach to working with large DataFrames in a simple manner. One of the things that struck me was that the [Modin architecture](https://modin.readthedo…
-
### Missing functionality
Polars integration? https://www.pola.rs/
### Proposed feature
Use a Polars DataFrame as a compute backend.
Alternatively, let the user pass a Polars DataFrame directly to `ProfileReport`.
…
-
# Description
Implement the ability to use positional and keyword arguments with `sql`. Because of the differences between Python and Rust, the function's arguments need to be implemented explicitly.
The pys…
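One way to keep positional and keyword handling consistent is to bind the caller's arguments against an explicit Python signature before crossing into Rust. The native side then always receives one fully populated mapping.

This is a minimal sketch under assumptions. The signature `sql(query, params=None, *, dialect="generic")` and the helper `normalize_sql_args` are hypothetical; neither comes from the issue.

```python
import inspect
from typing import Any, Optional


# Hypothetical reference signature; the real `sql` parameters are not specified.
def _sql_signature(query: str, params: Optional[dict] = None, *, dialect: str = "generic") -> None:
    ...


_SIG = inspect.signature(_sql_signature)


def normalize_sql_args(*args: Any, **kwargs: Any) -> dict:
    """Bind args using Python's own matching rules and fill in defaults.

    Raises TypeError for calls that don't fit the signature, e.g. passing the
    keyword-only `dialect` positionally.
    """
    bound = _SIG.bind(*args, **kwargs)
    bound.apply_defaults()
    return dict(bound.arguments)


print(normalize_sql_args("SELECT 1", dialect="duckdb"))
```

Because `Signature.bind` applies Python's standard rules, a call like `normalize_sql_args("SELECT 1", None, "duckdb")` raises `TypeError`. Arguments are never silently shifted into the wrong slot.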
-
VCF, BGEN and PLINK are common file formats in genomics. The open source project __[Glow](https://glow.readthedocs.io/en/latest/introduction.html)__ adds support for datasets with these formats into S…
-
Could anyone help me find the optimal connector configuration for inserting documents from DataFrames ranging in size from 100,000 to 1,000,000? The Spark cluster can autoscale to 20 worker …