-
**Describe the bug**
1. The code blocks in `pipeline_model.fit()`: no progress is made and the Spark stage count stays at 0 (a sketch of the call is shown below).
2. CSV data: 200 columns, 800,000 rows.
3. Training the same data on a single CentOS machine takes about 1 minute.
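For context, here is a minimal Scala sketch of the kind of pipeline fit being described; the file path, schema, and pipeline stages are hypothetical and not taken from the report:
```
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("fit-repro").getOrCreate()

// Hypothetical wide CSV: ~200 columns, ~800,000 rows, with a "label" column.
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/path/to/data.csv")

val featureCols = df.columns.filter(_ != "label")
val assembler = new VectorAssembler().setInputCols(featureCols).setOutputCol("features")
val lr = new LogisticRegression().setLabelCol("label").setFeaturesCol("features")
val pipeline = new Pipeline().setStages(Array(assembler, lr))

// The report says this call makes no progress and the Spark stage counter stays at 0.
val model = pipeline.fit(df)
```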
**To Reproduce**…
-
- [x] Add native parquet writer #2004
- [x] Parquet files written by presto-parquet can't be read by parquet-hadoop library used in Spark #6377
- [x] Native Parquet Writer writes Parquet V2 files th…
-
I'm seeing JVM crashes in our Spark cluster which I believe are caused by `LGBM_DatasetCreateFromCSRSpark`.
https://github.com/microsoft/LightGBM/issues/2360 indicated some issues in that met…
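For context, that native entry point is (as I understand it) reached when training on sparse feature vectors. Below is a rough, hedged Scala sketch of that path; the package name follows recent SynapseML releases (older releases used `com.microsoft.ml.spark`), and the toy dataset is hypothetical:
```
import com.microsoft.azure.synapse.ml.lightgbm.LightGBMClassifier
import org.apache.spark.ml.linalg.Vectors

// Sparse vectors are what go through the CSR (compressed sparse row) native path
// when the LightGBM dataset is built on the workers.
val train = spark.createDataFrame(Seq(
  (0.0, Vectors.sparse(1000, Array(3, 17), Array(1.0, 2.0))),
  (1.0, Vectors.sparse(1000, Array(5, 99), Array(4.0, 1.0)))
)).toDF("label", "features")

val lgbm = new LightGBMClassifier()
  .setLabelCol("label")
  .setFeaturesCol("features")

val model = lgbm.fit(train)
```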
-
## Problem Description
This design proposal is for adding feature request #229.
Currently, Hyperspace supports creating indexes only on data with a fixed schema (a sketch of the current API follows the list below). This means:
- All columns from "…
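For reference, index creation with today's fixed-schema API looks roughly like the following; the DataFrame path and column names are hypothetical:
```
import com.microsoft.hyperspace._
import com.microsoft.hyperspace.index._

val hs = new Hyperspace(spark)

// Hypothetical source data in which every record has the same columns.
val df = spark.read.parquet("/data/employees")

// Today, the indexed and included columns must be present in the fixed schema.
hs.createIndex(df, IndexConfig("deptIndex", indexedColumns = Seq("deptId"), includedColumns = Seq("name")))
```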
-
## Willingness to contribute
The MLflow Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature (ei…
-
Hi bartag and contributors,
First off, thank you so much for making py4j! I use pyspark as part of my job and it's a lifesaver in terms of code reuse with the rest of our products and with building on t…
-
### SynapseML version
0.10.0
### System information
- **Language version** (e.g. python 3.8, scala 2.12): python3
- **Spark Version** (e.g. 3.2.2): 3.0
- **Spark Platform** (e.g. Synapse, Databri…
-
I wish we could support the data type org.apache.spark.mllib.linalg.VectorUDT.
Mini repro:
```
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row

val rows = spark.sparkContext.parallelize(
  List(
    Row(0.0, 1…
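
// (The rest of the repro is cut off above. As a hedged sketch of the usual
// workaround today, legacy org.apache.spark.mllib.linalg vector columns can be
// converted to the supported org.apache.spark.ml.linalg type before the
// DataFrame is handed to the library; the DataFrame `df` below is hypothetical.)
import org.apache.spark.mllib.util.MLUtils

val converted = MLUtils.convertVectorColumnsToML(df)
// `converted` now carries org.apache.spark.ml.linalg.VectorUDT columns, which
// Spark ML-based APIs accept.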
-
_Original author: tony.hi...@gmail.com (August 02, 2012 18:46:43)_
Removing rows that are duplicates in one or more columns currently requires a clunky workaround.
My intuition when I went looking for a dedupe opt…
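As an illustration of the behaviour being asked for (not this project's API), Spark's DataFrame interface exposes dedupe-on-selected-columns as a one-liner; the DataFrame and column names here are hypothetical:
```
// Keep one row per distinct (email, name) pair; the remaining columns come from
// an arbitrary surviving row.
val deduped = df.dropDuplicates(Seq("email", "name"))
```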