-
Hi All,
I am new to Spark and Scala. I have the source code for Spark SQL Performance Tests and dsdgen.
Can anyone tell me how to proceed? I have finished building by running the command bin/run…
-
We had a use case at Argenta where we worked with a table of about 300 columns and ~2 million rows.
There, the preprocessing took a lot of time and, especially, a lot of memory.
What we’d need is to find any dat…
-
**Describe the bug**
Using the automatic sorting type in the sort command significantly increases query time. The culprit seems to be the `numericalStringCheck()` function. The function sh…
-
### Is your feature request related to a problem or challenge?
If we want to make DataFusion the engine of choice for fast OLAP processing, we will eventually need to make joins faster. In addition t…
-
Snowflake has some capabilities when it comes to [transforming during a load](https://docs.snowflake.net/manuals/user-guide/data-load-transform.html). From my very basic understanding of what the [tra…
-
## Willingness to contribute
The MLflow Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature (ei…
-
Hi,
Thanks a lot for developing and maintaining this super useful library!
I was wondering about reading ["row groups"](https://github.com/apache/parquet-format?tab=readme-ov-file#glossary); is…
-
PySpark (https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.col.html) and Polars (https://pola-rs.github.io/polars/py-polars/html/reference/expressions/col…
-
### SynapseML version
0.11.2
### System information
- **Language version** python 3.9
- **Spark Version** 3.4.1
- **Spark Platform** AWS EMR
### Describe the problem
LightGBM consi…
-
SynapseML: 0.9.2
Spark: 3.1.2
I use SynapseML with Spark 3.1.2 on YARN.
The dataset looks like this:
0120030913371513,1987,40,694,1,2,10,6,32,0.12,0.6,2,2,1,1,5,450,53,659,4,0.6,0.7,0.93,0.8,4,1…