Closed by ysomawar 2 years ago
I think merge should take advantage of Spark, since it directly uses Spark's join function internally, especially here:
By the way, I recommend using the pyspark.pandas module in PySpark, since Koalas has been ported into PySpark.
Thank you @itholic for your recommendation. We tried the same thing with pyspark.pandas and it works as expected.
Hello,
I am very new to Koalas and have just started learning it. I am planning to use Koalas with Spark for large-scale data processing. I am trying to merge two large datasets using the Koalas merge functionality, but I observed that the merge does not run on Spark. It executes locally, which results in slow performance, the same as pandas.
The following is the code block:
On merge, none of the Spark tasks get created; the frames are merged locally without taking advantage of Spark. Spark version: 3.1.1
Could somebody please help me understand how I can take advantage of Spark when merging the frames (or when using any Koalas API)?
Thanks in advance.
Regards, Yogesh