amueller closed this 3 years ago
Thanks for the report, @amueller.
As you mentioned in the description, Koalas doesn't allow creating a Series with a Koalas Index.
Creating a Koalas Series requires a pandas DataFrame to build the InternalFrame.
So, to allow creating a Series with a Koalas Index, Koalas would have to call to_pandas() internally, which is dangerous because it moves all of the distributed data onto a single node. (Yes, just as you did explicitly in your code.)
For now, we recommend calling to_pandas() explicitly, as you did in your code, and only when you're sure that your data is small enough.
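As a rough sketch of that workaround, the snippet below collects a Koalas Index to pandas only after checking its size, then passes the result to the Series constructor. The helper name `collect_index` and the `max_rows` threshold are hypothetical and chosen for illustration; they are not part of Koalas. `demo()` assumes Koalas 1.x and a working Spark session.

```python
def collect_index(index, max_rows=100_000):
    """Return index.to_pandas(), refusing indexes longer than max_rows.

    Hypothetical helper: max_rows is an illustrative threshold,
    not a Koalas setting.
    """
    n = len(index)  # for a Koalas Index this runs a Spark count
    if n > max_rows:
        raise ValueError(
            f"index has {n} rows, more than max_rows={max_rows}; "
            "refusing to collect it onto a single node"
        )
    # to_pandas() moves all of the index data to the driver.
    return index.to_pandas()


def demo():
    # Imported lazily so the helper above can be used without Spark.
    import databricks.koalas as ks

    kdf = ks.DataFrame({"a": [1, 2, 3]}, index=[10, 20, 30])
    pidx = collect_index(kdf.index)  # explicit, size-checked collection
    return ks.Series([1.0, 2.0, 3.0], index=pidx)
```

The size check does not make a large index workable; it only turns an accidental full collection into an explicit error.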
For more detail on how Koalas works internally, see Koalas internal.
Oh, and Koalas is being ported into PySpark starting with Spark 3.2, so this repository is now in maintenance mode only.
I'd recommend using the pandas module in PySpark once Spark 3.2 is released.
You can find more details in SPIP: Support pandas API layer on PySpark.
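For code that has to run both before and after that move, one possible approach is to try the PySpark module first and fall back to Koalas. This is only a sketch: the function name `load_pandas_on_spark` is hypothetical, and it assumes the ported module is importable as `pyspark.pandas` (Spark 3.2 or later) while Koalas remains importable as `databricks.koalas`.

```python
import importlib


def load_pandas_on_spark(candidates=("pyspark.pandas", "databricks.koalas")):
    """Return the first importable module from candidates.

    Hypothetical helper: the default order prefers the module shipped
    with PySpark and falls back to the standalone Koalas package.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # not installed (or too old); try the next one
    raise ImportError(f"none of {candidates!r} could be imported")
```

A caller would then write `ps = load_pandas_on_spark()` and use `ps.DataFrame` and `ps.Series` as before.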
Thanks for the explanation! It would be great to allow using koalas indexes. I don't see how to do it now if the index is large. Anyway closing here if the repository is in maintenance mode.
this is Koalas 1.8.0 and pandas 1.2.4
works.
Thanks :)