Hi, what is the size of your data (row count, column count, data size)?
Pygwalker will support larger datasets (up to your machine's memory size) in the next version.
1M rows, approximately 200 columns (a mix of float and categorical values). The ideal user experience would be to use a PySpark DataFrame directly in the walk function.
Pygwalker will support PySpark within the next 4 versions.
Version 0.3.3 already supports pyspark.dataframe, but Spark's computation model is not well suited to fetching data for rendering charts.

If the Spark DataFrame you end up needing to analyze doesn't exceed three times your machine's RAM, you can convert it to a pandas DataFrame with `df = spark_df.toPandas()`, then use DuckDB to compute the data in pygwalker: `pyg.walk(df, use_kernel_calc=True)`.
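For reference, a minimal sketch of both paths described above; the SparkSession setup and the input path are hypothetical placeholders, and exact `walk` behavior may differ across pygwalker versions:

```python
# Minimal sketch, assuming pygwalker >= 0.3.3 with pyspark and duckdb
# installed. The SparkSession setup and "data.parquet" are hypothetical.
import pygwalker as pyg
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()
spark_df = spark.read.parquet("data.parquet")  # hypothetical input

# Path 1: pass the Spark DataFrame directly (supported since 0.3.3),
# though Spark's execution model is slow for interactive chart rendering.
pyg.walk(spark_df)

# Path 2: if the data fits in memory, convert to pandas and let
# pygwalker's DuckDB kernel handle the aggregation.
df = spark_df.toPandas()
pyg.walk(df, use_kernel_calc=True)
```

The second path trades the conversion cost of `toPandas()` for much faster interactive queries, since DuckDB aggregates locally instead of launching Spark jobs per chart.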
> Pygwalker will support PySpark within the next 4 versions.

@longxiaofei Is this still in the plan?
@rishabh-dream11 PySpark DataFrames are currently supported, but PySpark is not well suited to this kind of interactive computation.
Native support for rendering visualizations for a PySpark DataFrame in a Jupyter notebook. It is OK to introduce some constraints if the sheer size of the data frame makes it difficult to load.