Kanaries / pygwalker

PyGWalker: Turn your pandas dataframe into an interactive UI for visual analysis
https://kanaries.net/pygwalker
Apache License 2.0

Pygwalker cannot render too much data #546

heqi201255 closed this issue 1 month ago

heqi201255 commented 1 month ago

I was trying to plot my data using PyGWalker. The data is a CSV file of about 467 MB with shape (3682080, 12). My code is as follows:

from pygwalker.api.streamlit import StreamlitRenderer
import pandas as pd
import streamlit as st

# Adjust the width of the Streamlit page
st.set_page_config(
    page_title="Use Pygwalker In Streamlit",
    layout="wide"
)

# Add Title
st.title("Use Pygwalker In Streamlit")

# You should cache your pygwalker renderer if you don't want your memory to explode
@st.cache_resource
def get_pyg_renderer() -> "StreamlitRenderer":
    df = pd.read_csv("/data.csv")
    # If you want to use feature of saving chart config, set `spec_io_mode="rw"`
    return StreamlitRenderer(df, kernel_computation=True)

renderer = get_pyg_renderer()

renderer.explorer()

I tried using pygwalker both inside Jupyter and via Streamlit; both gave me the error "The query returned too many data entries, making it difficult for the frontend to render. Please adjust your chart configuration and try again."


The visualization gets stuck at loading and then shows a timeout message. Is there any workaround to render my data? Which chart configuration should I adjust?

longxiaofei commented 1 month ago

Hi @heqi201255

Thank you for bringing up this issue with pygwalker. By default, pygwalker places a fixed limit on the amount of data a query may return, to keep memory usage in the frontend browser safe.

When a query result exceeds 1,000,000 (1 million) rows, it becomes difficult for the frontend to efficiently render that much data into a chart.
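An immediate workaround is to shrink the dataframe below that limit before it reaches pygwalker, for example by sampling with pandas; note this is a generic pandas pattern, not a pygwalker feature. A minimal sketch (the 500,000-row target is only an illustrative value below the limit, not a pygwalker constant):

import pandas as pd
from pygwalker.api.streamlit import StreamlitRenderer

df = pd.read_csv("/data.csv")  # ~3,682,080 rows, as in the report above

# Downsample so query results stay comfortably below the 1M-row limit
if len(df) > 500_000:
    df = df.sample(n=500_000, random_state=42)

renderer = StreamlitRenderer(df, kernel_computation=True)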

To address this issue, we are considering adding a new parameter that allows users to control the maximum data size for rendering. This parameter will provide flexibility and allow users to adjust the size according to their specific needs.

One possible solution is to introduce a setter for this limit; the following snippet would raise the maximum data length to 10,000,000 (10 million) rows:

import pygwalker as pyg

pyg.GlobalVarManager.set_max_data_length(10 * 1000 * 1000)
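For context, here is a minimal sketch of how that call could slot into the Streamlit script from the report above. The placement (before the renderer is constructed) and the combination with kernel_computation=True are assumptions on my part, pending the final API:

from pygwalker.api.streamlit import StreamlitRenderer
import pygwalker as pyg
import pandas as pd
import streamlit as st

# Raise the query-size cap before any renderer is created; assumes the
# proposed GlobalVarManager.set_max_data_length API ships as shown above.
pyg.GlobalVarManager.set_max_data_length(10 * 1000 * 1000)

# Cache the renderer so the CSV is only loaded once per session
@st.cache_resource
def get_pyg_renderer() -> "StreamlitRenderer":
    df = pd.read_csv("/data.csv")  # the ~3.68M-row CSV from the report
    return StreamlitRenderer(df, kernel_computation=True)

renderer = get_pyg_renderer()
renderer.explorer()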

We would appreciate your thoughts and feedback on this proposed solution. Please let us know if you have any suggestions or concerns.