Open ChrnyaevEK opened 2 months ago
It seems I may have misinterpreted my observations. I continued to track the production app and did some more testing, and the results point away from PyGWalker as the cause I originally suspected (potentially toward the Azure web app or other issues in our production code). I will run local tests with a memory profiler to see how it behaves over time, to rule this observation out as well.
I'm sorry for the disturbance; I will continue debugging with new evidence.
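For those local tests, a minimal sketch of tracking Python-level allocations using only the standard library's `tracemalloc` (this is my suggestion, not what was actually run; process-level RSS would need an external tool such as `psutil` or `memory_profiler`):

```python
import tracemalloc

tracemalloc.start()

# Simulate the kind of per-request allocation a Streamlit rerun would make.
data = [bytes(1024) for _ in range(1000)]  # roughly 1 MB of Python objects

current, peak = tracemalloc.get_traced_memory()
print(f"current: {current / 1024:.0f} KiB, peak: {peak / 1024:.0f} KiB")

del data  # if memory is released correctly, 'current' should drop back down
current_after, _ = tracemalloc.get_traced_memory()
print(f"after release: {current_after / 1024:.0f} KiB")
```

Note that `tracemalloc` only sees allocations made through Python's allocator, so a native leak (e.g. inside a C extension) would not show up here.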
A health endpoint has been added to our production version, and we now observe strange memory behaviour even without opening the PyGWalker explorer (PyGWalker was still imported as a package). The health check opens an empty Streamlit page every 5 minutes, and over the last 24 hours RAM usage grew gradually (in the image you can see used memory approaching 500 MB, without spikes, at a constant rate correlated with the health calls).
RAM usage in production
I also tested a sample app deployment on Azure to exclude Azure resource virtualization issues, but the results did not confirm the original hypothesis.
Sample app without PyGWalker on Azure
```python
# app.py
import numpy as np
import pandas as pd
import streamlit as st

np.random.seed(seed=1)


def app():
    # Create random dataframe
    df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list("ABCD"))
    st.table(df)


app()
```
A sample app with PyGWalker was also deployed to Azure (it has been running for a few hours now). However, it behaves as expected and releases memory when objects are destroyed, which makes me think the problem with our production version lies somewhere else.
Sample app with PyGWalker on Azure
```python
import numpy as np
import pandas as pd
from pygwalker.api.streamlit import StreamlitRenderer

np.random.seed(seed=1)


def app():
    df = pd.DataFrame(
        np.random.randint(0, 1000, size=(100000, 4)), columns=list("ABCD")
    )
    render = StreamlitRenderer(df)
    render.explorer()


app()
```
Hi @ChrnyaevEK, thanks for your feedback.
Use the latest pygwalker version and try caching the StreamlitRenderer; it may avoid the memory growth.
```python
from pygwalker.api.streamlit import StreamlitRenderer
import pandas as pd
import streamlit as st


@st.cache_resource
def get_pyg_renderer() -> "StreamlitRenderer":
    df = pd.read_csv("xxx")
    return StreamlitRenderer(df)


renderer = get_pyg_renderer()
renderer.explorer()
```
There are several reasons why pygwalker memory grows:

- `StreamlitRenderer(df)` parses the dataframe and infers the data types.
- `renderer.explorer()` renders the UI using an HTML iframe (version 0.4.9.8 switched to a Streamlit custom component to render the pygwalker UI; the Streamlit component has optimized this part of the memory overhead).

Over the next period of time, pygwalker will optimize the user experience of the Streamlit component. Thank you again for your feedback.
Hi @longxiaofei ! Thanks for your attention.
I'm afraid caching is not an option in this case; our data change with every request, so the cached function would have to look more like this:

```python
@st.cache_resource
def get_pyg_renderer(key: str) -> "StreamlitRenderer":
    df = pd.read_csv(key)
    ...
```

which is basically equivalent to no cache at all. `ttl` and `max_entries` will not help either.
I did, however, test this approach, and I'm still facing the same strange behavior.
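The point about per-request keys can be illustrated without Streamlit at all. `functools.lru_cache` is only an analogy here (it is not how `st.cache_resource` is implemented), but the cache-hit arithmetic is the same:

```python
from functools import lru_cache


@lru_cache(maxsize=3)  # analogous to max_entries=3
def get_renderer(key: int) -> str:
    # Stand-in for building a StreamlitRenderer for this request's data.
    return f"renderer-for-{key}"


# A unique key per request means every call is a cache miss:
for key in range(10):
    get_renderer(key)

info = get_renderer.cache_info()
print(info.hits, info.misses)  # 0 hits, 10 misses: the cache never helps
```

Worse, with `max_entries` set, up to that many renderers stay alive in the cache at once on top of the one being built.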
```python
import numpy as np
import pandas as pd
import streamlit as st
from pygwalker.api.streamlit import StreamlitRenderer


@st.cache_resource(max_entries=3, ttl=20)
def get_render(key: int):
    df = pd.DataFrame(
        np.random.randint(0, 1000, size=(100000, 4)), columns=list("ABCD")
    )
    return StreamlitRenderer(df)


def app():
    render = get_render(np.random.randint(1, 100))
    render.explorer()


app()
```
Running this app locally (Windows, as described in the first message, with pygwalker 0.4.9.3, since that is our production version) results in constantly growing memory (it seems to occasionally release an insignificant amount of memory, but it does not return to the initial values).

RAM used by the python process with the streamlit server and a cached pygwalker render
I also tested a few other code snippets locally to confirm that memory would eventually be released, but it seems it is not.
```python
import numpy as np
import pandas as pd
import streamlit as st


def app():
    df = pd.DataFrame(
        np.random.randint(0, 1000, size=(100000, 4)), columns=list("ABCD")
    )
    st.dataframe(df)


app()
```
- streamlit server start (`python -m streamlit run ...`) - 12:25 (memory increase due to initial object initialization)
- restart (R) - 12:27 (memory increased)
- restart (R) - 12:28 (memory increased)
- restart (R) - 12:29 (memory increased)
- restart (R) - 12:30 (memory increased)
- restart (R) - 12:31 (memory did not react)
- page close - 12:32 (memory decreased, but not to the initial level)
- stop - 12:58 (before the stop, a few slight memory decreases were observed without any external trigger)

Total test time: ~30 min
See attached PDF debug.pdf
```python
import numpy as np
import pandas as pd
from pygwalker.api.streamlit import StreamlitRenderer


def app():
    df = pd.DataFrame(
        np.random.randint(0, 1000, size=(100000, 4)), columns=list("ABCD")
    )
    render = StreamlitRenderer(df)
    render.explorer()


app()
```
- start - 13:09
- restart - 13:11 (significant memory increase)
- restart - 13:12 (memory increase)
- restart - 13:13 (memory increase)
- restart - 13:14 (memory increase)
- restart - 13:15 (memory increase)
- page close - 13:16 (memory decrease, not to initial values)
- stop - 13:40 (no memory decrease observed)
See attached PDF debug.pdf, same as above
Apps with and without PyGWalker both hold memory. PyGWalker allocates memory on every rerun; bare Streamlit seems to eventually saturate (it may not allocate a noticeable amount of memory).
There is no issue opening multiple Streamlit apps without PyGWalker, but as soon as PyGWalker is used we run out of memory (even with the cache). This seems to be confirmed both locally and on Azure.
I still suspect some issue with PyGWalker on Streamlit (maybe PyGWalker misuses Streamlit's caching mechanisms). Can you please check for steady memory growth when running a minimal PyGWalker app locally?
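To help localize where the growth comes from, one option (a sketch under the assumption that the leak is visible to Python's allocator; native allocations inside an extension would not appear) is to diff `tracemalloc` snapshots across reruns:

```python
import tracemalloc

tracemalloc.start()

_retained = []  # stands in for whatever state survives a Streamlit rerun


def rerun_once() -> None:
    # Stand-in for one app rerun; in the real app this would construct
    # StreamlitRenderer(df) and call render.explorer().
    _retained.append([bytes(4096) for _ in range(100)])


baseline = tracemalloc.take_snapshot()
for _ in range(5):
    rerun_once()
after = tracemalloc.take_snapshot()

# The top entries point at the source lines that retained memory across reruns.
for stat in after.compare_to(baseline, "lineno")[:3]:
    print(stat)
```

If the top entries after several reruns of the real app point into pygwalker internals, that would confirm where the objects are being held.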
Thanks!
**Describe the bug**
I observe RAM growth when using PyGWalker with the Streamlit framework. RAM usage grows constantly on page reload (on every app run). When using Streamlit without PyGWalker, RAM usage remains constant (flat, does not grow). It seems memory is never released; this was observed indirectly (we tracked the growth locally, see the reproduction below, and we also observe the same issue in an Azure web app, where RAM usage never declines).

**To Reproduce**
We tracked down the issue with an isolated Streamlit app with PyGWalker and memory_profiler (run with `python -m streamlit run app.py`). Observed output for a few consecutive reloads from the browser (press `R` to rerun):

**Expected behavior**
RAM usage should remain at a constant level between app reruns.

**Screenshots**
On the screenshot you can observe user activity peaks (causing CPU usage) and growing RAM usage (memory set).
On this screenshot, memory profiling of the debug app is displayed.

**Versions**
- streamlit 1.38.0
- pygwalker 0.4.9.3
- memory_profiler (latest)
- python 3.9.10
- browser: Chrome 128.0.6613.138 (Official Build) (64-bit)
- Tested locally on Windows 11

Thanks for the support!