444B / streamlit-analytics2

👀 Track & visualize user interactions with your streamlit app
MIT License
14 stars, 2 forks

Tracking all select/multiselect options can consume too much memory #17

Closed: 444B closed this 4 months ago

444B commented 4 months ago

Original issue by mateusccoelho on 2023-09-02 14:10:28+00:00

Currently this package tracks all select/multiselect options, even the ones that are never selected. This may be a useful feature, because the option list can change while the app is in use. Still, we can't track which options were available to each user.

On the other hand, this can be problematic when there are more than, say, 3000 options, or when the labels change at runtime. The visualization in the analytics section becomes too large, and this can use a lot of memory, since counts are stored in session state: a dict with 3000 string keys has to be allocated.

Example:

import streamlit as st
import streamlit_analytics as st_analytics

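with st_analytics.track():
    # 5000 options: as described above, each one gets its own counter
    # in the tracked counts, even if it is never selected
    val = st.multiselect('abc', list(range(5000)))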
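To get a rough feel for the cost described above, here is a minimal sketch that builds one "track every option" dict in plain Python and measures the allocation with the standard-library `tracemalloc` module. It does not use streamlit-analytics at all; `build_counts` and `measure_counts_bytes` are hypothetical helpers that only mimic one counter per option, keyed by the option's string form. The exact byte count depends on the Python version and platform, and covers a single widget's dict only.

```python
import tracemalloc


def build_counts(n_options: int) -> dict:
    # Hypothetical stand-in: one zero counter per option, keyed by its string
    # form, mimicking the "track every option" behaviour described in the issue
    return {str(option): 0 for option in range(n_options)}


def measure_counts_bytes(n_options: int) -> tuple:
    """Return (number of keys, bytes allocated while building the dict)."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()  # (current, peak)
    widget_counts = build_counts(n_options)  # keep a reference while measuring
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return len(widget_counts), after - before


if __name__ == "__main__":
    for n in (10, 3000, 5000):
        keys, size = measure_counts_bytes(n)
        print(f"{keys} keys: ~{size / 1024:.0f} KiB")
```

This only measures one dict; if counts are kept per session, as the issue describes, the cost would grow with the number of active sessions as well.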

One goal of this package is to interfere as little as possible with the server's resources and with the user's code. So I propose adding a parameter to start_tracking that lets the user choose to track all options. It should default to False, in which case _wrap_select and _wrap_multiselect should behave like _wrap_value.
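The proposal above could look roughly like the following minimal sketch. It is not the package's actual code: `counts`, `wrap_multiselect` and the `track_all_options` flag are hypothetical names, and the fake widget stands in for `st.multiselect` so the example runs without Streamlit. With the flag off (the proposed default), only options that were actually selected ever get a key; with it on, every option is pre-allocated a zero counter, as the current behaviour does.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for the package's counts storage (not its real layout)
counts: Dict[str, Dict[str, Dict[str, int]]] = {"widgets": {}}


def wrap_multiselect(
    func: Callable[..., List[Any]], track_all_options: bool = False
) -> Callable[..., List[Any]]:
    """Wrap a multiselect-like widget and count selections per option.

    `track_all_options` is the hypothetical flag proposed in the issue.
    """

    def new_func(label: str, options: List[Any], *args: Any, **kwargs: Any) -> List[Any]:
        selected = func(label, options, *args, **kwargs)
        widget_counts = counts["widgets"].setdefault(label, {})
        if track_all_options:
            # Opt-in: allocate a zero counter for every option, selected or not
            for option in options:
                widget_counts.setdefault(str(option), 0)
        # Default: only options that were actually selected get a key
        for option in selected:
            key = str(option)
            widget_counts[key] = widget_counts.get(key, 0) + 1
        return selected

    return new_func


if __name__ == "__main__":
    # Fake widget: pretend the user picked the first two options
    fake_multiselect = lambda label, options: options[:2]
    tracked = wrap_multiselect(fake_multiselect)
    tracked("abc", list(range(5000)))
    print(counts["widgets"]["abc"])  # {'0': 1, '1': 1}
```

For simplicity this sketch increments on every call; a real tracker would also need to avoid recounting the same selection on each Streamlit rerun.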

444B commented 4 months ago

@mateusccoelho is this still relevant?

444B commented 4 months ago

I agree with you on the need to take performance into account. Part of that consideration also falls on the user, who chooses to pass 3000 options to a multiselect and should expect a drop in performance when doing so. Out of curiosity, I would like to replicate this and measure the performance hit.

We can treat this as an edge case for now, though it is a real consideration at scale. Marking this as a planned long-term fix for when we refactor the project to include project settings, at which point we will ask the user whether to include all keys or just the ones used.