Mohamed-512 / Extra-Streamlit-Components

An all in one place, to find complex or just not available components by default on streamlit.
Apache License 2.0

Cookie manager per user vs global #54

Open tonkolviktor opened 11 months ago

tonkolviktor commented 11 months ago

The default example uses https://docs.streamlit.io/library/api-reference/performance/st.cache_resource, which states: "Cached objects are shared across all users, sessions, and reruns." cookie_manager has a variable called cookies, which will become global and will be shared across users, right?

If that's the case on the server side we have a mix of some random combination of cookies, which overwrite each other.

That means if one has more than 1 user on the streamlit page cookie manager is not usable.

I've started to dig deep because this is exactly what we are experiencing, we overwrite each others session_ids.

Could you please confirm or deny this hypothesis?
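
The hypothesis above can be reproduced without Streamlit. The following is a minimal simulation, assuming only what the quoted docs say (one cached object shared by every session). `FakeCookieManager`, `get_manager` and `_shared_cache` are hypothetical names for illustration, not the package's actual code.

```python
# Simulation only: plain Python standing in for a process-wide cache.
_shared_cache = {}  # plays the role of st.cache_resource storage


class FakeCookieManager:
    """Hypothetical stand-in for a manager that keeps cookies on the instance."""

    def __init__(self):
        self.cookies = {}


def get_manager():
    # Like a cached function: the first call creates the object,
    # every later call (from any session) receives the same one.
    if "manager" not in _shared_cache:
        _shared_cache["manager"] = FakeCookieManager()
    return _shared_cache["manager"]


# Two different users each run the script once.
alice = get_manager()
alice.cookies["session_id"] = "alice-session"

bob = get_manager()
bob.cookies["session_id"] = "bob-session"

same_object = alice is bob                 # both users hold the same instance
alice_sees = alice.cookies["session_id"]   # overwritten by the second user
print(same_object, alice_sees)
```

If the real manager keeps per-browser state on a cached instance in this way, the last writer wins, which would match the overwritten session_ids described above.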

ZupoLlask commented 11 months ago

Honestly, I think that's a typo in the example code. If you change that decorator to @st.cache_data I think you're good to go.

Thanks for raising that issue, it's useful for others.

tonkolviktor commented 11 months ago

Thanks for the reply @ZupoLlask. As far as I understand, cache_data is still global, although Streamlit does not state it explicitly on that page; it only says (https://docs.streamlit.io/library/api-reference/performance/st.cache_data): "Each caller of the cached function gets its own copy of the cached data."

So, e.g., if you do this:

import time
import streamlit as st

@st.cache_data(experimental_allow_widgets=True)
def get_time():
    return time.time()

and open the page in 2 browsers you'll see the same result.

one solution could be to not use caching at all, but I'm unsure what the implications of that are.
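
One way to picture the "no caching" option is to keep the object in per-session storage instead of a global cache. This is only a sketch: `get_per_session` is a hypothetical helper, and the plain dicts stand in for each browser session's `st.session_state` (assumed here to support `in` and item assignment like a mapping). Whether CookieManager itself behaves well when stored this way is not verified here.

```python
from typing import Callable, MutableMapping, TypeVar

T = TypeVar("T")


def get_per_session(state: MutableMapping, key: str, factory: Callable[[], T]) -> T:
    """Create the object once per session state and reuse it on later reruns."""
    if key not in state:
        state[key] = factory()
    return state[key]


# Each dict stands in for one browser session's st.session_state.
session_a: dict = {}
session_b: dict = {}

manager_a = get_per_session(session_a, "cookie_manager", dict)
manager_b = get_per_session(session_b, "cookie_manager", dict)
manager_a_again = get_per_session(session_a, "cookie_manager", dict)  # a rerun
```

The trade-off is that per-session storage does not survive a page reload, because a reload starts a new session.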

ZupoLlask commented 11 months ago

Hi there!

Since my reply, I continued to investigate the gripes behind the problems with my use case and my understanding evolved a bit.

Streamlit has a difficult execution model, especially if your use case requires statefulness across browser sessions (more than a single st.session_state per user, if you want it to survive page reloads), and it gets even harder if you want to make it work with simultaneous unique users (each with potentially multiple browser sessions).

Eventually, yesterday I was able to make the whole thing work reliably, with the additional complexity of integrated Google OAuth2 authentication.

However, in the end, I felt a bit insecure because everything seems too convoluted (for an end result that is common, at least when you use any framework other than Streamlit), and for that reason I'm taking some time to get my head wrapped around all the concepts and misconceptions.

As I need some time for this, I'll get back to you as soon as I'm ready to share more solid conclusions.

CHerSun commented 11 months ago

@ZupoLlask could you elaborate with a few more specifics, please? Looks valuable to me.

As of caching - my understanding was:

And as streamlit doesn't really know what user is (no auth) - it has no mechanics related to that. Normally I'd stick to Redis or something like that for user-specific caching (different tabs, browsers), or maybe a st.cache_resource with one of arguments being unique user id.

How far is that from your testing results? I'm building quite large multi-user app and those questions are important to me, though I had no time to test properly yet.
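
The idea above of passing a unique user id as an argument can be sketched as follows. `functools.lru_cache` is used purely as a stand-in for `@st.cache_resource` so the snippet runs without Streamlit; both key their cache on the function arguments. `get_user_store` is a hypothetical name.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # stand-in for @st.cache_resource in this sketch
def get_user_store(user_id: str) -> dict:
    # The argument is part of the cache key, so each user_id gets its own dict.
    return {"user_id": user_id, "items": []}


store_a = get_user_store("user-a")
store_b = get_user_store("user-b")
store_a["items"].append("only for a")

# The same id returns the same object; a different id is unaffected.
print(get_user_store("user-a")["items"], store_b["items"])
```

This still leaves open where a trustworthy user id comes from, which is the cookie question discussed in the rest of the thread.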

tonkolviktor commented 11 months ago

Hi, so yeah, most of us come to this extension exactly for user management.

Both of these cannot be true: "single user session" vs "And as streamlit doesn't really know what user is"

cache_data still stores the information globally, see my previous example: https://github.com/Mohamed-512/Extra-Streamlit-Components/issues/54#issuecomment-1763480097

"Normally I'd stick to Redis or something like that for user-specific caching (different tabs, browsers), or maybe a st.cache_resource with one of arguments being unique user id." Yes of course, but that's not the question, question is setting and storing the user_id or better session_id in a cookie.

I've ended up using streamlit-cookies-manager, please note: https://github.com/ktosiek/streamlit-cookies-manager/issues/5

The problem with that one is that it's not maintained, but it works (after the patch), whereas this repo's cookie manager does not work as documented. It might work if you do not add any caching at all. However, since I now have a working solution, I was not interested in fully testing that approach :)
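
For the session_id itself, a minimal sketch using only the Python standard library is shown below. It covers generating an unguessable id and reading it back from a raw Cookie header; how the cookie actually gets written to the browser depends on the cookie component in use and is not shown. `new_session_id` and `read_session_id` are hypothetical helper names.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional


def new_session_id() -> str:
    # URL-safe random token; these characters need no quoting in a cookie value.
    return secrets.token_urlsafe(32)


def read_session_id(cookie_header: str, name: str = "session_id") -> Optional[str]:
    """Return the named cookie's value from a raw Cookie header, or None."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(name)
    return morsel.value if morsel is not None else None


sid = new_session_id()
header = f"theme=dark; session_id={sid}"  # what a browser might send back
print(read_session_id(header) == sid)
```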

ZupoLlask commented 11 months ago

Regarding @tonkolviktor code snippet, please check the toy example I created to illustrate the "correct" usefulness of st.cache_data and st.cache_resource:

import streamlit as st
from datetime import datetime

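# Cached per distinct argument value: the same i returns the same timestamp.
@st.cache_data
def get_now(i):
    return datetime.now()

# One dict shared by every session of this server process.
@st.cache_resource
def server_cache():
    obj = {}
    return obj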

cache_keys = ["st_runs", "count", "uid"]
cache = server_cache()

if not cache:
    for key in cache_keys:
        cache[key] = cache.get(key, 0)

def main():
    cache["st_runs"] += 1
    print("st_runs", cache["st_runs"])

    if "uid" not in st.session_state:
        cache["uid"] += 1
        st.session_state["uid"] = cache["uid"]

    st.write(f"uid {st.session_state['uid']}")

    st.button("rerun (no-op)")
    if st.button("increment (count)"):
        cache["count"] += 1
    if st.button("decrement (count)"):
        cache["count"] -= 1

    st.write(f"count {cache['count']} @ datetime {get_now(cache['count'])}")

    if cache.get("count") == 0:
        print("count", cache.get("count"))

if __name__ == "__main__":
    main()

A few notes on this toy code:

Regarding the specific issue with your snippet, or why my example works and yours don't:

As stated in the Streamlit docs, we must be very careful with st.cache_data and st.cache_resource, as both are indeed shared globally. Both are still useful for different things, like making our apps persist across browser reloads (which trigger a new Streamlit session), if we pair them with a UUID (known only to our app) stored in a cookie and some OAuth2 authentication service (don't store anything from the OAuth2 response in a cookie).

In my tests (using CookieManager from the Extra-Streamlit-Components package), most of my issues were related to race conditions: some widgets depended on an immediate rerun while others required a rerun later (not now).

Although @Mohamed-512 did a wonderful service developing this kind of early feature more than 2 years ago, CookieManager may or may not work for you, depending on your code and on the other widgets it uses, or on whether you understand how to get around all those issues and rerun race conditions.

Although my example is now working reliably using streamlit-auth and extra_streamlit_components, I feel that both packages' logic is too fragile and troublesome for my use case, and I'm going to replace the CookieManager dependency with https://github.com/blipk/StreamlitExtras/tree/main/streamlitextras/cookiemanager as this one's logic seems far more robust. I'll probably try to extend its Authenticator class to use Google OAuth2 directly without depending on Firebase... But I'm not sure I'll want to do that at this stage.

Some of the issues we stumbled upon can be worked around by using Redis or some database, as stated by @CHerSun. In my case, I insisted on not using that kind of resource, for learning purposes: I started to fear that Streamlit's gripes could be worse than I anticipated, and I wanted to dig deeper to find out whether I could understand them and live with them.
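
The pairing described above (a UUID kept in a cookie plus a globally shared cache) can be sketched as a small in-process store. This is an assumption-laden illustration, not the code from this comment: `SessionStore` is a hypothetical class, and in Streamlit the single instance would be the return value of a cached function. A lock is used because a globally shared object can be accessed from several threads at once.

```python
import threading
import uuid


class SessionStore:
    """Process-wide store with one entry per browser, keyed by the cookie's UUID."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def load(self, browser_id: str) -> dict:
        # Return a copy so callers cannot mutate shared state without save().
        with self._lock:
            return dict(self._data.get(browser_id, {}))

    def save(self, browser_id: str, state: dict) -> None:
        with self._lock:
            self._data[browser_id] = dict(state)


store = SessionStore()  # hypothetical: would come from a globally cached function

browser_id = str(uuid.uuid4())  # would be read from (or written to) a cookie
store.save(browser_id, {"count": 3})

# After a page reload, a new session starts empty and restores from the store.
restored = store.load(browser_id)
other = store.load(str(uuid.uuid4()))  # an unknown browser gets nothing
print(restored, other)
```

Unlike Redis or a database, this store is lost when the server process restarts and is not shared between multiple server processes.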

tonkolviktor commented 11 months ago

Thanks for the detailed answer @ZupoLlask

Regarding the specific issue with your snippet, or why my example works and yours don't:

I did not really understand what you mean by "works". The only thing I wanted to show is what you stated as well: "st.cache_data and st.cache_resource as both are indeed shared globally"

My intention was not to create a user-specific cache_data, especially not without a cookie, which is the key to all this.

Thanks for streamlitextras (I was for a while really confused because it's so similar to Extra-Streamlit-Components) I agree with you I think that's the way forward regarding the cookie manager. It's a really nice catch, since nowhere in the examples the cookiemanager is explicitly mentioned. Thanks!!!

(For auth which is not really the scope of this ticket :) but since we all came here because we want to do auth, let me quickly share this repo: https://github.com/sfc-gh-bhess/st_oauth which in itself was not really useful, but it shows how an oauth flow could work. TODOs to make it work :)

With all that being said, it would probably be nice if the cookie manager of this repo were fixed as well.

CHerSun commented 11 months ago

thanks for the reply @ZupoLlask, well, as far as I understand cache_data is still global, although on that page streamlit does not state it explicitly it only says https://docs.streamlit.io/library/api-reference/performance/st.cache_data "Each caller of the cached function gets its own copy of the cached data."

So eg.: if you do this:

@st.cache_data(experimental_allow_widgets=True)
def get_time():
   return time.time()

and open the page in 2 browsers you'll see the same result.

one solution could be to not use caching at all, but I'm unsure what the implications of that are.

tested this. Ouch. Thank you for clarifying this.

informatica92 commented 1 week ago

Hi all, I also ended up here to create some sort of light "user" functionality to my app. My current situation is:

I mentioned this issue about the additional rerun in #58.