deephaven / streamlit-deephaven

Deephaven Streamlit custom component
MIT License
1 stars 3 forks source link

Intermittent failure to add items to server #16

Open jnumainville opened 3 months ago

jnumainville commented 3 months ago

Sometimes, when the streamlit tab opens after running streamlit run streamlit_deephaven/test_app.py an error appears in streamlit because the object is not found. It's not appearing in the panels list either. It seems to happen more if I have lots of tabs open but I am not 100% sure as it happens consistently but I don't have a deterministic recreation. image image

So, something is going wrong such that a table added to main.dict is not grabbed by the server.

kzk2000 commented 3 months ago

@jnumainville this looks familiar and I could never figure it out what exactly is causing this.

for that reason, I ALWAYS pass widget and object_id here: https://github.com/kzk2000/deephaven-streamlit/blob/main/app/src/streamlit_deephaven.py#L83

This works thanks to DH execution context

  1. Each streamlit tab/re-run becomes a new thread, you can confirm this via logging the thread_id such as here: https://github.com/kzk2000/deephaven-streamlit/blob/main/app/src/streamlit_deephaven.py#L107

  2. using a static variable name DH server side actually works thanks to DH's execution context. Each Streamlit tab and/or re-run becomes it's own thread, and gets it's own execution context. so multiple users can assign the same DH variable and yet are separated from each other. That said, not sure if that creates a leak somewhere on DH server side, happy to learn more on that front. In my experience, it seems to be working fine though

kzk2000 commented 3 months ago

btw, Streamlit's st.cache_resource isn't meant to be thread-safe, from their docs:

st.cache_resource Decorator to cache functions that return global resources (e.g. database connections, ML models).

Cached objects are shared across all users, sessions, and reruns. They must be thread-safe because they can be accessed from multiple threads concurrently. If thread safety is an issue, consider using st.session_state to store resources per session instead.

just saw there is no with lock: block around _start_server here: https://github.com/deephaven/streamlit-deephaven/blob/main/streamlit_deephaven/streamlit_deephaven.py#L66

So my hunch is that one of your many tabs/threads creates the server, but another tab/thread creates the execution context that's stored on the DH instance as attribute, and that's causing a mismatch somehow.

@jnumainville So, maybe this will fix it:

   # server = _starting_server(host, port, jvm_args)  # move this line into the `with lock:` block

    with lock:
        server = _starting_server(host, port, jvm_args)  # put it here
        if __main__.__dict__.get('MAIN_AND_DH_DRIFTED', True):
            __main__.__dict__['MAIN_AND_DH_DRIFTED'] = False
            Server.instance.__globals = __main__.__dict__
jnumainville commented 3 months ago

Thanks for the suggestion! Unfortunately did not work As to thread safety - my reading was the cached object needs to be thread safe, not the function itself. As far as I can tell, the function is only running once.

kzk2000 commented 3 months ago

@jnumainville I was testing a bit more and noticed some odd behavior

Case A: I have 5 tabs open, I restart the server, and I "touch" any of the 5 tabs right away, and make a Streamlit selection that triggers a re-run.
Result: DH and Streamlit seem to stay in sync and the binding work well.

Case B: I have 5 tabs open, I restart the server, and I do NOT "touch" any of the 5 tabs until they automatically reloaded. Result: DH and Streamlit seem to get out of sync, some binding became the default and no matter which re-runs I trigger (choosing between 1,2, or 3 sceonds for bottom plot), it doesn't work

it's really bizarre...if you can figure it out, I'm very curious. it must be some weird race condition somewhere