emilhe / dash-extensions

The dash-extensions package is a collection of utility functions, syntax extensions, and Dash components that aim to improve the Dash development experience
https://www.dash-extensions.com/
MIT License
409 stars 58 forks source link

Server sometimes sets up incorrectly on dash version >=2.7.0, dash-extensions version >=0.1.7 #241

Open tlauli opened 1 year ago

tlauli commented 1 year ago

Hello, I observed a inconsistent behavior during server setup. The problems were encountered only while running in a docker image in cluster, so debugging options were limited. Also, since it is a company code under NDA, I am unable to provide a MWE :(. If some more info is needed, I will provide it if possible.

Three different behaviors were observed: 1: During the server setup, callbacks were loaded multiple times, according to callback graph (two and three times was observed). Dash error console reported an overlapping callback error for every output in the application. Callbacks belonging to AIO components were loaded correctly. 2: Callback were loaded correctly according to callback graph, but there was no layout under the div corresponding to dash.pages_container. 3: The app loaded correctly.

All three cases were observed after restarting the same docker container multiple times.

My investigation of possible cause: Different behavior on different runs of the same container made me think that it might be a race condition. After checking the changes in dash 2.7.0 server setup, I found this MR https://github.com/plotly/dash/pull/2265/files , in which the call to Dash._setup_server is moved from Dash.server.before_first_request(..) (before_first_request was deprecated by Flask) to Dash.server.before_request(..) and manual checking if the request is first was implemented. My idea was that the code enters _setup_server() multiple times and undefined behavior occurs.

This lead me to attempt this fix.

class DashProxyPatched(DashProxy):
    setup_server_lock = threading.Lock()

    def _setup_server(self):
        with self.setup_server_lock:
            logger.info(f"""Got first request to setup server is : {bool(self._got_first_request["setup_server"])}""")
            logger.info(f"Current thread is : {threading.current_thread().ident}")
            logger.info("Setting up server")
            super()._setup_server()
            logger.info("Server set up")

This indeed helped, and I observed multiple different thread ids in the logs.

The app is making heavy use of dash-extensions, so this is why I am reporting the problem here instead of directly to dash. Second reason is that in the case that I identified the cause correctly, both dash and dash extensions will need changes. If that is the case, I will gladly submit the report to dash.

emilhe commented 1 year ago

I have also noticed similar behavior. Accordingly to my preliminary tests, it goes away by using a single thread, i.e. if you use a command like

gunicorn --threads=1

for spinning up the app. I am also convinced that it is a regression introduced by the change in callback registration, but I haven't had time to look deeper into a proper solution yet. With your proposed solution, did you see the error at all?

Have you been able to reproduce the issue in "pure" Dash (which would indicate that the regression needs to be fixed here rather than only in dash-extensions)?

tlauli commented 1 year ago

I did not try to reproduce it in pure dash. And I am not sure I will have enough time to try anytime soon.

Rest of the post assumes that the race condition is indeed the problem. Both dash and dash extensions need fixes I think.

In dash the first three lines of the _setup_server function

    def _setup_server(self):
        if self._got_first_request["setup_server"]:
            return
        self._got_first_request["setup_server"] = True
        ...

should be an atomic operation. Dash-extensions similarly need a way to ensure that only one thread is running the code between the first and the last line of this part of the _setup_server function

    def _setup_server(self):
        first_request = not bool(self._got_first_request["setup_server"])
        if first_request:
            self.register_callbacks()
        # Proceed as normally.
        super()._setup_server()
        ...

I also expect that hitting the race condition in pure dash will be almost impossible, since the check and set of the variable are on consecutive lines, requiring one thread to stop executing right after the check.

emilhe commented 1 year ago

Since DashProxy._setup_server is calling Dash._setup_server, I guess implementing the proposed fix in DashProxy will also solve the issue for Dash itself (if you are using DashProxy)? I have created a PR,

https://github.com/thedirtyfew/dash-extensions/pull/244/files

that I'll merge after doing a bit more testing. Unfortunately, I haven't been able to reproduce the race condition in a unit test, so that makes it a bit more tedius to confirm the validity of the fix.

emilhe commented 1 year ago

I have pushed a 0.1.12rc1 version that includes the fix if anyone wants to try it out.

tlauli commented 1 year ago

0.1.12rc1 version seems to be working fine for me.