Closed robertbindar closed 3 years ago
I have toyed with "out of band" sharing of states between notebooks and it works but not very elegant. Pickling to files works but is probably prone to race condition. Slightly more elegant is to use shared memory but there is a python bug/feature that makes that unfeasible (shared mem gets unlinked when creation process dies).
The most elegant would be if jupyterhub had som msg queue or similar we could piggyback on.
Asking the server how many clients are connected sounds like a good solution (if doable).
Hämta Outlook för Androidhttps://aka.ms/AAb9ysg
From: Robert Bindar @.> Sent: Wednesday, May 5, 2021 5:24:16 PM To: MariaDB/mariadb_kernel @.> Cc: Jonas Karlsson @.>; Mention @.> Subject: [MariaDB/mariadb_kernel] If multiple notebooks are opened, only the last one should attempt to kill the MariaDB server (#24)
As pointed by @jonakarlhttps://github.com/jonakarl in #17https://github.com/MariaDB/mariadb_kernel/issues/17, there is a corner case we didn't think of when multiple MariaDB notebooks are started by the same Jupyter user and one of them also was responsible for spinning a MariaDB server (start_server=True).
Case: a student opens 2 notebooks. The first one starts the server, the second one connects to the already spinning server. But the student closes the first notebook, which kills the server. Thus the second notebook remains hanging with no server to talk to.
We most probably don't have a way to share state between kernel instances from multiple notebooks (should be researched), but an idea would be to store such state within the server itself.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/MariaDB/mariadb_kernel/issues/24, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ACFVVF5O7PQPYTQL3557OGTTMFPKBANCNFSM44FFPPUQ.
När du skickar e-post till Karlstads universitet behandlar vi dina personuppgifterhttps://www.kau.se/gdpr. When you send an e-mail to Karlstad University, we will process your personal datahttps://www.kau.se/en/gdpr.
Sounds like you already explored most of the possibilities available, thanks for the info. I will do some research if there is some kernels mechanism to achieve something like that.
But indeed querying for the number of clients sounds good and very doable I think.
My "toy" example with shared memory is here: https://github.com/jonakarl/mariadb_kernel/tree/WIP-shared_mem
As I said atm it does not work in all situations due to how python handles shared memory, e.g It does not work if we have two notebooks and then kill the first and try to start a third, then the shared memory does not exist (as the creator, the first notebook, has died) and the third notebook will crash when it tries to read the shared memory.
If we could query the server that would probably be the best. In such case we only need to keep track of the pid, which are available in the pidfile, and kill the server if we are the only client when do_shutdown is called. Of course if the users manually connect to the server outside the notebook we might end up with a lingering server (as the client count never goes below 1) but I guess that case could be considered out of scope.
We have looked a bit into this some more and I am sure this could be cleaned up a lot but we will do a new PR tomorrow.
Do you know any better way of getting the raw value of the sql query without mangling panda or html strings?
def num_connected_clients(self):
result_html = self.mariadb_client.run_statement(code='select count(id) from information_schema.processlist;', timeout=30)
return pandas.read_html(result_html)[0]['count(id)'][0]
def do_shutdown(self, restart):
num_clients = int(self.num_connected_clients())
.....
if num_clients < 2 then kill server
This looks good to me. The MariaDB client is started in HTML output mode, so another way would mean starting a new client without the --html option and I don't think it's worth doing it, code looks fine now. The focus of the project is not performance at this stage, when we'll move to that I will add an API for querying the server without going through HTML.
I've been thinking about this problem too and an issue with this approach might be: can the kernel accidentally kill a MariaDB server instance that it didn't start itself? Because we abandon the idea of tracking whether the kernel itself started the server and we plan to have the last kernel alive to just send a sigkill to mysqld based on the PID from the .pid file, such a question becomes reasonable.
In most setups I expect this to not be an issue: multiple notebooks/kernels started on a laptop or in a docker container, one server either started by hand by the user or automatically by one kernel, last kernel kills the server, not a problem. I'm thinking of setups where people are running a jupyter notebook server on the same machine/machines with a MariaDB deployment (single server, replication setup, etc), if somehow the notebook server is started by the same user that started the MariaDB deployment too, I guess it's theoretically possible that the kernel kills one server instance from a replication setup and mess up a lot of things.
From the top of my head this would only be a problem if they somehow managed to use the same pidfile.
Unlikely or not but I believe a random PID file would solve this.
Easy fix (if it covers all use cases), will code it up tomorrow and we can discuss.
Hämta Outlook för Androidhttps://aka.ms/AAb9ysg
From: Robert Bindar @.> Sent: Monday, May 10, 2021 4:10:17 PM To: MariaDB/mariadb_kernel @.> Cc: Jonas Karlsson @.>; Mention @.> Subject: Re: [MariaDB/mariadb_kernel] If multiple notebooks are opened, only the last one should attempt to kill the MariaDB server (#24)
In most setups I expect this to not be an issue: multiple notebooks/kernels started on a laptop or in a docker container, one server either started by hand by the user or automatically by one kernel, last kernel kills the server, not a problem. I'm thinking of setups where people are running a jupyter notebook server on the same machine/machines with a MariaDB deployment (single server, replication setup, etc), if somehow the notebook server is started by the same user that started the MariaDB deployment too, I guess it's theoretically possible that the kernel kills one server instance from a replication setup and mess up a lot of things.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/MariaDB/mariadb_kernel/issues/24#issuecomment-836748365, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ACFVVF3AJK6NIUCF3KELTJDTM7SMTANCNFSM44FFPPUQ.
När du skickar e-post till Karlstads universitet behandlar vi dina personuppgifterhttps://www.kau.se/gdpr. When you send an e-mail to Karlstad University, we will process your personal datahttps://www.kau.se/en/gdpr.
I have though some more on this and the only scenario I can think of where this could happen is :
If neither of the above is true the client will not be able to kill the server. Ie if the client has only configured the same pid file location; the server will not start (pidfile is open) and the client can not connect => and will therefore not be able to find out how many clients are connected => and will therefore not shut down the server (as that is only done when the number of clients are < 2). If the pidfile are different the there is no issue as two processes cannot have the same pid and therefore the correct process will be killed (if the port and bind-address are the same as the other server our server will not start but that does not matter in this case) .
This exactly the reasoning I was failing to find, great work! 🎉 The pid file is the key indeed and I'm now certain your solution will work perfectly, can't wait to see it! 🍺
This exactly the reasoning I was failing to find, great work! tada The pid file is the key indeed and I'm now certain your solution will work perfectly, can't wait to see it! beer
Already available in #25 :)
As pointed by @jonakarl in #17, there is a corner case we didn't think of when multiple MariaDB notebooks are started by the same Jupyter user and one of them also was responsible for spinning a MariaDB server (
start_server=True
).Case: a student opens 2 notebooks. The first one starts the server, the second one connects to the already spinning server. But the student closes the first notebook, which kills the server. Thus the second notebook remains hanging with no server to talk to.
We most probably don't have a way to share state between kernel instances from multiple notebooks (should be researched), but an idea would be to store such state within the server itself.