Closed henrymercer closed 3 years ago
So do we want additional functionality from the cli to fix this? Does actually running a query on the newly selected database release the old db's lock and acquire the new one?
So do we want additional functionality from the cli to fix this?
I'm unfamiliar so am not sure whether this would require additional queryserver functionality.
Does actually running a query on the newly selected database release the old db's lock and acquire the new one?
Once a query has been run from the newly selected database, running a query on the originally selected database from the CodeQL CLI succeeds. The lockfile seems to be preserved, however.
OK. Reproduced your problem. If there is a cli or query server command to remove a process's lock file, that would be what we need. Though, I don't think there is one.
@hmakholm do you have any suggestions on what we can do here?
The lock ought to be dropped automatically when the MemoryBackend
that refers to the old database is closed. ~If the problem can be reproduced without provoking any particular evaluator error first, it must be that somehow the MemoryBackend
doesn't get closed when we change database in the query server -- which is bad for a whole range of reasons.~
On the other hand, it doesn't look like the code that creates MemoryBackend
s ought to be capable of not closing the old backend. ~The alternative would be that FileSystemLock#release()
somehow doesn't work.~
Which platform are you reproducing on?
I'm on mac.
Would it make sense for the extension to explicitly delete the lock file when databases change? I guess it would require some sleuthing because the lock file name looks partially generated.
And to be clear, when the extension changes databases, it doesn't actually do anything from the cli/queryserver perspective. It is only updating its internal state so that it can handle future queries.
And to be clear, when the extension changes databases, it doesn't actually do anything from the cli/queryserver perspective. It is only updating its internal state so that it can handle future queries.
Ah, that's it; I see what you mean. I'll cancel my panic over something potentially being wrong with the locking implementation. :grin:
Yes I think we should add a "select database" or "database changed" command in the query server protocol so it can close its MemoryBackend
eagerly when that happens.
@alexet, you have a better knowledge of the protocol -- is there a way to add a new command in a backward-compatible way, or would we need to have the extension vary its behavior based on the CLI version?
(Note that this would not entail deleting the lock file, but simply relinquishing its OS-level exclusive access to the file. This is safer than having the mere presence of the file mean "locked", because the OS will implicitly drop the lock when the owning process terminates for any reason, so it should be impossible to end up with a stale lock after a process is killed).
The problem with my initial implementation of the command was that due to the queue it was very hard to get a visual indication of whether a database was locked. The following sequence was problematic.
The protocol was designed to be externally stateless as most of the bugs with the older protocol were to do with disagreement about the state. It was mostly fine when everything was synchronous and serialised but the ability to run more than one query at once caused problems. However problems like these make me wonder if we should have had a more stateful design where the onus would be on the client to ensure that we don't change database while a query is running.
Note that due to the parallelism you can end up doing things like run two queries on two different databases and they end up running in the opposite order to what you ran (because we compile queries in parallel but we avoid opening multiple databases). This makes the system much harder to design.
I am tempted for vscode to run in a mode that always closes the database instantly. This would be a major performance hit but would avoid these problems.
For adding a new command you can kind-of just run the command and get an error back for a missing command but it is probably easier to depend on the cli version.
Note that due to the parallelism you can end up doing things like run two queries on two different databases and they end up running in the opposite order to what you ran (because we compile queries in parallel but we avoid opening multiple databases).
Hmm, won't that mean that we have multiple active MemoryBackends each with their own MemoryManager that thinks it can use all the configured space? That doesn't sound good.
No we avoid that, we just compile the queries in parallel before obtaining the backend lock. So whichever query compiles first claims the backend lock which may not be in the order which the user ran the queries.
However problems like these make me wonder if we should have had a more stateful design where the onus would be on the client to ensure that we don't change database while a query is running.
A point in defence of your stateless design: we did put this onus on the client in the Visual Studio extension, and I remember it was really hard to maintain the client-side database locking, so we moved the onus to the query server instead.
A possible design might be to add messages that let the extension inform the query server of what the currently selected database is. Then the query server would shut down a backend when it (a) is idle, and (b) doesn't match what we have been told is the selected database. Everything else stays as it is now, so if we lose synchronization between the two ends, the behavior degrades gracefully.
So, the extension just needs to message the query server any time the database changes?
And if a request comes in from the cli for a different database, then the lock file will be transparently removed if the query server is idle?
I think that makes sense from my perspective.
So, the extension just needs to message the query server any time the database changes?
Well yes, and the query server would need to implement its end of the design :smile:
Still blocking on lock-releasing or cache-clearing functionality in cli.
We should probably discuss at the team meeting tomorrow what the plan is -- I have the impression everybody is waiting for someone else to pick this up.
This has been a minor frustration for me in my recent work, so I'd be delighted if you could fix it.
One additional pain point that I didn't see mentioned: for Windows users, the open lock also prevents you from deleting the database directory.
This has also been a frustration for me recently as well, particularly with respect to tests:
1) Run a test which fails 2) Use the generated test database 3) Fix the query 4) Switch to a different database 5) Re-run the test
Which then fails.
OK. The solution in #681 is a partial fix. Now when a database is removed from the workspace, the lock is removed as well.
I think we need to add a third query server command (in addition to register and deregister databases), called unlock
database. It will be similar to deregister in that the database will be closed, but it will not actually remove the database from the allow-list.
After discussion with @hmakholm and @alexet we've decided that we will not explicitly try to solve this workflow. The workflow that we are solving is that we want to guarantee that if you remove a database from vscode, you can always run a query on it.
If you don't remove a db from vscode, the lock may be released under certain circumstances, but this is not a workflow we are encouraging or documenting.
Describe the bug Selecting a new DB doesn't release the cache directory lock on the originally selected DB, causing query runs on the originally selected DB from the CLI to fail.
To Reproduce
Expected behavior Selecting database B should unlock the cache directory of database A, so that the query from the CLI succeeds.
Additional context The cache directory of database A will be unlocked when a query is run from the extension on database B.