dyedgreen / deno-sqlite

Deno SQLite module
https://deno.land/x/sqlite
MIT License

sqlite in a webworker causes the runtime to hang #174

Closed irbull closed 2 years ago

irbull commented 2 years ago

I also opened this issue in the Deno project itself, since it may be related to the runtime, but I'm opening one here too in case others have seen it with the sqlite library. The original issue is posted here: https://github.com/denoland/deno/issues/13591

I used 🦖 Deno 1.18.2

If I use the following code in a worker, it causes the worker to hang. If I then call terminate on the worker, the entire deno process hangs:

// worker.ts
import { DB } from "https://deno.land/x/sqlite@v3.2.0/mod.ts";

console.log('Open a database');
const db = new DB("test.db");
db.query(`
  CREATE TABLE IF NOT EXISTS people (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT
  )
`);
console.log('Close connection');
db.close();
console.log('completed');

// main.ts
const worker = new Worker(new URL("./worker.js", import.meta.url).href, { type: "module", deno: {namespace: true}});

setInterval(async()=> {
    console.log('heartbeat');
}, 1000);

setTimeout(async()=> {
    console.log('terminating the working');
    worker.terminate();
}, 10000);

To run this, first bundle the worker, then execute main.ts.

deno bundle worker.ts worker.js
deno run -A --unstable main.ts

This example starts a worker that tries to create a database using an SQLite library. It also prints a heartbeat every second. After 10 seconds, it terminates the worker. From that point on, the main program prints no more heartbeats. You can also see that the worker doesn't continue once it tries to create the DB.

dyedgreen commented 2 years ago

This seems to work locally for me (using Deno 1.17.3 and also Deno 1.18.2), with the following output:

Open a database
Close connection
completed
heartbeat
heartbeat
heartbeat
heartbeat
heartbeat
heartbeat
heartbeat
heartbeat
heartbeat
terminating the working

What operating system are you using?

irbull commented 2 years ago

Sorry, I missed that here. It fails consistently for me on Windows. After the worker is terminated on Windows, the Deno process hangs.

dyedgreen commented 2 years ago

Makes sense. I believe this might be a Deno issue, but I'm not sure, since I don't have a Windows machine to reproduce it on.

One thing you might try is making a debug build of deno-sqlite (go into the /build subdirectory, run make debug, then use the resulting module from the repo). In debug mode, a lot of information is printed, which might help with debugging (e.g. when sqlite attempts to write to a file).

dbuschtoens commented 2 years ago

I dug a little deeper using the debug build as you have suggested and have some findings:

The issue is not with execution inside a web worker; that was just a coincidence. The real problem lies in file locking via Deno.flockSync, which is only available when running with the --unstable flag. It just so happened that we had previously only used that flag when running the web worker example. Running worker.ts from above directly with deno run -A --unstable worker.ts also hangs on Windows.

I have adjusted the example code and replaced the file locking functions to both disable the actual implementation of file locking and display which file locking operations are attempted.

Deno.flockSync = (...args) => console.log(`flockSync with args ${args}`)
Deno.funlockSync = (...args) => console.log(`funlockSync with args ${args}`)

That results in the following log output:

Open a database
flockSync with args 3,false
flockSync with args 3,false
flockSync with args 3,false
flockSync with args 3,true
funlockSync with args 3
Close connection
completed

The same file is locked multiple times non-exclusively and then once exclusively.

A simple test script calling Deno.flockSync(rid, false) followed by Deno.flockSync(rid, true) on the same file shows that this works on Unix, but hangs on Windows when attempting to acquire the exclusive lock.
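For anyone who wants to repeat that test, the lock sequence can be sketched roughly as below. The helper name upgradeLock is made up for illustration, and the lock function is passed in as a parameter so the sequence can also be exercised without the unstable Deno API; the (rid, exclusive) signature is the one used in this thread.

```typescript
// Shape of Deno.flockSync as it is called in this thread (unstable in Deno 1.18).
type FlockSync = (rid: number, exclusive?: boolean) => void;

// Hypothetical helper: take a shared lock, then request an exclusive lock on
// the same rid without releasing the shared one. Returns the steps that completed.
function upgradeLock(rid: number, flockSync: FlockSync): string[] {
  const steps: string[] = [];
  flockSync(rid, false); // shared lock
  steps.push("shared");
  flockSync(rid, true); // exclusive lock; reported above to hang on Windows
  steps.push("exclusive");
  return steps;
}
```

Under Deno 1.18 with --unstable, this could presumably be called as upgradeLock(file.rid, Deno.flockSync), with file coming from Deno.openSync. If the findings above hold, the call returns both steps on Unix and never returns on Windows.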

It is not clear to me which component is at fault here. Should it be okay to acquire a shared and then an exclusive lock on the same file without releasing the shared lock first? Is it deno-sqlite or the sqlite library code that is responsible for this sequence of file locking operations? If exclusive-locking a shared-locked file should work, is it the Deno implementation of flockSync or the underlying Rust function that is to blame?

For now, we will use the workaround of replacing the Deno file locking functions with no-ops.
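A slightly more contained version of that workaround might look like the sketch below. The names LockApi and stubFileLocks are invented for this example; the only assumption about Deno is that flockSync and funlockSync can be reassigned, as in the snippet earlier in this comment. Note that this disables file locking entirely, so it is only reasonable when a single connection uses the database file.

```typescript
// The two lock functions named in this thread; any object exposing them works.
interface LockApi {
  flockSync: (rid: number, exclusive?: boolean) => void;
  funlockSync: (rid: number) => void;
}

// Hypothetical helper: replace both functions with logging no-ops and return
// a function that restores the originals. `log` receives one line per attempt.
function stubFileLocks(api: LockApi, log: string[] = []): () => void {
  const original = { flockSync: api.flockSync, funlockSync: api.funlockSync };
  api.flockSync = (...args) => {
    log.push(`flockSync with args ${args}`);
  };
  api.funlockSync = (...args) => {
    log.push(`funlockSync with args ${args}`);
  };
  return () => {
    api.flockSync = original.flockSync;
    api.funlockSync = original.funlockSync;
  };
}
```

In a Deno 1.18 --unstable script, the idea would be to call const restore = stubFileLocks(Deno) before opening the database and restore() afterwards.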

dyedgreen commented 2 years ago

Interesting! I believe this is a difference between operating systems. On POSIX, the assumption is that you can upgrade a previously acquired lock from shared (read) to exclusive (write) without releasing it first. SQLite assumes this is always true of the VFS layer, but deno-sqlite currently just relies on Deno's behavior here.

It's unclear to me that Deno should expose those platform differences, but I think we might want to work around it in the library regardless. (I.e. I believe this is something that should be fixed in the currently unstable Deno API, but we also probably want to work around that behavior in deno-sqlite to make sure it works as expected now.)
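One possible shape for such a library-side workaround is sketched below: remember the lock level held per file, and release a shared lock before requesting an exclusive one. This is an illustration only and not necessarily what the eventual fix does; the names RawLocks and UpgradableLock are hypothetical.

```typescript
type LockLevel = "none" | "shared" | "exclusive";

// The underlying lock functions, e.g. Deno.flockSync and Deno.funlockSync.
interface RawLocks {
  flockSync: (rid: number, exclusive?: boolean) => void;
  funlockSync: (rid: number) => void;
}

// Hypothetical wrapper that tracks the lock level held on one rid.
class UpgradableLock {
  private level: LockLevel = "none";
  private readonly rid: number;
  private readonly raw: RawLocks;

  constructor(rid: number, raw: RawLocks) {
    this.rid = rid;
    this.raw = raw;
  }

  lock(exclusive: boolean): void {
    const wanted: LockLevel = exclusive ? "exclusive" : "shared";
    // An exclusive lock already covers a shared request; repeats are no-ops.
    if (this.level === wanted || this.level === "exclusive") return;
    // Upgrade path: drop the shared lock first so the request cannot block on it.
    if (this.level === "shared") this.raw.funlockSync(this.rid);
    this.raw.flockSync(this.rid, exclusive);
    this.level = wanted;
  }

  unlock(): void {
    if (this.level === "none") return;
    this.raw.funlockSync(this.rid);
    this.level = "none";
  }
}
```

The trade-off is that releasing and re-acquiring is not atomic, so another process could take the lock in between. The sketch also does not model downgrading from exclusive back to shared.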

dyedgreen commented 2 years ago

This is now patched with #192 on Windows.

dyedgreen commented 2 years ago

It would be nicer to eventually support inter-process file locks as well, so locks can be held between workers and the main thread, but since that did not work previously, this is strictly an improvement for Windows users.