dyedgreen / deno-sqlite

Deno SQLite module
https://deno.land/x/sqlite
MIT License
409 stars 36 forks source link

Uncaught (in promise) NotFound: No such file or directory (os error 2) #143

Closed s-i-e-v-e closed 3 years ago

s-i-e-v-e commented 3 years ago

I get this error when committing transactions in a callback function called by various web workers. As all database writes/updates only ever happen within this one function which (I believe) runs in the context of the main thread, I am not sure this involves SQLite multi-threading issues caused by Deno's lack of basic OS and File System API integration (I see that you have raised an issue regarding this).

I am running Deno 1.12.2 on Manjaro Linux.

error: Uncaught (in promise) NotFound: No such file or directory (os error 2)
      Deno.removeSync(path);
           ^
    at deno:core/01_core.js:106:46
    at unwrapOpResult (deno:core/01_core.js:126:13)
    at Object.opSync (deno:core/01_core.js:140:12)
    at Object.removeSync (deno:runtime/js/30_fs.js:148:10)
    at js_delete (https://deno.land/x/sqlite@v3.0.0/build/vfs.js:37:12)
    at denoDelete (wasm://wasm/0028a31a:1:2946)
    at pager_end_transaction (wasm://wasm/0028a31a:1:36760)
    at sqlite3BtreeCommitPhaseTwo (wasm://wasm/0028a31a:1:28045)
    at sqlite3VdbeHalt (wasm://wasm/0028a31a:1:43968)
    at sqlite3VdbeExec (wasm://wasm/0028a31a:1:63810)
dyedgreen commented 3 years ago

From the back-trace it looks like it's trying to delete the journal file. But I'm not sure what would cause the journal file to already be missing.

Do you have a minimal working example to reproduce this issue, based on your code?

s-i-e-v-e commented 3 years ago

I will try to write something that reproduces the problem.

Meanwhile, I played around with the thread count. Reducing it produced an error with the web workers. They only ever run SELECT queries on the database and each worker uses a new connection.

NotFound: No such file or directory (os error 2)
    at deno:core/01_core.js:106:46
    at unwrapOpResult (deno:core/01_core.js:126:13)
    at Object.opSync (deno:core/01_core.js:140:12)
    at Object.removeSync (deno:runtime/js/30_fs.js:148:10)
    at js_delete (https://deno.land/x/sqlite@v3.0.0/build/vfs.js:37:12)
    at denoDelete (wasm://wasm/0028a31a:1:2946)
    at pager_end_transaction (wasm://wasm/0028a31a:1:36760)
    at pager_playback (wasm://wasm/0028a31a:1:157718)
    at sqlite3PagerSharedLock (wasm://wasm/0028a31a:1:29790)
    at sqlite3BtreeBeginTrans (wasm://wasm/0028a31a:1:24051)
s-i-e-v-e commented 3 years ago

This code produces one of the above errors somewhat reliably. If it does not, you can try tweaking the MAX_THREADS and SOME_BIG_NUMBER variables in the Journal.ts file and even try deleting the generated journal-test.db file.

You can run the code with deno run --unstable -A ./Journal.ts

dyedgreen commented 3 years ago

I'll have a look. But from what I can tell now, it's probably an issue with multiple processes trying to clean up the journal files at the same time (which would be fixed by having a file locking API) ...

s-i-e-v-e commented 3 years ago

For some reason, the library is trying to write to the database even for SELECT queries. I verified this by opening the database in read only mode.

error: Uncaught (in worker "") (in promise) SqliteError: attempt to write a readonly database
      throw new SqliteError(this._wasm, this._status);
            ^
    at PreparedQuery.all (https://deno.land/x/sqlite@v3.0.0/src/query.ts:422:13)
    at DB.query (https://deno.land/x/sqlite@v3.0.0/src/db.ts:111:26)
    at db_get_max (file:///home/sieve/code/docs/test/journal/Data.ts:18:22)
    at self.onmessage (file:///home/sieve/code/docs/test/journal/Worker.ts:10:25)
    at wrappedHandler (deno:runtime/js/01_web_util.js:54:14)
    at innerInvokeEventListeners (deno:extensions/web/02_event.js:751:9)
    at invokeEventListeners (deno:extensions/web/02_event.js:790:5)
    at dispatch (deno:extensions/web/02_event.js:660:11)
    at dispatchEvent (deno:extensions/web/02_event.js:997:14)
    at pollForMessages (deno:runtime/js/99_main.js:142:9)
dyedgreen commented 3 years ago

I've tried running your script a few times and with different thread counts, but haven't been able to reproduce the error. This would suggest to me that it's indeed a problem with synchronized access.

My guess would be that having a file-locking API would fix these issues, but I'm not sure since I can't reproduce your errors. How reliably does the example you have fail on your machine (and out of curiosity, what OS/ version of Deno are you using)?

s-i-e-v-e commented 3 years ago

The updated code (enabled readonly mode flag) fails about 2/10 times with the "attempt to write a readonly database" error. This is without changing any of the variables.

Env:

s-i-e-v-e commented 3 years ago

After this commit, it fails almost every single time. The only change was that I introduced an iif function into the SELECT statement.

dyedgreen commented 3 years ago

I’ll have another look. But from the kind of change it makes me think it’s very likely a concurrency issue 😅

s-i-e-v-e commented 3 years ago

I think you can shelve this issue till Deno (hopefully) improves their FS API.

I dropped the db in favor of an in-memory data structure in my program. So this issue no longer affects me at present.

dyedgreen commented 3 years ago

@s-i-e-v-e could you try #152 (with --unstable flag to make use of the unstable Deno API), to see if that fixes the problem? (There is a change it won't work even with file locking, since POSIX file locks are per process and I'm not sure how that interacts with Workers and if there is anything we can do there; since we don't have a global shared memory to coordinate with the worker instances transparently 😅)

s-i-e-v-e commented 3 years ago

see if that fixes the problem?

Compared this version to the current one across multiple runs. This one doesn't throw errors. The current one does. I think the fix fixed the problem.

I would like to see if it works when multiple workers write to the same db. Will try it out after modifying the test case and let you know by tomorrow.

s-i-e-v-e commented 3 years ago

if it works when multiple workers write to the same db

It doesn't. It fails with the following error when transactions are involved. And I don't use databases without transactions.

error: Uncaught (in worker "") (in promise) NotFound: No such file or directory (os error 2)
      Deno.removeSync(path);
           ^
    at deno:core/01_core.js:106:46
    at unwrapOpResult (deno:core/01_core.js:126:13)
    at Object.opSync (deno:core/01_core.js:140:12)
    at Object.removeSync (deno:runtime/js/30_fs.js:148:10)
    at js_delete (file:///home/sieve/code/docs/test/journal/sqlite/build/vfs.js:37:12)
    at denoDelete (wasm://wasm/0028f022:1:2991)
    at pager_end_transaction (wasm://wasm/0028f022:1:36936)
    at sqlite3BtreeCommitPhaseTwo (wasm://wasm/0028f022:1:28204)
    at sqlite3VdbeHalt (wasm://wasm/0028f022:1:44094)
    at sqlite3VdbeExec (wasm://wasm/0028f022:1:64237)
error: Uncaught (in promise) Error: Unhandled error event reached main worker.
    at Worker.#pollControl (deno:runtime/js/11_workers.js:268:23)

I guess it is trying to delete the journal which no longer exists at that point in time.

dyedgreen commented 3 years ago

Yeah, that’s what I would have expected. The problem is that the file locks are per process, but the two workers run within the same os process. (And we’d have to share memory explicitly through postMessage to use something like SharedArrayBuffer to coordinate locks on the js side I believe) 😓

s-i-e-v-e commented 3 years ago

From https://sqlite.org/howtocorrupt.html#_posix_advisory_locks_canceled_by_a_separate_thread_doing_close_

it is perfectly safe for two or more threads to access the same SQLite database file using the SQLite library. The unix drivers for SQLite know about the POSIX advisory locking quirks and work around them. This problem only arises when a thread tries to bypass the SQLite library and read the database file directly.

Is it that the syscall mappings don't let the wasm version mimic this behavior?

dyedgreen commented 3 years ago

Yes, that's exactly the problem. The issue is this:

  1. Deno-sqlite uses a separate WASM context for each database (this allows the JS GC to collect databases when they are leaked) -> separate instances can't communicate on the C-side (if you want to move things to separate workers, you'd need separate WASM contexts anyways I believe)
  2. If you run new DB() in a worker, it's OS bindings can't communicate with potential other workers, since they are in different isolates (we can only share memory by explicitly passing SharedArrayBuffer with post message, so this can't be done transparently)
  3. So this means, there is no way to coordinate the file locking between different database connections to the same file in the same deno program (they do coordinate with SQLite instances opened by other processes though, which they did not before this change)
s-i-e-v-e commented 3 years ago

The only solution then is FFI, assuming Deno implements it at some point. But that is not portable.

dyedgreen commented 3 years ago

There is an FFI interface for deno, but the drawback of that is it requires -A permission. One possible other solution would be to keep track of the in-process locks on the Deno side 🤔

s-i-e-v-e commented 3 years ago

There is an FFI interface for deno

Great. Last I checked, there were plans to implement it.

One possible other solution would be to keep track of the in-process locks on the Deno side

I am not sure if it is worth the effort.

The main problem I see is that Deno is trying to serve two use cases that don't necessarily intersect: in-browser apps and real apps.

The way I would solve something like this in other languages (assuming similar limitations) is isolate all database writes to a single, permanent worker/thread. This would be the only way to handle something like a web server receiving multiple requests that may modify the database. But then, instead of letting the database driver determine how to handle multiple writes, we serialize access and that thread then becomes the bottle neck.

ralyodio commented 2 years ago

I also am seeing this problem on Ubuntu 20.04

SHould at the very least check if the file exists first.