deltachat / napi-jsonrpc

use jsonrpc over napi.rs in nodejs

`yarn test` is flaky and sometimes panics #2

Closed Simon-Laux closed 1 year ago

Simon-Laux commented 1 year ago

It is related to the event loop. I suspect it is not destroyed / shut down correctly, as the bot example works fine.

https://github.com/deltachat/napi-jsonrpc/blob/ba3750cba9403ff649a7c16b61f7d8226210ede0/src/lib.rs#L38-L50

When I comment it out, the test gets more stable, and I have not gotten the issue without it. The issue:

thread 'tokio-runtime-worker' panicked at 'index out of bounds: the len is 10 but the index is 12',  ~/.cargo/registry/src/github.com-1ecc6299db9ec823/concurrent-queue-1.2.4/src/bounded.rs:160:25

More in https://github.com/deltachat/napi-jsonrpc/blob/ba3750cba9403ff649a7c16b61f7d8226210ede0/panic%20-%20problem. The len and index change between runs: I have seen small values like these, but also really large ones.

I don't know how to shut down the event loop cleanly when the Account Manager is destroyed or garbage collected, because:

https://github.com/deltachat-bot/echo/blob/088f4bf6dd5646e243cbe9825610f6176a7a14f9/rust/src/main.rs#L116-L124

To reproduce: run yarn test many times, sometimes it panics.

link2xt commented 1 year ago

We had a similar problem (with concurrent-queue) previously:

As far as I remember, I carefully checked all the code in concurrent-queue and decided there are no bugs there. The bug was fixed by not using objects after freeing them: https://github.com/deltachat/deltachat-core-rust/issues/3430 So I guess there is a use-after-free issue here too.

Simon-Laux commented 1 year ago

Windows CI just ran into this issue, too:

```
022-10-24T19:52:43.292Z napi:build Write binary content to [napi-jsonrpc.win32-ia32-msvc.node]

  ✔ low_level › create account manager
  ✔ low_level › jsonrpc low_level smoketest 1
1 {
  msg: 'C:\\Users\\runneradmin\\.cargo\\git\\checkouts\\deltachat-core-rust-632648ad67f90089\\b6b2f45\\src\\sql\\migrations.rs:623: Created new database; [migration] v68-v92',
  type: 'Info'
}
1 {
  ✔ index › jsonrpc wrapper smoketest (194ms)
  msg: 'C:\\Users\\runneradmin\\.cargo\\git\\checkouts\\deltachat-core-rust-632648ad67f90089\\b6b2f45\\src\\sql.rs:342: Opened database "C:\\\\Users\\\\RUNNER~1\\\\AppData\\\\Local\\\\Temp\\\\realdc-test0.9400025749693954-1666641172217\\\\3f0beca5-5717-4f5e-ba05-cc34ff47525c\\\\dc.db".',
  type: 'Info'
}
  ✔ low_level › jsonrpc low_level smoketest 2 (214ms)
D:\a\_temp\c7dc93aa-6a71-4e34-94fb-6950681c77fc.sh: line 3: 279 Segmentation fault yarn test
```

I'm rerunning it in the hope that it fixes itself for that CI run.

link2xt commented 1 year ago

Since we don't use any unsafe code, I think it's a bug in napi and should be reported upstream. But first we need a minimal example, because if we just report to the napi devs that this repo is crashing, I doubt anyone will dig into deltachat-core-rust to figure out how the event loop is implemented and where the tokio runtime is initialized.

So the plan is roughly:

  1. Reduce this repo down to a crashing example without any test framework etc.
  2. Reduce deltachat-core-rust in a separate branch by stripping all the account initialization code, config writing, database initialization etc. until it stops failing.

Ideally this can be reduced to creating some objects resembling the account manager, Arcs, a spawned task and some channels for events.
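For illustration, the shape of such a reduced example might look roughly like the sketch below. It is only a dependency-free analogue, not the actual code from this repo: `FakeAccountManager` and `Inner` are hypothetical names, and a std thread plus `std::sync::mpsc` stand in for the tokio task and the event channel. A real reproduction would have to put tokio and napi back in. The `Drop` impl shows one way to make sure the spawned loop has stopped before the state it uses goes away.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, Receiver};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for the state shared between manager and event loop.
struct Inner {
    stop: AtomicBool,
}

/// Hypothetical stand-in for the account manager object handed to Node.js.
struct FakeAccountManager {
    inner: Arc<Inner>,
    events: Receiver<String>,
    worker: Option<JoinHandle<()>>,
}

impl FakeAccountManager {
    fn new() -> Self {
        let inner = Arc::new(Inner { stop: AtomicBool::new(false) });
        let (tx, events) = mpsc::channel::<String>();
        let worker_inner = Arc::clone(&inner);
        // Stand-in for the spawned event-loop task (a tokio task in the real code).
        let worker = thread::spawn(move || {
            let mut n = 0u64;
            while !worker_inner.stop.load(Ordering::Acquire) {
                // Stop as soon as the receiving side has been dropped.
                if tx.send(format!("event {}", n)).is_err() {
                    break;
                }
                n += 1;
                thread::sleep(Duration::from_millis(1));
            }
        });
        Self { inner, events, worker: Some(worker) }
    }

    /// Waits up to one second for the next event.
    fn next_event(&self) -> Option<String> {
        self.events.recv_timeout(Duration::from_secs(1)).ok()
    }
}

impl Drop for FakeAccountManager {
    fn drop(&mut self) {
        // Ask the loop to stop, then wait for it to finish, so the worker
        // never runs after the manager has been destroyed.
        self.inner.stop.store(true, Ordering::Release);
        if let Some(handle) = self.worker.take() {
            let _ = handle.join();
        }
    }
}

fn main() {
    // Create and destroy many managers, mimicking repeated `yarn test` runs.
    for _ in 0..100 {
        let manager = FakeAccountManager::new();
        assert!(manager.next_event().is_some());
    }
    println!("completed 100 create/drop cycles");
}
```

If a version of this built on tokio and napi still crashes, it would be a candidate for the upstream report; if it does not, pieces of the real account manager can be added back until it does.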