Anyolite / anyolite

Embedded mruby/Ruby for Crystal
https://anyolite.github.io/anyolite
MIT License

Invalid memory access when running multiple Ruby script calls at once #27

Open chendo opened 1 year ago

chendo commented 1 year ago

Under high load, when running a simple program on my M1X MacBook Pro, I occasionally get an Invalid memory access (signal 11) crash.

Reproduction repo: https://github.com/chendo/crystal-anyolite-crash-repro

Tested with Crystal 1.7.2, anyolite main (efe3337).

Interestingly enough, I could not reproduce the same issue inside an amd64 docker container (Rosetta).

Full crash log:

Invalid memory access (signal 11) at address 0x0
[0x1042f4d80] *Exception::CallStack::print_backtrace:Nil +104 in /Users/chendo/.cache/crystal/crystal-run-bouncer.tmp
[0x1042a93c8] ~procProc(Int32, Pointer(LibC::SiginfoT), Pointer(Void), Nil)@/opt/homebrew/Cellar/crystal/1.7.2/share/crystal/src/signal.cr:127 +320 in /Users/chendo/.cache/crystal/crystal-run-bouncer.tmp
[0x1a8f2c2a4] _sigtramp +56 in /usr/lib/system/libsystem_platform.dylib
[0x104481068] mrb_vm_exec +10496 in /Users/chendo/.cache/crystal/crystal-run-bouncer.tmp
[0x10447e608] mrb_vm_run +148 in /Users/chendo/.cache/crystal/crystal-run-bouncer.tmp
[0x1044d5ea4] mrb_load_exec +880 in /Users/chendo/.cache/crystal/crystal-run-bouncer.tmp
[0x10446b3e4] execute_script_line +80 in /Users/chendo/.cache/crystal/crystal-run-bouncer.tmp
[0x104413658] *Anyolite::RbInterpreter#execute_script_line<String>:struct.Anyolite::RbCore::RbValue +124 in /Users/chendo/.cache/crystal/crystal-run-bouncer.tmp
[0x1042dfa10] ~procProc(HTTP::Server::Context, Nil)@src/bouncer.cr:20 +60 in /Users/chendo/.cache/crystal/crystal-run-bouncer.tmp
[0x10442bba4] *HTTP::Server::RequestProcessor#process<IO+, IO+>:Nil +880 in /Users/chendo/.cache/crystal/crystal-run-bouncer.tmp
[0x10442a700] *HTTP::Server#handle_client<IO+>:Nil +1756 in /Users/chendo/.cache/crystal/crystal-run-bouncer.tmp
[0x1042e0364] ~procProc(Nil)@/opt/homebrew/Cellar/crystal/1.7.2/share/crystal/src/http/server.cr:468 +32 in /Users/chendo/.cache/crystal/crystal-run-bouncer.tmp
[0x104354d3c] *Fiber#run:(IO::FileDescriptor | Nil) +84 in /Users/chendo/.cache/crystal/crystal-run-bouncer.tmp
[0x1042a8ec4] ~proc2Proc(Fiber, (IO::FileDescriptor | Nil))@/opt/homebrew/Cellar/crystal/1.7.2/share/crystal/src/fiber.cr:98 +12 in /Users/chendo/.cache/crystal/crystal-run-bouncer.tmp

Hadeweka commented 1 year ago

The program from the linked repository runs perfectly fine on WSL on my Surface Pro X (which also has ARM64), as well as on my AMD64 machine on both Windows (using hey) and WSL.

How often do you encounter this error?

Maybe there is some problem with mruby or Crystal on Mac, which triggers this, but I don't really have any options to test this. Based on the error message the crash seems to happen somewhere in the execution of the Ruby program, but the backtrace doesn't really go any further than that.

Perhaps running the Crystal program with valgrind could provide a bit more insight on where exactly the invalid memory access happens.

chendo commented 1 year ago

I was getting it every 2nd or 3rd run, although I did just have one where it didn't crash until the 5th run. It reads to me as if the crash itself happens while printing the backtrace, although I could be wrong there.

If I enable preview_mt with crystal run -D preview_mt repro.cr, then run wrk, I sometimes get the same crash almost immediately, although I also often see other types of crashes as Anyolite doesn't support multithreading currently.

Unfortunately, it seems valgrind is Linux-only, at least according to the Homebrew cask. I only have arm64 on my Mac at the moment. I'll try to come up with a more reliable repro.

chendo commented 1 year ago

Alright, I've improved the repro so it's 100% reliable for me now, even on amd64. I've added a sleep 0.1 within the Crystal code that is being called from mruby. It looks like the HTTP::Server spawns multiple threads even without -D preview_mt so I assume there's some shenanigans there. I do want to use anyolite for a high throughput/concurrency scenario though, so not really sure how to progress.

Hadeweka commented 1 year ago

Yep, now I can reproduce it as well.

The problem is apparently that two script calls interfere with each other, since they both use the same interpreter. Putting a mutex lock around the script line call should fix that, although it might reduce the number of requests per second a bit, depending on how long the Ruby scripts take. With an empty handler function, at least, I see no significant difference between the two cases.
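
A minimal sketch of the mutex idea, using only the Crystal standard library. FakeInterpreter is a hypothetical stand-in for a shared, non-reentrant interpreter and is not part of Anyolite's API; in the real program the guarded call would be the execute_script_line call seen in the backtrace. The sleep plays the role of the sleep 0.1 from the repro: it lets another fiber run while the first call is still in progress.

```crystal
# Hypothetical stand-in for a shared interpreter that must not be
# entered by two callers at once (NOT Anyolite's API).
class FakeInterpreter
  getter overlaps = 0
  @busy = false

  def execute(script : String) : Nil
    # Count every call that starts while another one is still running.
    @overlaps += 1 if @busy
    @busy = true
    sleep 10.milliseconds # yields to other fibers, like the sleep in the repro
    @busy = false
  end
end

# Runs `count` concurrent fibers against one interpreter and returns
# how many calls overlapped with another call.
def run_requests(interp : FakeInterpreter, guarded : Bool, count : Int32) : Int32
  lock = Mutex.new
  done = Channel(Nil).new

  count.times do
    spawn do
      if guarded
        # Only one fiber at a time may enter the interpreter.
        lock.synchronize { interp.execute("puts 1") }
      else
        interp.execute("puts 1")
      end
      done.send(nil)
    end
  end

  count.times { done.receive }
  interp.overlaps
end

unguarded = run_requests(FakeInterpreter.new, false, 5)
guarded = run_requests(FakeInterpreter.new, true, 5)

puts "overlapping calls without mutex: #{unguarded}"
puts "overlapping calls with mutex: #{guarded}"
```

With the lock in place the calls are serialized, so throughput is bounded by how long each script takes. That is the trade-off mentioned above.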

Anyway, I could add some trivial guards to Anyolite to make it thread-safe for now. The limitation here is that mruby itself isn't thread-safe (and will probably never be).

Another solution would of course be to support multiple interpreters in parallel, but I will open a separate discussion thread for that (https://github.com/Anyolite/anyolite/discussions/28).